Sections prematurely terminated from repeated messages

jon-holman commented 4 months ago

I have a few circumstances where CELI is identifying repeated messages and terminating a section prematurely. Message I'm getting:

celi_framework.core.section_processor WARNING - Identified a loop in 1. Identical messages are repeating. pop_context and moving on to the next section.

For some context, my project is to draft a document, and for one of the sections of that document I want to update a section based on information from one new source document at a time. So in my job description, I have a short series of tasks I loop through until this process has been done for all source documents. I've included a counter of the total iterations required, current iteration, and remaining iterations to ensure progress is being made. The loop length is well-defined, so progress is relatively easy to track.

1) Sometimes I get a blank message return, after which CELI reorients itself and continues with the tasks as instructed. This isn't a problem itself, but if it happens 3 times across the entire section I'll get the mentioned message and the section will be terminated. This is by far the most common case where my loop gets broken. *One fix for this might be to look at the last n (I believe 3 based on the current setup) messages instead of through the entire message history

2) I have some tasks as part of the loop that are very simple, for example doing a simple check or calling a function. The tasks are simple enough that sometimes the output message is exactly the same. This can also trigger the loop identification logic and terminate the section. *A possibility to fix this may be to look for repeated consecutive sequences of messages. That could still capture unwanted loops while leaving loops with a counter or some other progress metric to run.

DaveDeCaprio commented 4 months ago

This makes sense. I just did a PR that will only check the recent messages for duplicates. That should address this. We’ll get a new version out hopefully this weekend with the fixes.

From: jon-holman @.> Sent: Friday, June 21, 2024 3:01 PM To: x3n0cr4735/celi @.> Cc: Subscribed @.***> Subject: [x3n0cr4735/celi] Sections prematurely terminated from repeated messages (Issue #74)

I have a few circumstances where CELI is identifying repeated messages and terminating a section prematurely. Message I'm getting:

celi_framework.core.section_processor WARNING - Identified a loop in 1. Identical messages are repeating. pop_context and moving on to the next section.

For some context, my project is to draft a document, and for one of the sections of that document I want to update a section based on information from one new source document at a time. So in my job description, I have a short series of tasks I loop through until this process has been done for all source documents. I've included a counter of the total iterations required, current iteration, and remaining iterations to ensure progress is being made. The loop length is well-defined, so progress is relatively easy to track.

Sometimes I get a blank message return, after which CELI reorients itself and continues with the tasks as instructed. This isn't a problem itself, but if it happens 3 times across the entire section I'll get the mentioned message and the section will be terminated. This is by far the most common case where my loop gets broken. *One fix for this might be to look at the last n (I believe 3 based on the current setup) messages instead of through the entire message history
I have some tasks as part of the loop that are very simple, for example doing a simple check or calling a function. The tasks are simple enough that sometimes the output message is exactly the same. This can also trigger the loop identification logic and terminate the section. *A possibility to fix this may be to look for repeated consecutive sequences of messages. That could still capture unwanted loops while leaving loops with a counter or some other progress metric to run.

— Reply to this email directly, view it on GitHub https://github.com/x3n0cr4735/celi/issues/74 , or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGNLOXNQFDCXDNGTICBMC3ZISA6FAVCNFSM6AAAAABJWVNC5GVHI2DSMVQWIX3LMV43ASLTON2WKOZSGM3DOMJWHE3DAMQ . You are receiving this because you are subscribed to this thread. https://github.com/notifications/beacon/AAGNLOQ7G2N5QVTD6VFHZSTZISA6FA5CNFSM6AAAAABJWVNC5GWGG33NNVSW45C7OR4XAZNFJFZXG5LFVJRW63LNMVXHIX3JMTHI2GBEII.gif Message ID: @. @.> >

jon-holman commented 4 months ago

Two pain points I'm still experiencing here:

CELI giving several blank messages before getting back on track
- This is by far the more common issue
- I see myself getting up to 3 blank messages in a row sometimes with no apparent downstream impact
- This triggers the current condition of 3 repeated messages in the last 6
- Even if I just get 1 or 2 blank messages at once, this might happen a few times in the last 6 messages with no true problems introduced in the larger process
- A potential solution could be to include an exception for blank messages in the loop detection that only counts them if consecutive and has a higher limit (I blew mine up to 10 just to be safe, but maybe 5 or 6 would be reasonable)
Repeated improper function calls
- CELI sometimes will call my functions with incorrect arguments and try a few times to correct its calls
- The retry limit I believe is currently 3
- These incorrect function calls are often not accompanied by any commentary from CELI apart from noting the error, so they are highly likely to be identical
- If the same function has this problem in some proximate tasks, they can get to the limit of 3 identical messages in a 6 message window
- This issue has become a lot less common as I've made my function calls clearer in my job description, but it still pops up from time to time
- A potential solution could be to exclude repeated messages where CELI is diagnosing a bad function call from being considered for loop detection (i.e. take the most recent 6 messages that aren't such a call) since we have separate logic for breaking us out of a loop of bad function calls

DaveDeCaprio commented 4 months ago

Hi Jon, I just pushed a duplicates-2 branch that improves the duplicate handling a bit. I raised the limit to 5 duplicates in a row and added a second check explicitly for alternating duplicates.

On the improper function calls, I think the explicit check for alternating duplicates should fix that as the function calls will have different parameters.

Let me know if those fixes work for you. Also, do you need a pip release for this or do you work directly off the source?

Dave

From: jon-holman @.> Sent: Tuesday, June 25, 2024 8:54 AM To: x3n0cr4735/celi @.> Cc: David DeCaprio @.>; State change @.> Subject: Re: [x3n0cr4735/celi] Sections prematurely terminated from repeated messages (Issue #74)

Two pain points I'm still experiencing here:

CELI giving several blank messages before getting back on track
This is by far the more common issue
I see myself getting up to 3 blank messages in a row sometimes with no apparent downstream impact
This triggers the current condition of 3 repeated messages in the last 6
Even if I just get 1 or 2 blank messages at once, this might happen a few times in the last 6 messages with no true problems introduced in the larger process
A potential solution could be to include an exception for blank messages in the loop detection that only counts them if consecutive and has a higher limit (I blew mine up to 10 just to be safe, but maybe 5 or 6 would be reasonable)
Repeated improper function calls
CELI sometimes will call my functions with incorrect arguments and try a few times to correct its calls
The retry limit I believe is currently 3
These incorrect function calls are often not accompanied by any commentary from CELI apart from noting the error, so they are highly likely to be identical
If the same function has this problem in some proximate tasks, they can get to the limit of 3 identical messages in a 6 message window
This issue has become a lot less common as I've made my function calls clearer in my job description, but it still pops up from time to time
A potential solution could be to exclude repeated messages where CELI is diagnosing a bad function call from being considered for loop detection (i.e. take the most recent 6 messages that aren't such a call) since we have separate logic for breaking us out of a loop of bad function calls

— Reply to this email directly, view it on GitHub https://github.com/x3n0cr4735/celi/issues/74#issuecomment-2189025539 , or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGNLORR5UPNMDJZS6JVW43ZJFY5VAVCNFSM6AAAAABJWVNC5GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCOBZGAZDKNJTHE . You are receiving this because you modified the open/close state. https://github.com/notifications/beacon/AAGNLOTIWY6WW4D4BNQXXMDZJFY5VA5CNFSM6AAAAABJWVNC5GWGG33NNVSW45C7OR4XAZNMJFZXG5LFINXW23LFNZ2KUY3PNVWWK3TUL5UWJTUCPHQQG.gif Message ID: @. @.> >

jon-holman commented 4 months ago

Great, thanks! I agree this should solve both issues. I'm working directly off of source so I'll work off of that branch for now.

Should the loop be checking previous_message instead of last_message to increment the alternating duplicates since last_message is being checked for regular duplicates? Right now I think we're just doing a similar count of duplicates of the most recent message rather than the second to last message.

So the sequence ['A', 'A', 'A', 'A', 'A', 'B', 'A'] will pass the first check (4 A's and 1 B in the 5 messages before the most recent message) but fail the second (4 A's in the 5 messages before the most recent message and 5 A's in the 5 messages before the 2nd most recent message).

def check_for_duplicates(
        self, ongoing_chat: List[Dict[str, str] | Tuple[str, str]]
    ):
        if len(ongoing_chat) == 0:
            return False
        last_message = ongoing_chat[-1]
        duplicates = 0
        for message in ongoing_chat[-6:-1]:
            if message == last_message:
                duplicates += 1
        if duplicates >= 5:
            return True
        # Check for cycles of 2 repeating messages
        if len(ongoing_chat) > 2:
            previous_message = ongoing_chat[-2]
            alternating_duplicates = 0
            if previous_message != last_message:
                for message in ongoing_chat[-7:-2]:
                    if message == **last_message**: #Should this be comparing to previous_message i.e. the 2nd most recent message?
                        alternating_duplicates += 1
                    if duplicates >= 3 and alternating_duplicates >= 3:
                        return True
        return False

jon-holman commented 4 months ago

As I look at it again, even with the change I suggested a true alternating sequence won't be handled properly. So consider ['A', 'B', 'A', 'B', 'A', 'B', 'A'].

Our initial check for duplicates will find only 2 more A's, so that will pass. But then if we look for 3 B's in [-7:-2] we will only find 2 and that will pass as well. (If we leave the code as it is we will still pass since duplicates will be equal to 2 although alternating_duplicates will be 3.)

I think the fix here could be to either:

Look through ongoing_chat[-7:-2] for 3 duplicates of last_message and ongoing_chat[-8:-2] for 3 duplicates of previous message.
Be explicit for an alternating sequence and look for last_message in -3, -5, and -7 and previous_message in -4, -6, and -8.

One other comment: a lot of my "repeated" messages are actually blank assistant messages while it debugs a function call. Whatever comes in that function call section isn't part of the duplicate check. I think that output can be informative for calling out that we're not in a true loop, so if we could merge the assistant and function outputs for the purposes of checking duplicates that could do a lot of the filtering for us. For example, here's something I get commonly:

Assistant:

Function: Call to function foo with arguments returned Error: Unable to parse arguments Expecting property name enclosed in double quotes: line 1 column 2 (char 1)

Which might be followed by

Assistant:

Function: Call to function foo with arguments {'bar': '1'} returned baz

Still can be considered as 2 sequential duplicate messages although I would argue that's not intended.

DaveDeCaprio commented 4 months ago

The function messages will get included in the messages history that is getting checked.

This just needs some good unit tests. I’ll put those in and get it working correctly this morning.

Dave

From: jon-holman @.> Sent: Thursday, June 27, 2024 1:49 PM To: x3n0cr4735/celi @.> Cc: David DeCaprio @.>; State change @.> Subject: Re: [x3n0cr4735/celi] Sections prematurely terminated from repeated messages (Issue #74)

As I look at it again, even with the change I suggested a true alternating sequence won't be handled properly. So consider ['A', 'B', 'A', 'B', 'A', 'B', 'A'].

Our initial check for duplicates will find only 2 more A's, so that will pass. But then if we look for 3 B's in [-7:-2] we will only find 2 and that will pass as well. (If we leave the code as it is we will still pass since duplicates will be equal to 2 although alternating_duplicates will be 3.)

I think the fix here could be to either:

Look through ongoing_chat[-7:-2] for 3 duplicates of last_message and ongoing_chat[-8:-2] for 3 duplicates of previous message.
Be explicit for an alternating sequence and look for last_message in -3, -5, and -7 and previous_message in -4, -6, and -8.

One other comment: a lot of my "repeated" messages are actually blank assistant messages while it debugs a function call. Whatever comes in that function call section isn't part of the duplicate check. I think that output can be informative for calling out that we're not in a true loop, so if we could merge the assistant and function outputs for the purposes of checking duplicates that could do a lot of the filtering for us. For example, here's something I get commonly:

Assistant:

Function: Call to function foo with arguments returned Error: Unable to parse arguments Expecting property name enclosed in double quotes: line 1 column 2 (char 1)

Which might be followed by

Assistant:

Function: Call to function foo with arguments {'bar': '1'} returned baz

Still can be considered as 2 sequential duplicate messages although I would argue that's not intended.

— Reply to this email directly, view it on GitHub https://github.com/x3n0cr4735/celi/issues/74#issuecomment-2195455350 , or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGNLOWZRMNP47XV77JMLT3ZJRNCFAVCNFSM6AAAAABJWVNC5GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCOJVGQ2TKMZVGA . You are receiving this because you modified the open/close state. https://github.com/notifications/beacon/AAGNLORYQTCDDX5JZBJM2HLZJRNCFA5CNFSM6AAAAABJWVNC5GWGG33NNVSW45C7OR4XAZNMJFZXG5LFINXW23LFNZ2KUY3PNVWWK3TUL5UWJTUC3P6XM.gif Message ID: @. @.> >

DaveDeCaprio commented 4 months ago

I added some tests and updated the logic. I changed the alternate to look for 3 total copies (which is 2 duplicates). I think that should be sufficient. Let me know if it isn't.

x3n0cr4735 / celi

Sections prematurely terminated from repeated messages #74