mikeizbicki / cmc-csci181-languages

3 stars 5 forks source link

declaration.txt.attack not working #11

Open mikeizbicki opened 2 months ago

mikeizbicki commented 2 months ago

I'm migrating a question from https://github.com/mikeizbicki/docsum/issues/1#issue-2512941315

I think I have finished the homework but I noticed in my tests.yml testing that the declaration.txt.attack doesn't "attack" correctly anymore and just outputs a summary of the text. Is this because we are summarizing the documents in chunks now and the program is more resistant to prompt injections or have I done something wrong?

You're correct that the recursive summarizing is causing the attack to fail. It is most likely still working in the first round, and causing the bad summary. But then in the second round of summarization there is no longer an "attack prompt", and so the summarization goes through like normal. A more sophisticated attack would be needed to break this recursive summarization.