Open 482c opened 1 year ago
Hi, thanks for your interest in our work! Good catch in `generate.py`. Thanks!

The IndexError in `alice.py` is likely coming from changes in the GPT-3 API, though I can't confirm. It could help to investigate where that range error is coming from in the file itself and see whether one of those input lists is perhaps empty. Assuming this stems from an API change, I don't have access at this point to dig into it.
I also met these two problems.

- IndexError: this is caused by predicting tokens like `'\n'`, `' '`, or `'<|endoftext|>'`, which can be solved by pre-defining some stopwords.
- `f.write(f"{response}\n")`: `response` here is a list of sentences. (But in your error report it seems like a dict.) So you can just print it out and see how to extract the string to write. E.g., I changed it to:

```python
with open(args.output_file, "a") as f:
    for r in response:
        f.write("- " + r)
        f.write("\n")
```
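Since `response` may arrive either as a list of strings or, per the error report above, as a dict, a more defensive writer could check the shape first. Here is a sketch assuming an OpenAI-style payload with a `choices` list; `write_response` is an illustrative helper, not the repo's actual code:

```python
def write_response(path, response):
    """Append generated sentences to `path`. Accepts either a plain list of
    strings or an OpenAI-style dict with a 'choices' list (an assumption
    based on the error report, not a guaranteed shape)."""
    if isinstance(response, dict):
        # Pull the generated text out of each choice in the API payload.
        sentences = [c.get("text", "") for c in response.get("choices", [])]
    else:
        sentences = list(response)
    with open(path, "a") as f:
        for s in sentences:
            f.write("- " + s + "\n")
```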
I also met the IndexError, but I don't know where and how to pre-define these stopwords. Can you help me? Thanks!
The same thing happens with toxigen-hatebert. I'm trying to detect the toxicity of some sentences using the pretrained toxigen-hatebert, and then the error occurs. I checked my input size and vocab size, but found nothing.
Hi @mmmency, is this error only happening with toxigen-hatebert? This thread is about the ALICE method.

If you're just running into index errors with toxigen-hatebert, this thread discusses how you need to use the `bert-base-uncased` tokenizer with toxigen-hatebert.
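A common cause of this kind of index error is a tokenizer whose vocabulary is larger than the model's embedding table, so some token ids index past the end of it. A minimal plain-Python sketch of that failure mode (no `transformers` required; the specific ids are illustrative):

```python
# Sketch: why using the wrong tokenizer can raise an IndexError.
# The model's embedding table only has `vocab_size` rows; a mismatched
# tokenizer with a larger vocabulary can emit ids that fall outside it.

def embed(token_ids, embedding_table):
    """Look up each token id in the embedding table (a list of row vectors)."""
    return [embedding_table[i] for i in token_ids]

vocab_size = 30522          # bert-base-uncased vocabulary size
table = [[0.0]] * vocab_size  # placeholder embedding rows

embed([101, 2023, 102], table)  # ids from the matching tokenizer: fine
try:
    embed([101, 50000, 102], table)  # id from a mismatched tokenizer
except IndexError:
    print("IndexError: token id outside the embedding table")
```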
Hi @Thartvigsen! I also encountered this error. The problem is that, since we generate only one token at a time, the API may return the following response:

`{text: '', index: 0, logprobs: null, finish_reason: 'stop'}`

In such cases, `outputs['choices'][i]['logprobs']['top_logprobs']` is an empty array, not a dictionary of possible tokens with their corresponding scores. Since the generation for this sentence was already completed, the simplest solution that works well with the code, in my opinion, is to add to `scores[i]` not `outputs['choices'][i]['logprobs']['top_logprobs']` but a placeholder option: `{'\n': 0.0, ' ': -100, '.': -100, '<|endoftext|>': -100}`.
I made a pull request with the corresponding fix, I hope it will be helpful to everyone who would like to reproduce the code 😊
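For anyone reading along, the guard described above can be sketched like this. The field names follow the quoted API response; `safe_top_logprobs` and the placeholder values are illustrative, not the repo's exact code:

```python
# Placeholder distribution used when generation has already stopped:
# newline is made overwhelmingly likely, everything else heavily penalized.
PLACEHOLDER = {"\n": 0.0, " ": -100, ".": -100, "<|endoftext|>": -100}

def safe_top_logprobs(choice):
    """Return `top_logprobs` from an API choice, or the placeholder
    when generation finished immediately and the array came back empty."""
    top = choice["logprobs"]["top_logprobs"]
    return top if top else PLACEHOLDER

# Example: a choice where generation stopped with no token scores.
finished = {"text": "", "index": 0,
            "logprobs": {"top_logprobs": []}, "finish_reason": "stop"}
print(safe_top_logprobs(finished) is PLACEHOLDER)  # -> True
```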
Thank you @arinakosovskaia!
Hey! Awesome paper and thank you for the open resources.
I am trying to reproduce `generate_text.ipynb` from the notebooks in Google Colab. The link in the notebook to Google Colab displayed an error, so I created a duplicate here.

Date Seen: 06/05/2023
Versions: Python 3.10
Steps to Reproduce: The bug occurred when calling `alice()` as shown in the notebook. The same thing happens with the command:
!python generate.py --input_prompt_file /content/drive/MyDrive/coding_projects/toxigen/prompts/neutral_black_1k.txt --language_model GPT3 --classifier RoBERTa --ALICE True --output_file test_file.txt --num_generations_per_prompt 10 --generation_mode neutral --endpoint_url https://api.openai.com/v1/engines/text-ada-001/completions --api_key <API-KEY>
There was a minor bug in `generate.py`, which can be resolved by rewriting the line to `f.write(f"{response}\n")`. However, the main problem is the IndexError, and I am not sure how to fix it.