minimaxir / download-tweets-ai-text-gen

Python script to download public Tweets from a given Twitter account into a format suitable for AI text generation.
MIT License

Error while starting finetuning CSV #10

Open ghost opened 4 years ago

ghost commented 4 years ago

Hi there, I'm trying to finetune on a CSV of downloaded tweets. Whenever I try to start the finetuning process, I get an "IndexError: list index out of range" error. I looked around and saw that somebody seems to have had the same kind of issue: https://github.com/minimaxir/gpt-2-simple/issues/77

I get that same issue if I try running the finetuning again, but I guess that's because I need to restart Python every time the finetuning ends, even if it never really started. Restarting the runtime just takes me back to the original out-of-range error.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/gpt_2_simple/src/sample.py:17: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/gpt_2_simple/src/memory_saving_gradients.py:62: get_backward_walk_ops (from tensorflow.contrib.graph_editor.select) is deprecated and will be removed after 2019-06-06.
Instructions for updating:
Please use tensorflow.python.ops.op_selector.get_backward_walk_ops.
Loading checkpoint models/355M/model.ckpt
INFO:tensorflow:Restoring parameters from models/355M/model.ckpt
  0%|          | 0/1 [00:00<?, ?it/s]Loading dataset...

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-3-e6465260674d> in <module>()
      9               print_every=10,
     10               sample_every=500,
---> 11               save_every=500
     12               )

1 frames
/usr/local/lib/python3.6/dist-packages/gpt_2_simple/src/load_dataset.py in load_dataset(enc, path, combine)
     37                 reader = csv.reader(fp)
     38                 for row in reader:
---> 39                     raw_text += start_token + row[0] + end_token + "\n"
     40         else:
     41             # Plain text

IndexError: list index out of range

lanekelly commented 4 years ago

I encountered this same error. I ran download_tweets.py on Windows, and the resulting CSV had a blank line between each row. Each blank line is read as a row of length 0, so row[0] raises the IndexError.

I removed the blank lines, which got things working. (Bless Notepad++ for this feature; I had a 40+ MB CSV.)
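For anyone without Notepad++, the same cleanup can be sketched in a few lines of Python. This is a minimal sketch, not part of download_tweets.py or gpt-2-simple: the function name drop_blank_rows and the file paths are made up for illustration, and it assumes the file is UTF-8 encoded.

```python
import csv


def drop_blank_rows(src_path, dst_path):
    """Copy a CSV from src_path to dst_path, skipping empty rows.

    Hypothetical helper for illustration only. Returns the number of
    rows that were kept.
    """
    kept = 0
    # newline="" lets the csv module handle line endings itself.
    with open(src_path, newline="", encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            # A blank line is parsed as an empty list, which is what
            # makes row[0] fail in load_dataset.
            if row:
                writer.writerow(row)
                kept += 1
    return kept
```

Calling drop_blank_rows("tweets.csv", "tweets_clean.csv") and then finetuning on the cleaned file should avoid the empty rows; both file names here are placeholders.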

This Stack Overflow post has more info on why the blank lines appear on Windows.
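The usual cause, as described in the Python csv module documentation, is opening the output file without newline="": on platforms that write "\r\n" line endings, each row then ends up with an extra "\r", which readers see as a blank line. Whether download_tweets.py opens its file this way is not shown in this thread, so the following is only a sketch of the documented pattern. The function name write_tweets_csv and the "tweets" header are assumptions for illustration.

```python
import csv


def write_tweets_csv(path, tweets):
    """Write one tweet per row to a CSV file.

    Hypothetical helper for illustration only; the "tweets" header
    is an assumed column name.
    """
    # newline="" stops Python from translating the "\r\n" that the
    # csv writer emits, which is what produces "\r\r\n" on Windows.
    with open(path, "w", newline="", encoding="utf-8") as fp:
        writer = csv.writer(fp)
        writer.writerow(["tweets"])
        for tweet in tweets:
            writer.writerow([tweet])
```

A file written this way should read back with no empty rows on any platform.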

strombone-byte commented 4 years ago

Chupachis, did you resolve this issue? I'm consistently running into the same one. Since I'm not running the download script on Windows, lanekelly's blank-line cause doesn't apply in my case. Thanks.