openai / deeptype

Code for the paper "DeepType: Multilingual Entity Linking by Neural Type System Evolution"
https://arxiv.org/abs/1802.01021

TypeError: 'NoneType' object is not iterable when running full_preprocess.sh #60

Open bhedayat opened 4 years ago

bhedayat commented 4 years ago

Hello, I am running the command below and get the traceback shown. Do I need to delete enwiki-latest-pages-articles.xml and start over? I had to stop and restart this script once because I was not in the environment where tensorflow is installed, and wikidata_linker_utils could not be found. Once I was in the correct environment, it got past that issue.

My ultimate goal is simply to run this notebook (https://github.com/openai/deeptype/blob/master/learning/SentencePredictions.ipynb), so I assume I need to download the data and train the model.

Any help would be appreciated. Thank you.

sh extraction/full_preprocess.sh ${DATA_DIR} en

Downloading wikidata into data/.
Will prepare language: en
Creating data directory
Done.
Downloading and preparing Wikidata:
Already compressed Wikidata
Done with wikidata.
Preparing language: en
Already downloaded and extracted enwiki-latest-pages-articles.xml.

Process Process-9:
Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/tensorflow_p36/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/ec2-user/anaconda3/envs/tensorflow_p36/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ec2-user/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/epub_conversion/wiki_decoder.py", line 323, in convert_wiki_to_lines_inner_queue
    for res in convert_wiki_to_lines_inner_generator(wiki, *args):
TypeError: 'NoneType' object is not iterable

Denescor commented 4 years ago

Hello,

I have exactly the same issue when I try to process enwiki-latest-pages-articles.xml with full_preprocess.sh:

Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "[path]/.local/lib/python3.5/site-packages/epub_conversion/wiki_decoder.py", line 323, in convert_wiki_to_lines_inner_queue
    for res in convert_wiki_to_lines_inner_generator(wiki, *args):
TypeError: 'NoneType' object is not iterable

I would appreciate any help. Thank you.

ghost commented 4 years ago

I hit the same problem and was able to fix it by installing the oldest compatible version of each required module listed in requirements.txt:

cssselect==0.9.1
epub-conversion==1.0.7
lxml==3.4.3
msgpack-python==0.4.8
numpy==1.16.0
pandas==0.15.2
progressbar2==3.6.0
requests==2.6.0
tensorflow==1.4.0
wikipedia-ner==0.0.23
ciseau==1.0.1
Cython==0.26
marisa-trie==0.7.2

zbeloki commented 3 years ago

Thanks @hyukyu. That said, avoiding this error does not require downgrading all the requirements. The problematic library is epub-conversion, and the latest compatible version seems to be 1.0.9:

epub-conversion==1.0.9
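
Before re-running full_preprocess.sh, it may help to confirm which epub-conversion version the active environment actually has. The following is a minimal sketch, not part of the deeptype repository: the helper names (version_tuple, is_known_good, installed_version) are hypothetical, and the 1.0.9 ceiling is taken only from the report above, so treat it as an assumption rather than a verified bound.

```python
import re

# Assumption: 1.0.9 is the newest epub-conversion release reported to work in this thread.
MAX_KNOWN_GOOD = "1.0.9"


def version_tuple(version):
    """Turn '1.0.9' into (1, 0, 9); stops at the first non-numeric component."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def is_known_good(installed, max_ok=MAX_KNOWN_GOOD):
    """True if the installed version is not newer than the reported ceiling."""
    return version_tuple(installed) <= version_tuple(max_ok)


def installed_version(dist_name="epub-conversion"):
    """Return the installed version string, or None if the package is absent."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:
        import pkg_resources  # fallback for older interpreters such as 3.5/3.6
        try:
            return pkg_resources.get_distribution(dist_name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("epub-conversion is not installed in this environment")
    elif is_known_good(found):
        print("epub-conversion", found, "is within the range reported to work")
    else:
        print("epub-conversion", found, "is newer than", MAX_KNOWN_GOOD,
              "- consider: pip install epub-conversion==" + MAX_KNOWN_GOOD)
```

Run it with the same interpreter that runs the preprocessing script, since the traceback paths above show the package being loaded from a specific environment's site-packages.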