greenelab / pubtator

Retrieve and process PubTator annotations

Error while starting pubtator locally #31

Open gdilieto-unimib opened 2 years ago

gdilieto-unimib commented 2 years ago

Hello, I followed the instructions in the README to run pubtator locally. When I execute the last command, python execute.py --config config_files/pubtator_central_config.json, after the repository is downloaded, I get an error that may be caused by the downloaded file not being extracted correctly. Here is the error message:

Article that broke: 35401401
228155it [3:57:28, 19.85it/s]
Traceback (most recent call last):
  File "execute.py", line 43, in <module>
    convert_pubtator(
  File "/home/gabbo/Tesi/pubtator/scripts/pubtator_to_xml.py", line 181, in convert_pubtator
    for article in tqdm.tqdm(article_generator):
  File "/home/gabbo/anaconda3/envs/pubtator/lib/python3.8/site-packages/tqdm/_tqdm.py", line 833, in __iter__
    for obj in iterable:
  File "/home/gabbo/Tesi/pubtator/scripts/pubtator_to_xml.py", line 146, in read_bioconcepts2pubtator_offsets
    g = list(g)
  File "/home/gabbo/Tesi/pubtator/scripts/pubtator_to_xml.py", line 141, in <genexpr>
    lines = (line.rstrip() for line in f)
  File "/home/gabbo/anaconda3/envs/pubtator/lib/python3.8/gzip.py", line 305, in read1
    return self._buffer.read1(size)
  File "/home/gabbo/anaconda3/envs/pubtator/lib/python3.8/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/home/gabbo/anaconda3/envs/pubtator/lib/python3.8/gzip.py", line 487, in read
    uncompress = self._decompressor.decompress(buf, size)
zlib.error: Error -3 while decompressing data: invalid code lengths set

Could you please check this problem, and let me know if a fresh install works?

Thank you.

danich1 commented 2 years ago

Greetings. Before I get into the weeds of this problem, you might want to directly download the XML files here. This could save you time and effort if your main goal is to have the converted XML files. Otherwise, the issue seems to be that the downloaded file may be corrupted, since the zlib.error is raised while the gzip file is being decompressed.
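
One way to test the corruption hypothesis before re-running the multi-hour conversion is to decompress the downloaded archive from start to end and see whether it fails. The sketch below is not part of the pubtator repository; the function name gzip_is_intact and the chunk_size parameter are made up for illustration, and it only assumes the download is a regular gzip file, as the gzip.py frames in the traceback suggest.

```python
import gzip
import zlib


def gzip_is_intact(path, chunk_size=1024 * 1024):
    """Return True if the gzip file at `path` decompresses cleanly to the end.

    Hypothetical helper, not part of the pubtator code base.
    """
    try:
        with gzip.open(path, "rb") as handle:
            # Read through to the end so the trailing CRC and length
            # fields are verified as well as the compressed data.
            while handle.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        # OSError covers a bad gzip header or a CRC mismatch, EOFError a
        # truncated download, zlib.error a damaged compressed stream.
        return False
    return True
```

Calling gzip_is_intact("path/to/downloaded_file.gz") with the real location of the archive (the path here is a placeholder) returns False for a truncated or damaged file, in which case deleting it and downloading it again is probably the simplest next step.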