Hello,
I used GROBID to convert 2657 PDF files to XML and then ran this command: #!python -m paperetl.file /Users/kellytsorb/paperetl/file/XML_files /Users/kellytsorb/paperetl/SQLite
It inserts the XML files into the database that the command creates, but only 549 of them are inserted and I don't know why. Some of the papers that are not inserted now were inserted fine when I tried them earlier in a smaller batch. Is there a limit on the number of articles that I can insert into the database?
Process Process-2:
Traceback (most recent call last):
File "/Users/kellytsorb/anaconda3/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/Users/kellytsorb/anaconda3/lib/python3.11/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/Users/kellytsorb/anaconda3/lib/python3.11/site-packages/paperetl/file/execute.py", line 94, in process
for result in Execute.parse(params):
File "/Users/kellytsorb/anaconda3/lib/python3.11/site-packages/paperetl/file/execute.py", line 74, in parse
yield TEI.parse(stream, source)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/kellytsorb/anaconda3/lib/python3.11/site-packages/paperetl/file/tei.py", line 37, in parse
title = soup.title.text
^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'text'
Total articles inserted: 549
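The traceback points at `soup.title` being `None` in `tei.py`, which suggests that at least one of the XML files has no `<title>` element at all, rather than the database hitting a size limit. A minimal sketch for listing such files is below. It is only an approximation: `has_title` and `files_without_title` are hypothetical helper names, not part of paperetl, and the check uses the standard library's ElementTree instead of the BeautifulSoup parser that paperetl uses, so edge cases may differ.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def has_title(xml_path):
    """Return True if the file parses as XML and contains any <title> element."""
    try:
        root = ET.parse(xml_path).getroot()
    except ET.ParseError:
        # Empty or malformed XML cannot provide a title either
        return False
    for element in root.iter():
        # Strip a namespace prefix such as "{http://www.tei-c.org/ns/1.0}title"
        if element.tag.rsplit("}", 1)[-1] == "title":
            return True
    return False


def files_without_title(directory):
    """Return the sorted names of .xml files in directory that lack a <title>."""
    return sorted(p.name for p in Path(directory).glob("*.xml") if not has_title(p))
```

Calling `files_without_title("/Users/kellytsorb/paperetl/file/XML_files")` should return the names of the files that are likely to trigger the `AttributeError`, so they can be inspected or converted again.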
Thank you in advance!