Open mateee12 opened 7 years ago
It tries to convert the page number to an int, and evidently head[:-1] is not enough to get the integer page size, i.e. the OCR is not as consistent as we thought :D http://stackoverflow.com/questions/6903557/splitting-on-first-occurrence
Solution: try int(head.split('.', 1)[0]) instead of int(head[:-1]) and let me know how it goes.
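A quick check of how the slice and the split behave may help here. This is only a sketch: the sample strings '12.' and '10.5' are assumptions about what the OCR emits (the traceback only shows that head[:-1] produced '10.'), and size_before_dot is a hypothetical helper name. Note that index [0] of the split is the text before the first dot, while index [1] is the text after it.

```python
# Hypothetical raw heading sizes; the real OCR values may differ.
samples = ["12.", "10.5"]

for head in samples:
    sliced = head[:-1]                  # drops only the last character
    before_dot = head.split(".", 1)[0]  # text before the first dot
    after_dot = head.split(".", 1)[1]   # text after the first dot
    print(repr(head), repr(sliced), repr(before_dot), repr(after_dot))


def size_before_dot(head):
    """Return the integer in front of the first dot (hypothetical helper)."""
    return int(head.split(".", 1)[0])
```

With these samples, the slice yields a clean integer string only when the dot is the last character, while the part before the first dot is an integer string in both cases.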
Yes, this is exactly the bug. Of course, bugs like this only show up in production, when a large volume is being parsed. Adam, please make a pull request that fixes just this one thing; I'll approve it right away and start the parsing again. DK
I ran the whole 1941 volume on the server; after a long parsing run it crashed on this file. Here is the stack trace:
{'header_config': '/var/lib/deep_search_docs_2/slovak_1336-4464/slovak_config.json', 'xml': '/var/lib/deep_search_docs_2/slovak_1336-4464/1336-4464_1941/19411205/XML/1336-4464_1941_19411205_00001.xml', 'pdf': None}
Loaded Files:
{'json': '/var/lib/deep_search_docs_2/slovak_1336-4464/slovak_config.json', 'xml': '/var/lib/deep_search_docs_2/slovak_1336-4464/1336-4464_1941/19411205/XML/1336-4464_1941_19411205_00001.xml', 'journal_marc21': '/var/lib/deep_search_docs_2/slovak_1336-4464/journal_marc21.xml', 'dir': '/var/lib/deep_search_docs_2/slovak_1336-4464/1336-4464_1941/19411205'}
Issue created, index: deep_search_prod, type: issue, id: AVt-daLNADYbJb4KNhEx
Traceback (most recent call last):
File "elastic_filler.py", line 72, in
main(sys.argv[1], sys.argv[2], sys.argv[3], sys.argv[4])
File "elastic_filler.py", line 65, in main
issue_id = semantic.save_to_elastic(name, file['dir'], file)
File "/var/www/deep_search/python_app/helper/elastic_filler.py", line 94, in save_to_elastic
max_font = max([int(head[:-1]) for head in heading_sizes] or [0])
File "/var/www/deep_search/python_app/helper/elastic_filler.py", line 94, in
max_font = max([int(head[:-1]) for head in heading_sizes] or [0])
ValueError: invalid literal for int() with base 10: '10.'
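For a more defensive variant of the failing line, one option is to read only the leading digits of each value and fall back to 0 when there are none. The sketch below is not the project's code: leading_int and max_heading_size are hypothetical names, and the sample values are assumed.

```python
import re

# Matches an integer at the start of the string, allowing leading whitespace.
_LEADING_INT = re.compile(r"\s*(\d+)")


def leading_int(head, default=0):
    """Return the integer at the start of `head`, or `default` if there is none."""
    match = _LEADING_INT.match(head)
    return int(match.group(1)) if match else default


def max_heading_size(heading_sizes):
    """Largest leading integer among the values, or 0 for an empty input."""
    return max((leading_int(head) for head in heading_sizes), default=0)


# Assumed sample values, including one with no digits at all.
print(max_heading_size(["12.", "10.5", "9", "n/a"]))
```

Unlike a fixed slice or split, this does not raise on values with no digits, at the cost of silently treating them as 0, which may or may not be what the parser should do.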