Extracting citations from the hewiki dumps of 2019-05-01 with mwcites fails with the following error:
$ mwcites extract /mnt/data/xmldatadumps/public/hewiki/20190501/hewiki-20190501-pages-meta-history*.xml*.bz2 > hewiki-20190501-citations.tsv
Traceback (most recent call last):
File "/srv/home/bmansurov/venv/mwcites/bin/mwcites", line 11, in <module>
sys.exit(main())
File "/srv/home/bmansurov/venv/mwcites/lib/python3.5/site-packages/mwcites/mwcites.py", line 49, in main
module.main(sys.argv[2:])
File "/srv/home/bmansurov/venv/mwcites/lib/python3.5/site-packages/mwcites/utilities/extract.py", line 58, in main
run(dump_files, extractors)
File "/srv/home/bmansurov/venv/mwcites/lib/python3.5/site-packages/mwcites/utilities/extract.py", line 65, in run
for page_id, title, rev_id, timestamp, type, id in cites:
File "/srv/home/bmansurov/venv/mwcites/lib/python3.5/site-packages/mw/xml_dump/map.py", line 87, in map
Failed while processing dump '/mnt/data/xmldatadumps/public/hewiki/20190501/hewiki-20190501-pages-meta-history1.xml-p13702p18009.bz2':
Traceback (most recent call last):
File "/srv/home/bmansurov/venv/mwcites/lib/python3.5/site-packages/mw/xml_dump/processor.py", line 35, in run
for out in self.process_dump(dump, path):
File "/srv/home/bmansurov/venv/mwcites/lib/python3.5/site-packages/mwcites/utilities/extract.py", line 94, in process_dump
for cite in extract_cite_history(page, extractors):
File "/srv/home/bmansurov/venv/mwcites/lib/python3.5/site-packages/mwcites/utilities/extract.py", line 116, in extract_cite_history
for revision in page:
File "/srv/home/bmansurov/venv/mwcites/lib/python3.5/site-packages/mw/xml_dump/iteration/page.py", line 72, in load_revisions
yield Revision.from_element(sub_element)
File "/srv/home/bmansurov/venv/mwcites/lib/python3.5/site-packages/mw/xml_dump/iteration/revision.py", line 99, in from_element
values = consume_tags(cls.TAG_MAP, element)
File "/srv/home/bmansurov/venv/mwcites/lib/python3.5/site-packages/mw/xml_dump/iteration/util.py", line 7, in consume_tags
value_map[tag_name] = tag_map[tag_name](sub_element)
File "/srv/home/bmansurov/venv/mwcites/lib/python3.5/site-packages/mw/xml_dump/iteration/revision.py", line 20, in <lambda>
'contributor': lambda e: Contributor.from_element(e),
File "/srv/home/bmansurov/venv/mwcites/lib/python3.5/site-packages/mw/xml_dump/iteration/contributor.py", line 40, in from_element
values = consume_tags(cls.TAG_MAP, element)
File "/srv/home/bmansurov/venv/mwcites/lib/python3.5/site-packages/mw/xml_dump/iteration/util.py", line 7, in consume_tags
value_map[tag_name] = tag_map[tag_name](sub_element)
File "/srv/home/bmansurov/venv/mwcites/lib/python3.5/site-packages/mw/xml_dump/iteration/contributor.py", line 14, in <lambda>
'id': lambda e: int(e.text),
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'
re_raise(error, path)
File "/srv/home/bmansurov/venv/mwcites/lib/python3.5/site-packages/mw/xml_dump/map.py", line 12, in re_raise
raise error
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'
Failed while processing dump '/mnt/data/xmldatadumps/public/hewiki/20190501/hewiki-20190501-pages-meta-history1.xml-p6536p13701.bz2':
Traceback (most recent call last):
File "/srv/home/bmansurov/venv/mwcites/lib/python3.5/site-packages/mw/xml_dump/processor.py", line 35, in run
for out in self.process_dump(dump, path):
File "/srv/home/bmansurov/venv/mwcites/lib/python3.5/site-packages/mwcites/utilities/extract.py", line 94, in process_dump
for cite in extract_cite_history(page, extractors):
File "/srv/home/bmansurov/venv/mwcites/lib/python3.5/site-packages/mwcites/utilities/extract.py", line 116, in extract_cite_history
for revision in page:
File "/srv/home/bmansurov/venv/mwcites/lib/python3.5/site-packages/mw/xml_dump/iteration/page.py", line 72, in load_revisions
yield Revision.from_element(sub_element)
File "/srv/home/bmansurov/venv/mwcites/lib/python3.5/site-packages/mw/xml_dump/iteration/revision.py", line 99, in from_element
values = consume_tags(cls.TAG_MAP, element)
File "/srv/home/bmansurov/venv/mwcites/lib/python3.5/site-packages/mw/xml_dump/iteration/util.py", line 7, in consume_tags
value_map[tag_name] = tag_map[tag_name](sub_element)
File "/srv/home/bmansurov/venv/mwcites/lib/python3.5/site-packages/mw/xml_dump/iteration/revision.py", line 20, in <lambda>
'contributor': lambda e: Contributor.from_element(e),
File "/srv/home/bmansurov/venv/mwcites/lib/python3.5/site-packages/mw/xml_dump/iteration/contributor.py", line 40, in from_element
values = consume_tags(cls.TAG_MAP, element)
File "/srv/home/bmansurov/venv/mwcites/lib/python3.5/site-packages/mw/xml_dump/iteration/util.py", line 7, in consume_tags
value_map[tag_name] = tag_map[tag_name](sub_element)
File "/srv/home/bmansurov/venv/mwcites/lib/python3.5/site-packages/mw/xml_dump/iteration/contributor.py", line 14, in <lambda>
'id': lambda e: int(e.text),
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'
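Both tracebacks end at the same line in contributor.py, `'id': lambda e: int(e.text)`, so the failure looks like a contributor `<id>` element with no text. The sketch below reproduces that TypeError with the standard library and shows one possible guard. It is an assumption-laden illustration, not the library's fix: the XML fragments are made up (the offending revisions in the dump were not inspected), and `parse_optional_int` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def parse_optional_int(element):
    """Return int(element.text), or None when the element has no text.

    Hypothetical helper, not part of mw.xml_dump.
    """
    text = element.text
    if text is None or not text.strip():
        return None
    return int(text)


# Made-up <contributor> fragments; the second has an empty <id /> element.
with_id = ET.fromstring(
    "<contributor><username>Example</username><id>42</id></contributor>"
)
empty_id = ET.fromstring(
    "<contributor><username>Example</username><id /></contributor>"
)

# The lambda shown in the traceback (contributor.py, line 14).
original = lambda e: int(e.text)

# An empty element has .text == None, so the original lambda raises TypeError.
try:
    original(empty_id.find("id"))
    reproduced = None
except TypeError as error:
    reproduced = str(error)

print(reproduced)
print(parse_optional_int(with_id.find("id")))
print(parse_optional_int(empty_id.find("id")))
```

Whether returning None is acceptable depends on how downstream code uses the contributor id; that has not been checked here.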