petermr / docanalysis

Semantic analysis of text documents including sentence and paragraph splitting
Apache License 2.0

errors when using a dictionary to create csv output #30

Open EmanuelFaria opened 1 year ago

EmanuelFaria commented 1 year ago

INFO: Loading scispacy
  0%| | 0/17142 [00:00<?, ?it/s]/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/scispacy/abbreviation.py:216: UserWarning: [W036] The component 'matcher' does not have any patterns defined.
  global_matches = self.global_matcher(doc)
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 17142/17142 [06:22<00:00, 44.80it/s]
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.8/bin/docanalysis", line 8, in <module>
    sys.exit(main())
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/docanalysis/docanalysis.py", line 196, in main
    calldocanalysis.handlecli()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/docanalysis/docanalysis.py", line 188, in handlecli
    self.entity_extraction.extract_entities_from_papers(args.project_name, args.dictionary, search_sections=args.search_section, entities=args.entities, query=args.query, hits=args.hits,
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/docanalysis/entity_extraction.py", line 170, in extract_entities_from_papers
    compiled_terms = self.get_terms_from_ami_xml(terms_xml_path[i])
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/docanalysis/entity_extraction.py", line 563, in get_terms_from_ami_xml
    tree = ET.parse(xml_path)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/xml/etree/ElementTree.py", line 1202, in parse
    tree.parse(source, parser)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/xml/etree/ElementTree.py", line 595, in parse
    self._root = parser._parse_whole(source)
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 701, column 249
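The traceback ends in `ET.parse(xml_path)`, so the failure is in the dictionary XML file itself (line 701, column 249), not in the papers being analysed. A minimal sketch for finding the offending spot follows. The helper name `locate_xml_error` and the sample dictionary text are hypothetical and not part of docanalysis. The sample uses an unescaped `&` only as an illustration, since the log does not say which character is invalid here.

```python
import xml.etree.ElementTree as ET


def locate_xml_error(xml_text):
    """Return None if xml_text parses, else (line, column, offending_line).

    Hypothetical helper for illustration; not part of docanalysis.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple; the line is 1-based.
        line, column = err.position
        lines = xml_text.split("\n")
        offending = lines[line - 1] if 0 < line <= len(lines) else ""
        return line, column, offending
    return None


# Illustrative dictionary text with an unescaped ampersand on its second line.
sample = '<dictionary>\n<entry term="salt & pepper"/>\n</dictionary>'
result = locate_xml_error(sample)
if result is not None:
    line, column, offending = result
    print(f"line {line}, column {column}: {offending}")
```

To check a real dictionary, read the file passed to the dictionary argument into a string and hand it to the helper, then inspect the reported line around the reported column. If the culprit turns out to be a bare `&` or `<`, escaping it as `&amp;` or `&lt;` in the dictionary file should let `ET.parse` succeed.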