ambarishK closed this issue 5 years ago.
Thank you. Please reformat this into monospace (use <pre> tags).
This is almost certainly a bug. The raw text contains an & which is not converted to &amp; before the page is parsed as XML. I will open another issue.
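To illustrate the failure, here is a minimal Java sketch. It is not AMI's actual code: the class `AmpersandFix` and the helpers `escapeBareAmpersands` and `parses` are hypothetical, and it uses the JDK's built-in DOM parser in place of XOM. It shows that an unescaped & makes the parse fail, and that rewriting every & which does not already start an entity or character reference to &amp; makes the same text well-formed. It only handles bare ampersands: HTML named entities such as &nbsp; are not predefined in XML and would still need separate handling.

```java
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class AmpersandFix {

    // An '&' that is NOT followed by a complete entity or character reference
    // (e.g. "&amp;", "&#38;", "&#x26;") is "bare" and makes the XML ill-formed.
    // The name pattern is an ASCII-only approximation of XML name rules.
    private static final Pattern BARE_AMPERSAND =
            Pattern.compile("&(?![A-Za-z_][A-Za-z0-9._-]*;|#[0-9]+;|#x[0-9A-Fa-f]+;)");

    // Hypothetical helper: escape only the bare ampersands, leaving existing references intact.
    static String escapeBareAmpersands(String raw) {
        return BARE_AMPERSAND.matcher(raw).replaceAll("&amp;");
    }

    // Returns true if the string parses as well-formed XML with the JDK's DOM parser.
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors but does not print them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Illustrative input, not taken from the Wikipedia page in this report.
        String raw = "<li><a href=\"/wiki/X?a=1&b=2\">Research & Development</a></li>";
        System.out.println(parses(raw));   // false
        String fixed = escapeBareAmpersands(raw);
        System.out.println(fixed);         // <li><a href="/wiki/X?a=1&amp;b=2">Research &amp; Development</a></li>
        System.out.println(parses(fixed)); // true
    }
}
```

Escaping only the bare ampersands, rather than every &, avoids double-escaping text that already contains valid references such as &amp; or &#38;.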
ambarish123@ubuntu:~$ ami-dictionary create --input https://en.wikipedia.org/wiki/Institutes_of_National_Importance --informat wikipage --dictionary Research_institute4 --outformats xml

Generic values (AMIDictionaryTool)
================================
basename        null
cproject
ctree
cTreeList       null
dryrun          false
excludeBase     null
excludeTrees    null
file types      []
forceMake       false
includeBase     null
includeTrees    null
log4j
logfile         null
verbose         0

Specific values (AMIDictionaryTool)
================================
dataCols        null
dictionary      [Research_institute4]
dictionaryTop   null
href            null
hrefCols        null
input           https://en.wikipedia.org/wiki/Institutes_of_National_Importance
informat        wikipage
dictInformat    null
linkCol         null
log4j           null
nameCol         null
operation       create
outformats      [xml]
splitCol        ,
termCol         null
terms           null
wikiLinks       [wikipedia, wikidata]

0 [main] DEBUG org.contentmine.ami.tools.AMIDictionaryTool - extracting hyperlinks
..................!.!.!.!.!.!.!.....................!...................................................................................................................................................................................................!..!.............................................................................................................................................
[Fatal Error] :302:913: The entity name must immediately follow the '&' in the entity reference.
Exception in thread "main" java.lang.RuntimeException: cannot parse/read stream:
    at org.contentmine.eucl.xml.XMLUtil.parseQuietlyToDocument(XMLUtil.java:1176)
    at org.contentmine.eucl.xml.XMLUtil.parseQuietlyToRootElement(XMLUtil.java:1164)
    at org.contentmine.ami.tools.AMIDictionaryTool.addWikipedia(AMIDictionaryTool.java:786)
    at org.contentmine.ami.tools.AMIDictionaryTool.addWikipediaPage(AMIDictionaryTool.java:766)
    at org.contentmine.ami.tools.AMIDictionaryTool.addWikiLinks(AMIDictionaryTool.java:745)
    at org.contentmine.ami.tools.AMIDictionaryTool.createDictionaryListInRandomOrder(AMIDictionaryTool.java:733)
    at org.contentmine.ami.tools.AMIDictionaryTool.addEntriesToDictionaryElement(AMIDictionaryTool.java:717)
    at org.contentmine.ami.tools.AMIDictionaryTool.writeNamesAndLinks(AMIDictionaryTool.java:685)
    at org.contentmine.ami.tools.AMIDictionaryTool.createDictionary(AMIDictionaryTool.java:524)
    at org.contentmine.ami.tools.AMIDictionaryTool.runDictionary(AMIDictionaryTool.java:408)
    at org.contentmine.ami.tools.AMIDictionaryTool.runSpecifics(AMIDictionaryTool.java:397)
    at org.contentmine.ami.tools.AbstractAMITool.runCommands(AbstractAMITool.java:218)
    at org.contentmine.ami.tools.AMIDictionaryTool.main(AMIDictionaryTool.java:361)
Caused by: nu.xom.ParsingException: The entity name must immediately follow the '&' in the entity reference. at line 302, column 913
    at nu.xom.Builder.build(Unknown Source)
    at nu.xom.Builder.build(Unknown Source)
    at org.contentmine.eucl.xml.XMLUtil.parseQuietlyToDocument(XMLUtil.java:1174)
    ... 12 more
Caused by: org.xml.sax.SAXParseException; lineNumber: 302; columnNumber: 913; The entity name must immediately follow the '&' in the entity reference.
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    ... 15 more
Error while creating dictionary for Indian institutes.