hmsiccbl / screensaver1

Screensaver Version 1 - End of life on December 1, 2018
GNU General Public License v2.0

Eutils xml parser not working with pubmed publication queries #197

Open seanderickson opened 6 years ago

seanderickson commented 6 years ago

The eutils utility broke after a change in the PubMed XML response.

edu.harvard.med.screensaver.util.eutils.EutilsUtils (EutilsUtils.java) returns the following error:

[Fatal Error] :1:50: White spaces are required between publicId and systemId.
org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 50; White spaces are required between publicId and systemId.
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:257)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:339)
    at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
    at edu.harvard.med.screensaver.util.eutils.EutilsUtils.getDocumentFromInputStream(EutilsUtils.java:194)
    at edu.harvard.med.screensaver.util.eutils.EutilsUtils.getXMLForEutilsQuery0(EutilsUtils.java:173)
    at edu.harvard.med.screensaver.util.eutils.EutilsUtils.getXMLForEutilsQuery(EutilsUtils.java:159)
    at edu.harvard.med.screensaver.util.eutils.PublicationInfoProvider.getPubmedInfo(PublicationInfoProvider.java:68)

The error occurs when trying to access PubMed publications. The problem is described here: https://stackoverflow.com/questions/6514158/white-spaces-are-required-between-publicid-and-systemid
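For reference, this class of error can be reproduced without any network access. The sketch below is not the Screensaver code: the class name `DoctypeRepro`, the helper `parseError`, and both input strings are hypothetical, and the actual PubMed response is not shown in this issue. It feeds the default JAXP parser a DOCTYPE that has a public identifier but no system identifier (as HTML pages commonly do), which the XML grammar does not allow.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class DoctypeRepro {
    // Hypothetical helper: returns the SAXParseException raised while parsing
    // the given text, or null if the text parses as well-formed XML.
    static SAXParseException parseError(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return null;
        } catch (SAXParseException e) {
            return e;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed input: an HTML-style DOCTYPE with a public ID only.
        String htmlStyle =
            "<!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML 2.0//EN\"><html></html>";
        // Assumed input: a minimal well-formed XML document with no DOCTYPE.
        String plainXml = "<eSummaryResult></eSummaryResult>";

        SAXParseException error = parseError(htmlStyle);
        System.out.println("HTML-style DOCTYPE rejected: " + (error != null));
        System.out.println("Plain XML accepted: " + (parseError(plainXml) == null));
    }
}
```

If the first input is rejected and the second accepted, the failure lies in the document handed to the parser rather than in the parsing call itself, so logging the raw response body before parsing may be a useful first diagnostic step.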

Solution: either update to a more robust parser, or switch to JSON output, which is simpler to parse. The current request:

https://www.ncbi.nlm.nih.gov/entrez/eutils/fcgi/esummary.fcgi?retmode=xml&tool=screensaver&email=screensaver-feedback%40hms.harvard.edu&db=pubmed&id=20653081

should become:

https://www.ncbi.nlm.nih.gov/entrez/eutils/fcgi/esummary.fcgi?retmode=json&tool=screensaver&email=screensaver-feedback%40hms.harvard.edu&db=pubmed&id=20653081
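A minimal sketch of how the request URL could be assembled so that the output format is a single parameter. The class name `EsummaryUrl` and its methods are hypothetical and do not exist in Screensaver; the base URL and parameter values are copied from the URLs above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class EsummaryUrl {
    // Base URL exactly as quoted in this issue.
    static final String BASE =
        "https://www.ncbi.nlm.nih.gov/entrez/eutils/fcgi/esummary.fcgi";

    // Builds the esummary request URL for one PubMed ID.
    // retmode is "xml" or "json", matching the two URLs in this issue.
    static String build(String retmode, String pubmedId) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("retmode", retmode);
        params.put("tool", "screensaver");
        params.put("email", "screensaver-feedback@hms.harvard.edu");
        params.put("db", "pubmed");
        params.put("id", pubmedId);

        StringBuilder url = new StringBuilder(BASE);
        char separator = '?';
        for (Map.Entry<String, String> entry : params.entrySet()) {
            url.append(separator)
               .append(entry.getKey())
               .append('=')
               .append(encode(entry.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    // Percent-encodes a parameter value (for example '@' becomes %40).
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(build("json", "20653081"));
    }
}
```

Keeping `retmode` as an argument would let the XML path stay available while a JSON code path is introduced; parsing the JSON body itself would need a JSON library, which is outside this sketch.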