mcs07 / ChemDataExtractor

Automatically extract chemical information from scientific documents
http://chemdataextractor.org
MIT License

https breaks NlmXmlReader #31

Open maddenfederico opened 4 years ago

maddenfederico commented 4 years ago

In the NlmXmlReader class, detect is defined as:

    def detect(self, fstring, fname=None):
        """"""
        if fname and not (fname.endswith('.xml') or fname.endswith('.nxml')):
            return False
        if b'xmlns="http://jats.nlm.nih.gov/ns/archiving' in fstring:
            return True
        if b'JATS-archivearticle1.dtd' in fstring:
            return True
        if b'-//NLM//DTD JATS' in fstring:
            return True
        return False

The NLM's JATS namespace URI uses https now, so the hard-coded `http://` namespace check in detect no longer matches, and my document wasn't being recognized as compatible with NlmXmlReader.
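
One possible workaround is to have the check accept both URI schemes. The following is only a minimal sketch of that idea, not the library's actual fix: `detect_nlm_xml` and `JATS_MARKERS` are hypothetical names, the function is written standalone rather than as a method, and the sample bytes are made up for illustration.

```python
# Hypothetical standalone variant of the detect() logic quoted above.
# The marker list is the original three checks plus an https form of the
# namespace URI (an assumption based on the behaviour described in this issue).
JATS_MARKERS = (
    b'xmlns="http://jats.nlm.nih.gov/ns/archiving',
    b'xmlns="https://jats.nlm.nih.gov/ns/archiving',
    b'JATS-archivearticle1.dtd',
    b'-//NLM//DTD JATS',
)


def detect_nlm_xml(fstring, fname=None):
    """Return True if the raw bytes look like an NLM/JATS XML document."""
    # Same filename filter as the original: only .xml and .nxml are considered.
    if fname and not fname.endswith(('.xml', '.nxml')):
        return False
    # Match if any known marker appears anywhere in the raw bytes.
    return any(marker in fstring for marker in JATS_MARKERS)


# Made-up sample documents, one per URI scheme.
https_doc = b'<article xmlns="https://jats.nlm.nih.gov/ns/archiving">'
http_doc = b'<article xmlns="http://jats.nlm.nih.gov/ns/archiving">'

print(detect_nlm_xml(https_doc, fname='paper.nxml'))  # True
print(detect_nlm_xml(http_doc, fname='paper.xml'))    # True
print(detect_nlm_xml(https_doc, fname='paper.html'))  # False
```

Keeping the markers in one tuple means a further variant of the namespace or DTD identifier can be added without touching the control flow. The same body could presumably be used to override detect in a subclass of NlmXmlReader, though I have not tested that against the library.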