problem parsing XML data for mdf version 4

danielhrisca commented 6 years ago

Hello Aymeric,

In the new test file, the comment of some channels ends with multiple \x00 bytes:

pprint(comment)
{'block_len': 288,
 'id': b'##MD',
 'links_nr': 0,
 'reserved0': 0,
 'text': b'<CNcomment><TX>C</TX>\n<extensions>\n<extension>\n<display xmlns="w'
         b'ww.vector.com/mdf4">C1</display>\n</extension>\n</extensions>\n</CN'
         b'comment>\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
         b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
         b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
         b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
         b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
         b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
         b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
         b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
         b'\x00\x00\x00\x00'}

This causes an exception in the lxml parsing, and the comment is set to None.

Stripping the null bytes fixes this.
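
A minimal sketch of that stripping step, using the standard-library xml.etree.ElementTree instead of lxml so it runs without extra dependencies. The payload is modelled on the 'text' field above, and strip_null_padding is a hypothetical helper, not a function from mdfreader.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload modelled on the 'text' field above:
# the XML comment followed by null-byte padding.
raw_text = (
    b'<CNcomment><TX>C</TX>\n<extensions>\n<extension>\n'
    b'<display xmlns="www.vector.com/mdf4">C1</display>\n'
    b'</extension>\n</extensions>\n</CNcomment>' + b'\x00' * 16
)


def strip_null_padding(data: bytes) -> bytes:
    """Drop trailing null bytes (hypothetical helper, not part of mdfreader)."""
    return data.rstrip(b'\x00')


cleaned = strip_null_padding(raw_text)

# The cleaned payload is well-formed XML and can be parsed directly.
root = ET.fromstring(cleaned)
tx_text = root.find('TX').text
```

Only trailing bytes are removed here, so the XML content itself is left untouched.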

ratal commented 6 years ago

Hi Daniel, that is weird: null bytes should already be stripped by the _mdfblockreadBYTE() function in mdfinfo4.py. Did something else go wrong?

danielhrisca commented 6 years ago

Hello Aymeric,

you're right that this is not actually the cause.

For some reason the __findTX xpath is not working:

from sys import stderr
from lxml import etree

def extractXmlField(self, xml_tree, find):
    """ Extract Xml field from a xml tree

    Parameters
    ----------------
    xml_tree : xml tree from xml.etree.ElementTree
    find : callable applied to xml_tree (e.g. the __findTX xpath)

    Returns
    -----------
    field value in xml tree
    """
    try:
        ret = find(xml_tree)
        if ret:
            ret = ret[0].text
        else:
            # debug output: xpath result, serialized tree, TX text via find()
            if etree.tostring(xml_tree) is not None:
                print(ret, etree.tostring(xml_tree), xml_tree.find('TX').text)
            ret = None
        return ret
    except Exception:
        print('problem parsing metadata', file=stderr)
        return None

This gives me:

[] b'<CNcomment><TX>C</TX>\n<extensions>\n<extension>\n<display xmlns="www.vector.com/mdf4">C340</display>\n</extension>\n</extensions>\n</CNcomment>' C
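
A minimal sketch for inspecting this document outside mdfreader, using the standard-library xml.etree.ElementTree rather than lxml (an assumption made to keep it dependency-free). It does not reproduce or explain the empty XPath result; it only shows that TX carries no namespace and is found by a plain lookup, whereas display sits in the www.vector.com/mdf4 namespace and needs a qualified name. The prefix v is an arbitrary label chosen for this example.

```python
import xml.etree.ElementTree as ET

# The XML printed in the debug output above.
xml_bytes = (
    b'<CNcomment><TX>C</TX>\n<extensions>\n<extension>\n'
    b'<display xmlns="www.vector.com/mdf4">C340</display>\n'
    b'</extension>\n</extensions>\n</CNcomment>'
)

root = ET.fromstring(xml_bytes)

# TX has no namespace, so an unqualified lookup finds it.
tx = root.find('TX')

# display is in a default namespace, so the unqualified name does not match...
plain_display = root.find('.//display')

# ...but a namespace-qualified lookup does ('v' is an arbitrary prefix).
ns = {'v': 'www.vector.com/mdf4'}
qualified_display = root.find('.//v:display', ns)
```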

ratal commented 6 years ago

I changed the XPath string (removed the '/') in the last commit; it seems to work better now.

danielhrisca commented 6 years ago

I get the same result as before:

[] b'<CNcomment><TX>C</TX>\n<extensions>\n<extension>\n<display xmlns="www.vector.com/mdf4">C25</display>\n</extension>\n</extensions>\n</CNcomment>' C
[] b'<CNcomment><TX>C</TX>\n<extensions>\n<extension>\n<display xmlns="www.vector.com/mdf4">C25</display>\n</extension>\n</extensions>\n</CNcomment>' C
[] b'<CNcomment><TX>C</TX>\n<extensions>\n<extension>\n<display xmlns="www.vector.com/mdf4">C25</display>\n</extension>\n</extensions>\n</CNcomment>' C
[] b'<CNcomment><TX>C</TX>\n<extensions>\n<extension>\n<display xmlns="www.vector.com/mdf4">C25</display>\n</extension>\n</extensions>\n</CNcomment>' C
[] b'<CNcomment><TX>C</TX>\n<extensions>\n<extension>\n<display xmlns="www.vector.com/mdf4">C26</display>\n</extension>\n</extensions>\n</CNcomment>' C
[] b'<CNcomment><TX>C</TX>\n<extensions>\n<extension>\n<display xmlns="www.vector.com/mdf4">C26</display>\n</extension>\n</extensions>\n</CNcomment>' C

ratal commented 6 years ago

Same as #114: I switched from etree/xpath to objectify, which should work better. Please check.
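
A minimal sketch of what reading the TX field through lxml.objectify can look like, assuming lxml is installed. This is not the code from the commit mentioned above; get_tx is a hypothetical helper written for illustration.

```python
from lxml import objectify

# One of the comments printed earlier in this thread.
xml_bytes = (
    b'<CNcomment><TX>C</TX>\n<extensions>\n<extension>\n'
    b'<display xmlns="www.vector.com/mdf4">C25</display>\n'
    b'</extension>\n</extensions>\n</CNcomment>'
)


def get_tx(root):
    """Return the text of the TX child, or None if it is absent.

    Hypothetical helper, not part of mdfreader.
    """
    # objectify exposes children as attributes; a missing child raises
    # AttributeError, which getattr() turns into the default value.
    tx = getattr(root, 'TX', None)
    return None if tx is None else tx.text


root = objectify.fromstring(xml_bytes)
tx_text = get_tx(root)

# A comment without a TX element yields None instead of raising.
missing_tx = get_tx(objectify.fromstring(b'<CNcomment/>'))
```

Attribute-style access avoids building an XPath expression for a direct child such as TX.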

ratal / mdfreader

problem parsing XML data for mdf version 4 #99