manusimidt / py-xbrl

Python-based parser for parsing XBRL and iXBRL files
https://py-xbrl.readthedocs.io/en/latest/
GNU General Public License v3.0

Add support for Datetime in context duration. #73

Closed mrx23dot closed 2 years ago

mrx23dot commented 2 years ago

Parsing cache/www.sec.gov/Archives/edgar/data/0000752642/000149315218003093/umh-20171231.xml and cache/www.sec.gov/Archives/edgar/data/0000888491/000114420418026912/ohi-20180331.xml gives the error: unconverted data remains: T00:00:00

Maybe str.split('T')[0] could help.

  File "C:\tmp\py-xbrl_orig\xbrl\instance.py", line 642, in parse_instance
    return parse_xbrl_url(url, self.cache)
  File "C:\tmp\py-xbrl_orig\xbrl\instance.py", line 277, in parse_xbrl_url
    return parse_xbrl(instance_path, cache, instance_url)
  File "C:\tmp\py-xbrl_orig\xbrl\instance.py", line 309, in parse_xbrl
    context_dir = _parse_context_elements(root.findall('xbrli:context', NAME_SPACES), root.attrib['ns_map'], taxonomy, cache)
  File "C:\tmp\py-xbrl_orig\xbrl\instance.py", line 541, in _parse_context_elements
    datetime.strptime(start_date.text.strip(), '%Y-%m-%d').date(),
  File "C:\Python37\lib\_strptime.py", line 577, in _strptime_datetime
    tt, fraction, gmtoff_fraction = _strptime(data_string, format)
  File "C:\Python37\lib\_strptime.py", line 362, in _strptime
    data_string[found.end():])
ValueError: unconverted data remains: T00:00:00

manusimidt commented 2 years ago

Yes, this is a bug. The filer specifies xbrli:startDate and xbrli:endDate not as a date but as a dateTime:

<xbrli:period>
  <xbrli:startDate>2015-01-01T00:00:00</xbrli:startDate>
  <xbrli:endDate>2015-03-31T22:00:00</xbrli:endDate>
</xbrli:period>

According to the XBRL 2.1 specification this is valid, although this is the first time I have seen a filer use it :D

https://www.xbrl.org/specification/xbrl-2.1/rec-2003-12-31/xbrl-2.1-rec-2003-12-31+corrected-errata-2013-02-20.html#_4.7.2

So yes, this is a bug in py-xbrl.

mrx23dot commented 2 years ago

This would be resilient: 'str'.strip()[:10]
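
A minimal sketch of that quick fix, assuming the value always starts with a ten-character YYYY-MM-DD date. The helper name `parse_period_date` is hypothetical and not part of py-xbrl; only the `strptime` call with the '%Y-%m-%d' format comes from the traceback above.

```python
from datetime import date, datetime


def parse_period_date(text: str) -> date:
    """Hypothetical helper: keep the leading YYYY-MM-DD and drop any time part."""
    # strip() removes surrounding whitespace; [:10] cuts off "T00:00:00" if present.
    return datetime.strptime(text.strip()[:10], '%Y-%m-%d').date()


plain = parse_period_date('2015-01-01')
with_time = parse_period_date(' 2015-03-31T22:00:00 ')
```

Both calls return a plain `date`, so nothing downstream has to change, but the time of day (22:00:00 in the second value) is silently discarded.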

manusimidt commented 2 years ago

Yes, that would be a quick fix. Ideally, though, the parser should also parse the time (as in the endDate in the example). For that I would have to change the data types of all the contexts, so I will add this in the next major release.
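
For illustration only, one way a parser could keep the time part is sketched below. It is not the py-xbrl implementation: `parse_period_value` is a hypothetical name, and the sketch assumes Python 3.7+ for `datetime.fromisoformat`. Date-only values with a timezone suffix and unusual fractional-second lengths are not handled.

```python
from datetime import date, datetime
from typing import Union


def parse_period_value(text: str) -> Union[date, datetime]:
    """Hypothetical helper: return a date for date-only values, a datetime otherwise."""
    value = text.strip()
    if 'T' not in value:
        # Plain xs:date such as "2015-01-01".
        return datetime.strptime(value, '%Y-%m-%d').date()
    if value.endswith('Z'):
        # fromisoformat() before Python 3.11 rejects "Z", so spell out the UTC offset.
        value = value[:-1] + '+00:00'
    return datetime.fromisoformat(value)


start = parse_period_value('2015-01-01')
end = parse_period_value('2015-03-31T22:00:00')
utc_end = parse_period_value('2015-03-31T22:00:00Z')
```

The return type now depends on the input, so any code that stores or compares context dates would have to accept both `date` and `datetime`. That is the kind of data type change that goes beyond a quick fix.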