manusimidt / py-xbrl

Python-based parser for parsing XBRL and iXBRL files
https://py-xbrl.readthedocs.io/en/latest/
GNU General Public License v3.0

Date parsing fails #47

Closed. mrx23dot closed this issue 3 years ago

mrx23dot commented 3 years ago

Parsing the following two URLs raises date-parsing exceptions.

Are they violating the standard, or should the library be able to handle them? (Although guessing the date would be risky.)

CODI time data 'Dec 31' does not match format '%B %d' url = 'https://www.sec.gov/Archives/edgar/data/0001345126/000134512621000014/codi-20210331.htm'

MFA time data 'Dec 31' does not match format '%B %d' url = 'https://www.sec.gov/Archives/edgar/data/0001055160/000105516021000007/mfa-20210331.htm'

Traceback (most recent call last):
  File "C:\tmp\small_test.py", line 12, in <module>
    inst = XbrlParser(cache).parse_instance(url)
  File "C:\python36\lib\site-packages\xbrl\instance.py", line 626, in parse_instance
    return parse_ixbrl_url(url, self.cache)
  File "C:\python36\lib\site-packages\xbrl\instance.py", line 363, in parse_ixbrl_url
    return parse_ixbrl(instance_path, cache, instance_url)
  File "C:\python36\lib\site-packages\xbrl\instance.py", line 424, in parse_ixbrl
    fact_value: str or float = _extract_ixbrl_value(fact_elem)
  File "C:\python36\lib\site-packages\xbrl\instance.py", line 495, in _extract_ixbrl_value
    parsed_date = strptime(fact_elem.text, '%B %d')
  File "C:\python36\lib\_strptime.py", line 559, in _strptime_time
    tt = _strptime(data_string, format)[0]
  File "C:\python36\lib\_strptime.py", line 362, in _strptime
    (data_string, format))
ValueError: time data 'Dec 31' does not match format '%B %d'

manusimidt commented 3 years ago

You are correct, this is a bug. The facts on which the parser fails use the value format ixt:datemonthdayen. According to the specification, this format "Accepts months in full or abbreviated form, with non-numeric separator" (https://www.xbrl.org/Specification/inlineXBRL-transformationRegistry/REC-2011-07-31/inlineXBRL-transformationRegistry-REC-2011-07-31.html#sec-ixt-11).

This means the value could be either 'December 31' or 'Dec 31'; the latter is currently not supported by the library. I will implement a fix.
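
For illustration only, here is a rough sketch of how both forms could be accepted. It is not the fix that went into the library: the function name `parse_month_day_en`, the lookup table, and the `(month, day)` return value are all assumptions made for this example. It matches month names against a fixed English table instead of relying on `%B`/`%b`, so it does not depend on the process locale, and it allows any non-alphanumeric separator between month and day.

```python
import re
from datetime import date

# Hypothetical helper, not part of py-xbrl: English month names, in order.
_MONTHS = (
    "january", "february", "march", "april", "may", "june",
    "july", "august", "september", "october", "november", "december",
)

# Map both the full name and its three-letter abbreviation to the month number.
_MONTH_LOOKUP = {}
for _number, _name in enumerate(_MONTHS, start=1):
    _MONTH_LOOKUP[_name] = _number
    _MONTH_LOOKUP[_name[:3]] = _number

# Month word, then one or more non-alphanumeric separator characters, then the day.
_MONTH_DAY = re.compile(r"^\s*([A-Za-z]+)[^0-9A-Za-z]+(\d{1,2})\s*$")


def parse_month_day_en(text):
    """Return (month, day) for values such as 'December 31' or 'Dec 31'."""
    match = _MONTH_DAY.match(text)
    if match is None:
        raise ValueError(f"not a month-day value: {text!r}")
    month = _MONTH_LOOKUP.get(match.group(1).lower())
    if month is None:
        raise ValueError(f"unknown month name in {text!r}")
    day = int(match.group(2))
    # Validate the day against a leap year so that 'Feb 29' is accepted
    # while 'Feb 30' still raises ValueError.
    date(2000, month, day)
    return month, day


if __name__ == "__main__":
    for sample in ("December 31", "Dec 31", "Dec. 31"):
        print(sample, "->", parse_month_day_en(sample))
```

Validating against a leap year is a deliberate choice in this sketch: a bare `strptime` call without a year defaults to 1900, which is not a leap year, so 'Feb 29' would be rejected there.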