manusimidt / py-xbrl

Python-based parser for parsing XBRL and iXBRL files
https://py-xbrl.readthedocs.io/en/latest/
GNU General Public License v3.0

Add support for iXBRL fact-format ixt-sec:numwordsen #53

Closed manusimidt closed 2 years ago

manusimidt commented 3 years ago

As defined in https://www.sec.gov/info/edgar/edgarfm-vol2-v50.pdf

<ix:nonFraction 
  unitRef="U_xbrlipure" 
  id="F_000636" 
  name="us-gaap:StockholdersEquityNoteStockSplitConversionRatio1" 
  contextRef="C_0001318605_20200810_20200810" 
  decimals="0" 
  format="ixt-sec:numwordsen">
    five
</ix:nonFraction>
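
To make the request concrete, here is a minimal sketch of how a parser could pull the `format` attribute and the whitespace-normalized text out of such an element. It uses only the standard library and is not py-xbrl's actual code. The wrapper `<div>`, the `read_fact` helper, and the `SNIPPET` constant are assumptions made for illustration. The namespace URI is the Inline XBRL 1.1 one.

```python
import xml.etree.ElementTree as ET

IX_NS = "http://www.xbrl.org/2013/inlineXBRL"  # Inline XBRL 1.1 namespace

# The element from above, wrapped in a <div> that declares the ix prefix
# (the wrapper is an assumption so the snippet parses on its own).
SNIPPET = f"""
<div xmlns:ix="{IX_NS}">
  <ix:nonFraction unitRef="U_xbrlipure" id="F_000636"
      name="us-gaap:StockholdersEquityNoteStockSplitConversionRatio1"
      contextRef="C_0001318605_20200810_20200810"
      decimals="0" format="ixt-sec:numwordsen">
    five
  </ix:nonFraction>
</div>
"""

def read_fact(xml_text):
    """Return (format, normalized text) of the first ix:nonFraction child."""
    root = ET.fromstring(xml_text)
    fact = root.find(f"{{{IX_NS}}}nonFraction")
    raw = "".join(fact.itertext())
    # Collapse runs of whitespace so "\n    five\n" becomes "five".
    return fact.get("format"), " ".join(raw.split())

fmt, text = read_fact(SNIPPET)
print(fmt, text)
```

The returned text ("five") is what a numwordsen transformation would then have to convert into the number 5.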
mrx23dot commented 3 years ago

An example from https://www.sec.gov/Archives/edgar/data/0001620533/000162053320000070/shak-20200624.htm: LongTermDebt is expected to be 50000000, but the parser returns the untransformed text "fifty thousand".

source:

<ix:nonfraction
  unitref="usd"
  contextref="if9be73d89b434ea3be9d616285ce8b5e_I20200624"
  decimals="INF"
  format="ixt-sec:numwordsen"
  name="us-gaap:LongTermDebt"
  scale="3"
  id="id3VybDovL2RvY3MudjEvZG9jOjc5OTc4NDI2MmNlNDQ2MWU4ZjAwZmJkYTQ1MWFmNGNmL3NlYzo3OTk3ODQyNjJjZTQ0NjFlOGYwMGZiZGE0NTFhZjRjZl83My9mcmFnOjM4YWY5NjQzMGUzMDQ2OTZiMjY2NzMzZDdlNDQwNTg0L3RleHRyZWdpb246MzhhZjk2NDMwZTMwNDY5NmIyNjY3MzNkN2U0NDA1ODRfMzg_6b4b4ede-c607-4e7b-9d04-71f0322a05ac">
    fifty thousand
</ix:nonfraction>
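
The expected value follows from the scale attribute: "fifty thousand" transforms to 50000, and scale="3" multiplies that by 10^3. A minimal sketch of that step, where the helper name `apply_scale` is an assumption and not part of py-xbrl:

```python
from decimal import Decimal

def apply_scale(value, scale):
    """Multiply an already-transformed value by 10**scale.

    Decimal is used so that negative scales do not introduce
    binary floating-point rounding.
    """
    return Decimal(value) * (Decimal(10) ** int(scale))

# "fifty thousand" -> 50000, then scale="3" -> 50000 * 10**3
print(apply_scale(50000, "3"))
```

The number-word transformation has to run before scaling, otherwise there is no numeric value to multiply.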

Simple lookup values:

# Map the number words zero..twelve to their integer values.
SINGLE_DIGITS = {word: i for i, word in enumerate(
    'zero one two three four five six seven eight nine ten eleven twelve'.split())}

def parse_simple_value(val):
    key = val.strip().lower()  # compare case-insensitively in every branch
    if key in SINGLE_DIGITS:
        return SINGLE_DIGITS[key]
    if key in ('☒', 'true', 'yes'):
        return True
    if key in ('☐', 'false', 'no'):
        return False
    if key in ('—', '-', 'nil', '', 'null'):  # considered 0
        return 0
    return val  # not a simple value, leave untouched

Reference implementation for parsing more complex values, e.g. "fifty thousand" (search for numwordsen): https://github.com/nikita-sheremet-clearscale/arelle/blob/70102a22a987ca464607b5ecb05aba7357fafb1e/arelle/plugin/transforms/SEC/__init__.py

Test cases

format              input                        expected
ixt-sec:numwordsen  nineteen hundred forty-four  1944
ixt-sec:numwordsen  Seventy Thousand and one     70001
ixt-sec:numwordsen  no                           0
ixt-sec:numwordsen  None                         0

assert 1 == text2num("one")
assert 12 == text2num("twelve")
assert 72 == text2num("seventy two")
assert 300 == text2num("three hundred")
assert 1200 == text2num("twelve hundred")
assert 12304 == text2num("twelve thousand three hundred four")
assert 6000000 == text2num("six million")
assert 6400005 == text2num("six million four hundred thousand five")
assert 123456789012 == text2num("one hundred twenty three billion four hundred fifty six million seven hundred eighty nine thousand twelve")
assert 4000000000000000000000000000000000 == text2num("four decillion")
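
For reference, a minimal sketch of a `text2num` that satisfies the assertions and the numwordsen test cases above. It is neither the Arelle implementation nor py-xbrl's. The table names (`UNITS`, `TENS`, `SCALES`, `ZERO_WORDS`) and the handling of "and", hyphens, "no" and "None" are assumptions made to match those cases.

```python
import re

UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}
# thousand = 10**3, million = 10**6, ..., decillion = 10**33
SCALES = {w: 10 ** (3 * i) for i, w in enumerate(
    "thousand million billion trillion quadrillion quintillion sextillion "
    "septillion octillion nonillion decillion".split(), start=1)}
ZERO_WORDS = {"no", "none"}  # assumption: both stand for 0, as in the test cases

def text2num(text):
    """Convert English number words to an int; raise ValueError on unknown words."""
    # Split on whitespace and hyphens, ignore case and the filler word "and".
    words = [w for w in re.split(r"[\s-]+", text.strip().lower())
             if w and w != "and"]
    if not words:
        raise ValueError("empty input")
    if len(words) == 1 and words[0] in ZERO_WORDS:
        return 0
    total = 0  # sum of the groups already closed by a scale word
    group = 0  # group currently being built, e.g. 123 or 1900
    for word in words:
        if word in UNITS:
            group += UNITS[word]
        elif word in TENS:
            group += TENS[word]
        elif word == "hundred":
            group *= 100  # expects an explicit count first ("one hundred")
        elif word in SCALES:
            total += group * SCALES[word]
            group = 0
        else:
            raise ValueError(f"unknown number word: {word!r}")
    return total + group

print(text2num("nineteen hundred forty-four"))
print(text2num("Seventy Thousand and one"))
```

Multiplying the running group by 100 instead of treating "hundred" as a scale word is what lets "twelve hundred" and "nineteen hundred forty-four" work without special cases.
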
manusimidt commented 3 years ago

@mrx23dot Thank you for your example with text2num. I am currently working on implementing the ixt-sec transformations. They should be ready by tomorrow morning.