greedo / python-xbrl

xbrl parser written in Python :bulb:
https://pypi.python.org/pypi/python-xbrl
Apache License 2.0

Parsing the first matching value #26

Open artemk93 opened 9 years ago

artemk93 commented 9 years ago

I am using the Arelle app to open the XML files and double-check the output of the code below:

from xbrl import XBRLParser, GAAP, GAAPSerializer

# Parse the filing, then extract the GAAP fields for the given period
xbrl_parser = XBRLParser()
xbrl = xbrl_parser.parse('aapl-20150627.xml')
gaap_obj = xbrl_parser.parseGAAP(xbrl, doc_date='20150627', context='current', ignore_errors=0)

# Serialize the GAAP object into a dict-like result
serializer = GAAPSerializer()
result = serializer.dump(gaap_obj)

print(result)

Output:

MarshalResult(data={u'liabilities': 65285.0, u'net_cash_flows_financing_continuing': 0.0, u'revenue': 0.0, u'income_tax_expense_benefit': 3796.0, u'common_shares_authorized': 0.0, u'income_from_equity_investments': 0.0, u'preferred_stock_dividends': 0.0, u'redeemable_noncontrolling_interest': 0.0, u'extraordary_items_gain_loss': 0.0, u'temporary_equity': 0.0, u'costs_and_expenses': 0.0, u'non_current_assets': 4081.0, u'net_cash_flows_discontinued': 0.0, u'net_cash_flows_investing_discontinued': 0.0, u'liabilities_and_equity': 273151.0, u'other_operating_income': 0.0, u'operating_income_loss': 0.0, u'income_before_equity_investments': 0.0, u'net_income_parent': 0.0, u'equity': 0.0, u'income_loss': 14083.0, u'cost_of_revenue': 0.0, u'operating_expenses': 5598.0, u'noncurrent_liabilities': 0.0, u'current_liabilities': 0.0, u'net_cash_flows_investing': 0.0, u'stockholders_equity': 125677.0, u'net_income_loss': 10677.0, u'net_cash_flows_investing_continuing': 0.0, u'nonoperating_income_loss': 0.0, u'common_shares_outstanding': 0.0, u'net_cash_flows_financing': 0.0, u'net_income_shareholders': 0.0, u'comprehensive_income': 9065.0, u'equity_attributable_interest': 0.0, u'commitments_and_contingencies': 0.0, u'comprehensive_income_parent': 9065.0, u'net_cash_flows_operating_discontinued': 0.0, u'comprehensive_income_interest': 0.0, u'other_comprehensive_income': 0.0, u'equity_attributable_parent': 0.0, u'assets': 3991.0, u'common_shares_issued': 0.0, u'gross_profit': 19681.0, u'net_cash_flows_operating_continuing': 0.0, u'current_assets': 0.0, u'interest_and_debt_expense': 0.0, u'net_income_loss_noncontrolling': 0.0, u'net_cash_flows_operating': 0.0}, errors={})

The problem is that every value is the first matching value in the XML file. For example, liabilities = 65285.0 is actually us-gaap:LiabilitiesCurrent, which comes before us-gaap:Liabilities. Likewise, assets = 3991.0 is actually us-gaap:FiniteLivedIntangibleAssetsAccumulatedAmortization, which comes before us-gaap:Assets = 273,151,000,000.

I believe this can be solved by slightly changing the part of parseGAAP() in xbrl.py where xbrl.find_all is used for each value (assets, current_assets, etc.).

artemk93 commented 9 years ago

In xbrl.py, inside the parseGAAP() function, anchoring the pattern to the end of the tag name seems to solve the problem:

liabilities = xbrl.find_all(name=re.compile("(us-gaap:liabilities$)", re.IGNORECASE | re.MULTILINE))

The same approach works for assets or any other tag name.
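
To illustrate why the trailing `$` matters, here is a minimal sketch using only the standard `re` module. It assumes the tag-name filter is applied with `search()`, so a pattern without an end anchor also matches longer tag names. The tag list, the `first_match` helper, and the unanchored pattern are hypothetical and only mimic the document order described above; they are not taken from xbrl.py.

```python
import re

# Tag names in the order described in this issue (illustrative list only).
tag_names = [
    "us-gaap:LiabilitiesCurrent",
    "us-gaap:Liabilities",
    "us-gaap:FiniteLivedIntangibleAssetsAccumulatedAmortization",
    "us-gaap:Assets",
]


def first_match(pattern, names):
    """Hypothetical helper: return the first name matched by pattern.search()."""
    return next((name for name in names if pattern.search(name)), None)


flags = re.IGNORECASE | re.MULTILINE

# Assumed unanchored pattern: also matches "us-gaap:LiabilitiesCurrent".
loose_liabilities = re.compile("(us-gaap:liabilities)", flags)

# Anchored patterns as proposed above: the tag name must end right after the word.
strict_liabilities = re.compile("(us-gaap:liabilities$)", flags)
strict_assets = re.compile("(us-gaap:assets$)", flags)

print(first_match(loose_liabilities, tag_names))
print(first_match(strict_liabilities, tag_names))
print(first_match(strict_assets, tag_names))
```

With the anchored patterns, the longer tag names are skipped and only the exact us-gaap:Liabilities and us-gaap:Assets tags are selected.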

artemk93 commented 9 years ago

I've been working with the code, changing it slightly to look for the values that I want (I hope that is OK). If it helps, I can post it here, along with the output that I get.

greedo commented 8 years ago

@artemk93 if you think the changes would be valuable to other people, go ahead and submit a PR with what you have and we can work on it. Thanks!

pwsutherland commented 7 years ago

Thanks for the parser, @greedo.

I am planning to follow in @artemk93's footsteps and make piecemeal changes, but wanted to check whether there was any update on your and @artemk93's work. It does not look like he/she ever actually submitted a PR.

greedo commented 7 years ago

No progress on it yet from @artemk93, and I would gladly welcome your contributions.