Closed by mac-r 11 years ago
XBRL does a good job of preserving the raw information in SEC filings. If you simply want to measure values that are evident in the filings (e.g., total assets, net income over cash from operations, etc.), you can ignore the classifiers.
Many metrics used in analyzing a company's financials require the financial statements to be restated. For example, calculating a firm's composition ratio (net operating assets over common shareholders' equity) requires that one classify each of a firm's assets as either operating or financing. That information is not included in SEC filings. The judgement required to perform that kind of classification is, however, fairly minimal, and can be easily performed by supervised machine learning approaches, with nonzero but low error rates. I chose to hand-label a bunch of financial statements and then use Bayes classifiers to learn how to classify similar items. There are more sophisticated approaches one could take, but this has worked well enough for my purposes.
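To make the idea concrete, here is a minimal sketch of a naive Bayes text classifier trained on a handful of hand-labeled balance sheet items. It is written in Python for brevity and is not FinModeling's actual code: the `NaiveBayes` class, the `tokenize` helper, and the training labels are all hypothetical, and treating cash and marketable securities as "financing" is an assumption made for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical hand-labeled items (illustrative only, not taken from real filings).
TRAINING = [
    ("cash and cash equivalents", "financing"),
    ("short-term marketable securities", "financing"),
    ("long-term marketable securities", "financing"),
    ("accounts receivable, net", "operating"),
    ("inventories", "operating"),
    ("property, plant and equipment, net", "operating"),
]


def tokenize(label):
    """Lowercase a line-item label and strip surrounding punctuation from each word."""
    words = (w.strip(",.()").lower() for w in label.split())
    return [w for w in words if w]


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # class -> word -> count
        self.class_counts = Counter()            # class -> number of training items
        self.vocab = set()

    def train(self, text, cls):
        self.class_counts[cls] += 1
        for word in tokenize(text):
            self.word_counts[cls][word] += 1
            self.vocab.add(word)

    def log_scores(self, text):
        """Return log P(class) + sum of log P(word | class) for each class."""
        total_items = sum(self.class_counts.values())
        scores = {}
        for cls, n_items in self.class_counts.items():
            score = math.log(n_items / total_items)
            denom = sum(self.word_counts[cls].values()) + len(self.vocab)
            for word in tokenize(text):
                score += math.log((self.word_counts[cls][word] + 1) / denom)
            scores[cls] = score
        return scores

    def classify(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


classifier = NaiveBayes()
for text, cls in TRAINING:
    classifier.train(text, cls)

# Labels worded differently from anything in the training set.
print(classifier.classify("Cash and short-term investments"))
print(classifier.classify("Trade receivables, net"))
```

Because the classifier scores individual words rather than whole labels, it can still make a reasonable guess when a company words an item slightly differently from anything seen in training.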
@jimlindstrom Why can't assets be categorized once and then looked up in later documents? Are reports really that different in structure? I can't figure out why we need machine learning here at all.
Each company is free to use slightly different wording in how it labels items in its filings. So it isn't possible to enumerate all possible items (once, offline) and classify them ahead of time. We need some way to classify items on-the-fly, while evaluating a given filing.
And unfortunately, the grammar of financial statements is such that it is not possible to infer an item's purpose/meaning/interpretation based solely on structure. Within the left side of a balance sheet, for instance, assets that are financing are intermingled with those that are operating, with no regularity or pattern. The structure helps, for sure, though. E.g., after the tax item in an income statement, we can no longer find revenue items.
So a combination of text classification (using Bayes) and structural analysis (using Viterbi) is used to infer the maximum likelihood parsing of a financial statement.
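As a rough sketch of how the two pieces can fit together (again in Python, and not the gem's actual implementation), the per-item text scores act as emission probabilities and the structural rules act as transition probabilities. Every number below is made up for illustration, as are the state names and the `viterbi` helper. The one structural rule taken from the discussion above is that a revenue item cannot follow the tax item, which is encoded as a transition probability of zero.

```python
import math

STATES = ["revenue", "expense", "tax"]
NEG_INF = float("-inf")


def log(p):
    """Log of a probability, mapping 0 to -inf so forbidden moves never win."""
    return math.log(p) if p > 0 else NEG_INF


# Hypothetical structural model: P(next state | previous state).
# tax -> revenue is 0.0 because revenue items cannot appear after the tax item.
TRANSITIONS = {
    "revenue": {"revenue": 0.50, "expense": 0.45, "tax": 0.05},
    "expense": {"revenue": 0.20, "expense": 0.60, "tax": 0.20},
    "tax":     {"revenue": 0.00, "expense": 0.50, "tax": 0.50},
}
START = {"revenue": 0.80, "expense": 0.15, "tax": 0.05}

# Hypothetical text-classifier scores for four consecutive income statement items:
# "Net sales", "Cost of sales", "Provision for income taxes", "Other income, net".
EMISSIONS = [
    {"revenue": 0.90, "expense": 0.08, "tax": 0.02},
    {"revenue": 0.10, "expense": 0.85, "tax": 0.05},
    {"revenue": 0.05, "expense": 0.15, "tax": 0.80},
    {"revenue": 0.55, "expense": 0.40, "tax": 0.05},
]


def viterbi(emissions):
    """Return the single most likely state sequence for the given emissions."""
    best = [{s: log(START[s]) + log(emissions[0][s]) for s in STATES}]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for em in emissions[1:]:
        scores, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: best[-1][p] + log(TRANSITIONS[p][s]))
            scores[s] = best[-1][prev] + log(TRANSITIONS[prev][s]) + log(em[s])
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Classifying each item on its own versus decoding the whole statement jointly.
independent = [max(em, key=em.get) for em in EMISSIONS]
joint = viterbi(EMISSIONS)
print(independent)
print(joint)
```

With these made-up numbers, the text scores alone label the last item as revenue, while the joint decoding labels it as an expense because it follows the tax item.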
Cool!
This library is really nice. I am developing a new patch for Ajaila now (https://github.com/ajaila/ajaila). It's an application for Data Science in Ruby. There will be a good financial extension pretty soon. FinModeling is a perfect fit! Combining libraries like yours with Ajaila should create real synergy.
Do you know of any stable solution for portfolio construction and management in Ruby? What would you like to see included in the Ajaila::Finance package?
Hey Max,
Thanks.
I haven't played much with portfolio construction/management. I have done a couple of data-science-y investigations in the course of building this gem:
If any of those look like things that could be farmed out to Ajaila, I'd be happy to talk about doing so.
jbl
FYI: I ran across this today, which looks amazing: http://pandodaily.com/2013/04/02/want-to-take-on-wall-street-quantopians-algorithmic-trading-platform-now-accepts-outside-data-sets/
The app is down; they didn't expect so much traffic. :)
Their backtesting interface rocks!
"Uses Naive Bayes Classifiers to classify financial statement items"
I can't figure out why we need a classifier here. Could you explain this feature in more detail, please?