Why Bayes? - GitHub Issues

mac-r commented 11 years ago

"Uses Naive Bayes Classifiers to classify financial statement items"

I can't figure out why we need a classifier here. Could you explain this feature in more detail, please?

jimlindstrom commented 11 years ago

XBRL does a good job of preserving the raw information in SEC filings. If you simply want to measure values that are evident in the filings--e.g., total assets, net income over cash from operations, etc.--you can ignore the classifiers.

Many metrics used in analyzing a company's financials require the financial statements to be restated. For example, calculating a firm's composition ratio (net operating assets over common shareholders' equity) requires that one classify each of a firm's assets as either operating or financing. That information is not included in SEC filings. The judgement required to perform that kind of classification is, however, fairly minimal, and can be easily performed by supervised machine learning approaches, with nonzero but low error rates. I chose to hand-label a bunch of financial statements and then use Bayes classifiers to learn how to classify similar items. There are more sophisticated approaches one could take, but this has worked well enough for my purposes.
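As a rough illustration of the hand-label-then-classify idea described above, here is a minimal sketch of a multinomial naive Bayes classifier over the words in a line-item label. The class `TinyNaiveBayes`, the two categories, and the training labels are all made up for this example; this is not the gem's actual classifier or training data.

```ruby
# Minimal multinomial naive Bayes over the words in a line-item label.
# Class name, categories, and training rows are hypothetical; this is an
# illustration of the technique, not FinModeling's implementation.
class TinyNaiveBayes
  def initialize
    @word_counts = Hash.new { |h, k| h[k] = Hash.new(0) } # category => word => count
    @doc_counts  = Hash.new(0)                             # category => number of examples
    @vocab       = {}                                      # every word seen in training
  end

  def tokenize(text)
    text.downcase.scan(/[a-z]+/)
  end

  def train(category, text)
    @doc_counts[category] += 1
    tokenize(text).each do |word|
      @word_counts[category][word] += 1
      @vocab[word] = true
    end
  end

  # Log-posterior (up to a shared constant) per category, with add-one smoothing.
  def scores(text)
    total_docs = @doc_counts.values.sum.to_f
    @doc_counts.keys.each_with_object({}) do |category, out|
      total_words = @word_counts[category].values.sum
      log_p = Math.log(@doc_counts[category] / total_docs)
      tokenize(text).each do |word|
        log_p += Math.log((@word_counts[category][word] + 1.0) / (total_words + @vocab.size))
      end
      out[category] = log_p
    end
  end

  def classify(text)
    scores(text).max_by { |_, score| score }.first
  end
end

nb = TinyNaiveBayes.new
nb.train(:operating, "Accounts receivable, net")
nb.train(:operating, "Inventories")
nb.train(:operating, "Property, plant and equipment, net")
nb.train(:financing, "Cash and cash equivalents")
nb.train(:financing, "Short-term marketable securities")
nb.train(:financing, "Long-term marketable securities")

puts nb.classify("Marketable securities") # a label not seen verbatim in training
puts nb.classify("Inventories, net")
```

With only six training labels this is a toy, but it shows why slightly different wording across companies is tolerable: the classifier scores individual words rather than matching whole labels.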

mac-r commented 11 years ago

@jimlindstrom Why can't assets be categorized once and then looked up in later documents? Are the reports that different in structure? I can't figure out why we need machine learning here at all.

jimlindstrom commented 11 years ago

Each company is free to use slightly different wording in how it labels items in its filings. So it isn't possible to enumerate all possible items (once, offline) and classify them ahead of time. We need some way to classify items on the fly, while evaluating a given filing.

And unfortunately, the grammar of financial statements is such that it is not possible to infer an item's purpose/meaning/interpretation based solely on structure. Within the left side of a balance sheet, for instance, assets that are financing are intermingled with those that are operating, with no regularity or pattern. The structure does help, though. E.g., after the tax item in an income statement, we can no longer find revenue items.

So a combination of text classification (using Bayes) and structural analysis (using Viterbi) is used to infer the maximum-likelihood parse of a financial statement.
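To make the combination concrete, here is a small sketch of Viterbi decoding over a sequence of income statement items. Everything in it is an assumption for illustration: the three states, the transition probabilities, and the per-item scores (standing in for text-classifier output and treated here as emission scores) are invented numbers, and the function is not the gem's code. The one structural rule it encodes is the one mentioned above: a revenue item cannot follow a tax item.

```ruby
# Viterbi decoding in log space.
# emissions[i][state]     = per-item score from a text classifier (made-up numbers)
# transitions[prev][state] = probability that `state` follows `prev` (made-up numbers)
def viterbi(states, start_p, transitions, emissions)
  log = ->(prob) { prob.zero? ? -Float::INFINITY : Math.log(prob) }

  # best[i][s] = log-probability of the best path that ends in state s at item i
  best = [states.map { |s| [s, log.(start_p[s]) + log.(emissions[0][s])] }.to_h]
  back = [{}]

  (1...emissions.size).each do |i|
    best << {}
    back << {}
    states.each do |s|
      prev_state, score = states
        .map { |ps| [ps, best[i - 1][ps] + log.(transitions[ps][s])] }
        .max_by { |_, sc| sc }
      best[i][s] = score + log.(emissions[i][s])
      back[i][s] = prev_state
    end
  end

  # Trace back from the best final state.
  path = [states.max_by { |s| best[-1][s] }]
  (emissions.size - 1).downto(1) { |i| path.unshift(back[i][path.first]) }
  path
end

states  = [:revenue, :expense, :tax]
start_p = { revenue: 0.8, expense: 0.15, tax: 0.05 }
transitions = {
  revenue: { revenue: 0.5, expense: 0.45, tax: 0.05 },
  expense: { revenue: 0.2, expense: 0.6,  tax: 0.2 },
  tax:     { revenue: 0.0, expense: 0.5,  tax: 0.5 } # no revenue after tax
}
# Hypothetical scores for: "Net sales", "Cost of sales",
# "Provision for income taxes", "Other, net"
emissions = [
  { revenue: 0.9,  expense: 0.05, tax: 0.05 },
  { revenue: 0.3,  expense: 0.65, tax: 0.05 },
  { revenue: 0.05, expense: 0.15, tax: 0.8 },
  { revenue: 0.5,  expense: 0.4,  tax: 0.1 }
]

path = viterbi(states, start_p, transitions, emissions)
p path
```

Taken on its own, the last item scores highest as revenue, but the decoded path labels it an expense because the transition table forbids revenue after tax. That is the sense in which structure corrects the text classifier.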

mac-r commented 11 years ago

Cool!

This library is really nice. I am developing a new patch for Ajaila now (https://github.com/ajaila/ajaila). It's an application for data science in Ruby. There will be a good financial extension pretty soon. FinModeling is a perfect fit! Combining libraries like yours with Ajaila should create real synergy.

Do you know of any stable solution for portfolio construction and management in Ruby? What would you like to see included in the Ajaila::Finance package?

jimlindstrom commented 11 years ago

Hey Max,

Thanks.

I haven't played much with portfolio construction/management. I have done a couple of data-science-y investigations in the course of building this gem:

Looking at historical regressions of a company's returns vs. market returns in order to calculate beta. See: https://github.com/jimlindstrom/FinModeling/blob/master/lib/finmodeling/capm.rb
Looking at the classifier errors for the various Bayes classifiers used in the gem. See: https://github.com/jimlindstrom/FinModeling/blob/master/spec/income_statement_item_spec.rb
I've been meaning to get fancier about looking at the context of a company (is it unprofitable, but early-stage and picking up steam? is it a stable company with predictable returns? is it heavily biased toward fixed vs. variable costs?) and using that info to make better forecasts. For now I'm doing pretty dumb, simple stuff: https://github.com/jimlindstrom/FinModeling/blob/master/lib/finmodeling/company_filings.rb#L107
In evaluating whether a company's finances are stable enough to base predictions on, I'm looking at linear regressions of recent performance and evaluating the regression stats. Again, not super sophisticated: https://github.com/jimlindstrom/FinModeling/blob/master/lib/finmodeling/income_statement_analyses.rb#L19
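For readers unfamiliar with the regressions mentioned in the first and last items, here is a minimal sketch of an ordinary least squares fit. When the company's returns are regressed on market returns, the slope is beta, and R² is one of the regression stats one might inspect to judge stability. The function name `linear_regression` and the return series are invented for this example and are not taken from the linked files.

```ruby
# Ordinary least squares fit of y = intercept + slope * x, plus R^2.
# Hypothetical helper for illustration; not the code in capm.rb.
def linear_regression(xs, ys)
  n      = xs.size.to_f
  mean_x = xs.sum / n
  mean_y = ys.sum / n

  sxx = xs.sum { |x| (x - mean_x)**2 }
  syy = ys.sum { |y| (y - mean_y)**2 }
  sxy = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }

  slope     = sxy / sxx
  intercept = mean_y - slope * mean_x
  r_squared = (sxy**2) / (sxx * syy)

  { slope: slope, intercept: intercept, r_squared: r_squared }
end

# Made-up periodic returns; the stock series is exactly 1.5x the market series,
# so the fitted slope (beta) should come out very close to 1.5.
market_returns = [0.010, -0.020, 0.015, 0.030, -0.010]
stock_returns  = [0.015, -0.030, 0.0225, 0.045, -0.015]

fit = linear_regression(market_returns, stock_returns)
puts format("beta=%.3f alpha=%.4f r2=%.3f", fit[:slope], fit[:intercept], fit[:r_squared])
```

Real return series are noisy, so in practice R² would sit well below 1 and the interesting question is how much to trust the fitted slope.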

If any of those look like things that could be farmed out to Ajaila, I'd be happy to talk about doing so.

jbl

jimlindstrom commented 11 years ago

FYI: I ran across this today, which looks amazing: http://pandodaily.com/2013/04/02/want-to-take-on-wall-street-quantopians-algorithmic-trading-platform-now-accepts-outside-data-sets/

mac-r commented 11 years ago

The app is down; they didn't expect so much traffic. :)

mac-r commented 11 years ago

Their backtesting interface rocks!

[Image: Screenshot from 2013-04-03 01:04:40]

jimlindstrom / FinModeling

Why Bayes? #2