javedqadruddin / SECFilingClassifier

Experiments with ML models to classify SEC filings

What are you classifying on? #1

Open Analect opened 7 years ago

Analect commented 7 years ago

@javedqadruddin ... I came across your experiments.ipynb file, running TensorFlow against an SEC corpus. I'm probably not understanding your approach, but what are you actually classifying on? What questions would you seek to answer by modelling this corpus using TensorFlow? Thanks.

javedqadruddin commented 7 years ago

Hey, sorry for the slow reply, I've been on vacation. The question I'm seeking to answer here is whether it's feasible to use neural networks to classify different types of contracts. To do this, I needed a substantial dataset of contract-like text. Since there are thousands of SEC filings available for free on the SEC's website, I used those. I scraped 3 different types of SEC filings using this: https://github.com/javedqadruddin/EDGAR

So, the model classifies filings into three classes: 10-K, 10-Q, and 8-K.
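To make the three-class setup concrete, here is a minimal sketch. It is not the code from experiments.ipynb: it swaps the TensorFlow model for a single-layer softmax classifier over bag-of-words counts, written in plain Python. The toy documents, the function names (`build_vocab`, `featurize`, `train`, `classify`), and the hyperparameters are all assumptions for illustration, and the sketch assumes each training document comes with its filing type as a known label.

```python
import math
from collections import Counter

LABELS = ["10-K", "10-Q", "8-K"]

# Toy stand-ins for scraped filings (hypothetical text, not real EDGAR data).
TRAIN = [
    ("annual report fiscal year audited financial statements", "10-K"),
    ("annual report risk factors fiscal year", "10-K"),
    ("quarterly report three months ended unaudited", "10-Q"),
    ("quarterly report interim unaudited condensed statements", "10-Q"),
    ("current report entry into material agreement", "8-K"),
    ("current report departure of directors press release", "8-K"),
]


def tokenize(text):
    """Lowercase and split on whitespace."""
    return text.lower().split()


def build_vocab(docs):
    """Map each token seen in training to a feature index."""
    vocab = {}
    for text, _ in docs:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab


def featurize(text, vocab):
    """Sparse bag-of-words counts; tokens outside the vocabulary are dropped."""
    counts = Counter(tokenize(text))
    return {vocab[t]: float(c) for t, c in counts.items() if t in vocab}


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(x, weights, biases):
    """Class probabilities for one sparse feature vector."""
    scores = [b + sum(w[i] * v for i, v in x.items())
              for w, b in zip(weights, biases)]
    return softmax(scores)


def train(docs, vocab, epochs=200, lr=0.1):
    """Fit one weight vector per class with SGD on cross-entropy loss."""
    weights = [[0.0] * len(vocab) for _ in LABELS]
    biases = [0.0] * len(LABELS)
    for _ in range(epochs):
        for text, label in docs:
            x = featurize(text, vocab)
            probs = predict_proba(x, weights, biases)
            target = LABELS.index(label)
            for k in range(len(LABELS)):
                # Gradient of cross-entropy w.r.t. the class-k score.
                grad = probs[k] - (1.0 if k == target else 0.0)
                biases[k] -= lr * grad
                for i, v in x.items():
                    weights[k][i] -= lr * grad * v
    return weights, biases


def classify(text, vocab, weights, biases):
    """Return the most probable filing type for a piece of text."""
    probs = predict_proba(featurize(text, vocab), weights, biases)
    return LABELS[probs.index(max(probs))]


vocab = build_vocab(TRAIN)
weights, biases = train(TRAIN, vocab)
print(classify("unaudited quarterly report for the three months ended",
               vocab, weights, biases))
```

A real experiment would replace the toy documents with the scraped filings, hold out a test split, and likely use a deeper model, but the shape of the problem (text in, one of three filing types out) stays the same.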

Hope this helps, let me know if you have more questions.

Analect commented 7 years ago

@javedqadruddin .. thanks for the response. Just so I understand ... you used neural nets to classify the type of document (whether it was a 10-K, 10-Q or 8-K) ... without prior knowledge of those classifications? Did you have any success? What other sorts of questions do you think are feasible for this type of unstructured text? Thanks for your insights.