limostrom closed this issue 4 months ago.
Hi @limostrom, we also saw a problem similar to the one you described in some other item sections. We fixed those in PR #21, which should probably fix your problem as well (credit to @Bailefan).
If the problem still occurs, please reopen the issue.
Hi, I'm trying to work with the extracted Item 1 (Business) descriptions, and I noticed that for many filings extract_items.py appears to cut the section off prematurely. This usually happens when the text refers to a later section (e.g., Item 1A, Risk Factors) and the code interprets that reference as the header of the next section.

One example is this filing: https://www.sec.gov/Archives/edgar/data/872448/000087244813000005/atml-201210k.htm

When the code reaches this sentence in "Forward Looking Statements":

"... including the risk factors set forth in this discussion and in Item 1A — Risk Factors, and elsewhere in this Form 10-K."

it cuts off item_1 at "including the risk factors set forth in this discussion and in".

The issue occurs in roughly 10-15% of the filings I've looked at.
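A minimal sketch of the failure mode described above, assuming (hypothetically) that the extractor locates the end of Item 1 by regex-searching for the next item header. The patterns, the function `extract_item_1`, and the sample text are illustrative only and are not taken from extract_items.py or PR #21. A pattern that matches "Item 1A" anywhere stops at the in-sentence cross-reference, while one that requires the header to start a line skips it.

```python
import re

# Hypothetical patterns for illustration; NOT the ones used in extract_items.py.
# Naive: matches "Item 1A" anywhere, including mid-sentence cross-references.
NAIVE_ITEM_1A = re.compile(r"item\s+1a", re.IGNORECASE)

# Stricter (assumption): a real header starts a line, optionally after whitespace.
HEADER_ITEM_1A = re.compile(r"^\s*item\s+1a\b", re.IGNORECASE | re.MULTILINE)


def extract_item_1(text: str, end_pattern: re.Pattern) -> str:
    """Return the text after the Item 1 header, up to the first end_pattern match."""
    # "Item 1" at the start of a line; \b rejects "Item 1A" and "Item 10".
    start = re.search(r"^\s*item\s+1\b\.?", text, re.IGNORECASE | re.MULTILINE)
    if start is None:
        return ""
    # Search for the end marker only after the Item 1 header.
    end = end_pattern.search(text, start.end())
    stop = end.start() if end else len(text)
    return text[start.end():stop].strip()


# Made-up sample text modeled on the sentence quoted in the issue.
sample = (
    "Item 1. Business\n"
    "We design semiconductors.\n"
    "Forward Looking Statements ... including the risk factors set forth in this "
    "discussion and in Item 1A - Risk Factors, and elsewhere in this Form 10-K.\n"
    "Item 1A. Risk Factors\n"
    "Our business is subject to risks.\n"
)

naive = extract_item_1(sample, NAIVE_ITEM_1A)    # stops at the cross-reference
strict = extract_item_1(sample, HEADER_ITEM_1A)  # stops at the real header

print(repr(naive[-40:]))
print(repr(strict[-40:]))
```

Anchoring the header to the start of a line is only a heuristic: if a filing's HTML-to-text conversion happens to place a cross-reference at the beginning of a line, the stricter pattern would still end the section early.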
I am very inexperienced at working with text data, so I'm not sure how to fix this problem myself. Please let me know if you need more information or if I can help in any way. Thanks!