john-friedman / datamule-python

A package to work with SEC data. Incorporates datamule endpoints.
MIT License
53 stars, 5 forks

Item 1C from 10-K #4

Open: msharifbd opened this issue 2 days ago

msharifbd commented 2 days ago

Hi John: I talked to you about this same issue while we were discussing sec-parsers. How can I get Item 1C of the 10-K for all companies since December 14, 2023, using datamule? I tried several approaches with datamule but was unsuccessful. Thanks

john-friedman commented 2 days ago

Hi Sharif, good to hear from you.

I'll be adding a basic 10-K, 10-KSB, and 10-Q item parser to the package this week. It will be able to parse Item 1C from 2001 to the present (current benchmark: ~15 milliseconds per filing).

msharifbd commented 1 day ago

Ok. Thanks

john-friedman commented 1 day ago

The following code should work.

from datamule import Filing, Downloader
import os
import json
from pathlib import Path

# Download 10-K filings from 2023-12-14 to 2024-10-28 (Today's date)
downloader = Downloader()
downloader.download(form='10-K', file_types='10-K',output_dir='10k',date=('2023-12-14','2024-10-28'))

Path('10k_parsed').mkdir(exist_ok=True)

# Collect the text of Item 1C from each 10-K filing
item1c_list = []
for filename in [f for f in os.listdir('10k') if f.lower().endswith(('.htm','.html','.txt'))]: 
    print(filename)
    filepath = os.path.join('10k', filename)

    filing = Filing(filepath, filing_type='10-K')
    parsed = filing.parse_filing()
    try:
        item1c = [item['text'] for item in parsed['content'][0]['items'] if item['title'].lower() == 'item1c'][0]
        item1c_list.append(item1c)
    except (KeyError, IndexError, TypeError):
        # Some filings are missing Item 1C or the parser failed to extract it
        item1c_list.append('')

    # OPTIONAL: Save parsed data to a json file
    # with open(f'10k_parsed/{filename}.json', 'w') as f:
    #     json.dump(parsed, f)
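The filename filter in the loop and the commented-out JSON export can be pulled into two small helpers. This is a minimal sketch using only the standard library; the helper names `list_filing_files` and `save_parsed_json` are hypothetical and not part of datamule, and the `<filename>.json` naming simply mirrors the commented-out lines above.

```python
import json
from pathlib import Path

# Extensions accepted by the loop above (compared case-insensitively)
FILING_EXTENSIONS = ('.htm', '.html', '.txt')


def list_filing_files(directory):
    """Return sorted names of regular files in `directory` that have a filing extension."""
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        if p.is_file() and p.name.lower().endswith(FILING_EXTENSIONS)
    )


def save_parsed_json(parsed, out_dir, filename):
    """Write the parsed dict to <out_dir>/<filename>.json and return the output path."""
    out_path = Path(out_dir) / f'{filename}.json'
    out_path.parent.mkdir(parents=True, exist_ok=True)  # create the output folder if needed
    with open(out_path, 'w', encoding='utf-8') as f:
        json.dump(parsed, f)
    return out_path
```

Sorting the names makes runs reproducible, since `os.listdir` and `Path.iterdir` return entries in arbitrary order.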

john-friedman commented 1 day ago

Heads up, I will probably change the structure of the parsed 10-K in the future, as it's unintuitive.

john-friedman commented 11 hours ago

Updated the parsed structure to make it more intuitive. Item 1C is now at:

parsed['document']['part1']['item1c']

Result:

Item 1C. Cybersecurity

Risk Management and Strategy

We have established cybersecurity risk assessment procedures to ensure effectiveness in cybersecurity management, strategy and governance and reporting cybersecurity risks. The process is in alignment with our strategic objectives and risk appetite....
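Some filings are missing Item 1C or the parser fails to extract it, so indexing straight into `parsed['document']['part1']['item1c']` can raise a `KeyError`. Below is a minimal sketch of a defensive lookup, assuming the parsed result is a nested dict with the key path shown above. `get_item1c` is a hypothetical helper. Returning an empty string for a missing item mirrors the `item1c_list.append('')` fallback in the earlier loop.

```python
def get_item1c(parsed, default=''):
    """Return parsed['document']['part1']['item1c'], or `default` if any level is missing."""
    node = parsed
    for key in ('document', 'part1', 'item1c'):
        # Stop early if this level is not a dict or does not contain the next key
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node
```

Checking each level explicitly avoids a blanket `try/except` that could also hide unrelated bugs.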

msharifbd commented 6 hours ago

Hello, thanks. I got the following error from the code below:

# Collect the text of Item 1C from each 10-K filing
item1c_list = []
for filename in [f for f in os.listdir('10k') if f.lower().endswith(('.htm','.html','.txt'))]: 
    print(filename)
    filepath = os.path.join('10k', filename)

    filing = Filing(filepath, filing_type='10-K')
    parsed = filing.parse_filing()
    try:
        item1c = [item['text'] for item in parsed['content'][0]['items'] if item['title'].lower() == 'item1c'][0]
        item1c_list.append(item1c)
    except:
        # Some filings are missing Item 1C or the parser failed to extract it
        item1c_list.append('')

The error:

000000695123000041_amat-20231029.htm
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[21], line 8
      5 filepath = os.path.join('10k', filename)
      7 filing = Filing(filepath, filing_type='10-K')
----> 8 parsed = filing.parse_filing()
      9 try:
     10     item1c = [item['text'] for item in parsed['content'][0]['items'] if item['title'].lower() == 'item1c'][0]

File c:\Users\mshar\AppData\Local\Programs\Python\Python312\Lib\site-packages\datamule\sec_filing.py:14, in Filing.parse_filing(self)
     13 def parse_filing(self):
---> 14     self.data = self.parser.parse_filing(self.filename, self.filing_type)
     15     return self.data

File c:\Users\mshar\AppData\Local\Programs\Python\Python312\Lib\site-packages\datamule\parser\sec_parser.py:18, in Parser.parse_filing(self, filename, filing_type)
     16     return parse_8k(filename)
     17 else:
---> 18     data = parse_textual_filing(url=filename, return_type='json')
     19 return data

File c:\Users\mshar\AppData\Local\Programs\Python\Python312\Lib\site-packages\datamule\datamule_api.py:10, in parse_textual_filing(url, return_type)
      8 response = requests.get(base_url,params=params)
      9 if response.status_code != 200:
---> 10     raise ValueError('Server error')
     12 if return_type == 'simplify':
     13     # return as html
     14     return response.text

ValueError: Server error

I think the issue is with the parse_filing() function.
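The traceback ends in `datamule_api.py`, where `parse_textual_filing` raises `ValueError('Server error')` whenever the HTTP status code is not 200. Because the loop calls `parse_filing()` outside its `try` block, one failed request stops the whole run. Below is a minimal sketch of retrying a call that raises `ValueError`, assuming the failure may be transient. `call_with_retries` is a hypothetical helper, and the retry count and backoff values are arbitrary. It would be used as `parsed = call_with_retries(filing.parse_filing)`.

```python
import time


def call_with_retries(fn, retries=3, delay=1.0, backoff=2.0, exceptions=(ValueError,)):
    """Call fn(); if it raises one of `exceptions`, wait and retry up to `retries` more times.

    The delay is multiplied by `backoff` after each failure. If every attempt
    fails, the last exception is re-raised.
    """
    attempt = 0
    while True:
        try:
            return fn()
        except exceptions:
            if attempt >= retries:
                raise  # out of retries: surface the original error
            time.sleep(delay)
            delay *= backoff
            attempt += 1
```

If the error is not transient, retrying only adds delay. In that case the exception still propagates after the last attempt and can be recorded per file.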

john-friedman commented 4 hours ago

Hi Sharif, did you see my previous comment? You need to replace the old parsed['content'][0]['items'] lookup with

parsed['document']['part1']['item1c']

I changed the syntax to make it more intuitive; sorry for the confusion. Next time I'll edit the original comment directly.
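The updated key path could be used in the original loop as shown below. This is a minimal sketch that assumes `Filing(filepath, filing_type='10-K').parse_filing()` returns a dict with Item 1C at `parsed['document']['part1']['item1c']`, as described in this thread. The parsing step is passed in as a function so the loop can be exercised without network access. `extract_item1c_from_dir` and `parse_with_datamule` are hypothetical names. Failures are recorded per file instead of being hidden by a bare `except`.

```python
import os


def parse_with_datamule(filepath):
    """Parse one 10-K with datamule (needs the package installed and network access)."""
    from datamule import Filing  # imported lazily so the loop can be tested without datamule
    return Filing(filepath, filing_type='10-K').parse_filing()


def extract_item1c_from_dir(directory, parse_fn=parse_with_datamule):
    """Return ({filename: item1c}, {filename: error message}) for the filings in `directory`."""
    results, errors = {}, {}
    for filename in sorted(os.listdir(directory)):
        if not filename.lower().endswith(('.htm', '.html', '.txt')):
            continue
        filepath = os.path.join(directory, filename)
        try:
            parsed = parse_fn(filepath)
        except ValueError as exc:  # e.g. the 'Server error' raised in the traceback above
            errors[filename] = str(exc)
            continue
        try:
            results[filename] = parsed['document']['part1']['item1c']
        except (KeyError, TypeError):
            # The filing has no Item 1C, or the parser did not extract it
            results[filename] = ''
    return results, errors
```

Keeping the errors in their own dict makes it easy to rerun just the failed filings later.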