Open msharifbd opened 2 days ago
Hi Sharif, good to hear from you.
I'll be adding a basic 10-K, 10-KSB, and 10-Q item parser to the package this week. It will be able to parse Item 1C from 2001 to the present (current benchmark: ~15 milliseconds per filing).
Ok. Thanks
The following code should work.
from datamule import Filing, Downloader
import os
import json
from pathlib import Path

# Download 10-K filings from 2023-12-14 to 2024-10-28 (today's date)
downloader = Downloader()
downloader.download(form='10-K', file_types='10-K', output_dir='10k', date=('2023-12-14', '2024-10-28'))
Path('10k_parsed').mkdir(exist_ok=True)

# Collect the text of Item 1C from each 10-K filing
item1c_list = []
for filename in [f for f in os.listdir('10k') if f.lower().endswith(('.htm', '.html', '.txt'))]:
    print(filename)
    filepath = os.path.join('10k', filename)
    filing = Filing(filepath, filing_type='10-K')
    parsed = filing.parse_filing()
    try:
        item1c = [item['text'] for item in parsed['content'][0]['items'] if item['title'].lower() == 'item1c'][0]
        item1c_list.append(item1c)
    except Exception:
        # Some filings are missing Item 1C or the parser failed to extract it
        item1c_list.append('')

    # OPTIONAL: Save parsed data to a json file
    # with open(f'10k_parsed/{filename}.json', 'w') as f:
    #     json.dump(parsed, f)
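The loop keeps only the extracted text, so the link between each filing and its Item 1C is lost once the list is built. Below is a minimal sketch of one way to keep them together, assuming you collect (filename, text) pairs yourself inside the loop. The helper name `save_item1c` and the output path are hypothetical and not part of datamule.

```python
import json
from pathlib import Path


def save_item1c(records, out_path):
    """Write {filename: item1c_text} to one JSON file, skipping empty extractions.

    `records` is any iterable of (filename, text) pairs (hypothetical shape).
    Returns the number of filings that had non-empty Item 1C text.
    """
    found = {name: text for name, text in records if text}
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with open(out_path, 'w', encoding='utf-8') as f:
        json.dump(found, f, ensure_ascii=False, indent=2)
    return len(found)
```

In the loop you would append `(filename, item1c)` instead of just `item1c`, then call `save_item1c(pairs, '10k_parsed/item1c.json')` once at the end.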
Heads up, I will probably change the structure of the parsed 10-K in the future, as it's unintuitive.
Updated to make the structure intuitive.
parsed['document']['part1']['item1c']
Result:
Item 1C. Cybersecurity
Risk Management and Strategy
We have established cybersecurity risk assessment procedures to ensure effectiveness in cybersecurity management, strategy and governance and reporting cybersecurity risks. The process is in alignment with our strategic objectives and risk appetite....
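Since some filings have no Item 1C, a direct `parsed['document']['part1']['item1c']` lookup can raise a `KeyError`. The sketch below walks the nested keys defensively. It assumes the parsed result is a plain nested dict and that the `item1c` value is a string, as the output above suggests. `get_item1c` is a hypothetical helper, not a datamule function.

```python
def get_item1c(parsed):
    """Return the Item 1C text from a parsed 10-K dict, or '' if it is absent."""
    node = parsed
    for key in ('document', 'part1', 'item1c'):
        # Stop as soon as a level is missing or is not a dict
        if not isinstance(node, dict) or key not in node:
            return ''
        node = node[key]
    # Assumption: the item text is stored as a plain string
    return node if isinstance(node, str) else ''


# Example with a hand-built dict shaped like the structure shown above
sample = {'document': {'part1': {'item1c': 'Item 1C. Cybersecurity ...'}}}
print(get_item1c(sample))
print(repr(get_item1c({'document': {}})))
```

Returning an empty string matches what the earlier loop does for filings where Item 1C is missing.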
Hello, thanks. I got the following error from the code below:
# Collect the text of Item 1C from each 10-K filing
item1c_list = []
for filename in [f for f in os.listdir('10k') if f.lower().endswith(('.htm','.html','.txt'))]:
    print(filename)
    filepath = os.path.join('10k', filename)
    filing = Filing(filepath, filing_type='10-K')
    parsed = filing.parse_filing()
    try:
        item1c = [item['text'] for item in parsed['content'][0]['items'] if item['title'].lower() == 'item1c'][0]
        item1c_list.append(item1c)
    except:
        # Some filings are missing Item 1C or the parser failed to extract it
        item1c_list.append('')
The error:
000000695123000041_amat-20231029.htm
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[21], line 8
      5     filepath = os.path.join('10k', filename)
      7     filing = Filing(filepath, filing_type='10-K')
----> 8     parsed = filing.parse_filing()
      9     try:
     10         item1c = [item['text'] for item in parsed['content'][0]['items'] if item['title'].lower() == 'item1c'][0]

File c:\Users\mshar\AppData\Local\Programs\Python\Python312\Lib\site-packages\datamule\sec_filing.py:14, in Filing.parse_filing(self)
     13 def parse_filing(self):
---> 14     self.data = self.parser.parse_filing(self.filename, self.filing_type)
     15     return self.data

File c:\Users\mshar\AppData\Local\Programs\Python\Python312\Lib\site-packages\datamule\parser\sec_parser.py:18, in Parser.parse_filing(self, filename, filing_type)
     16     return parse_8k(filename)
     17 else:
---> 18     data = parse_textual_filing(url=filename, return_type='json')
     19     return data

File c:\Users\mshar\AppData\Local\Programs\Python\Python312\Lib\site-packages\datamule\datamule_api.py:10, in parse_textual_filing(url, return_type)
      8 response = requests.get(base_url,params=params)
      9 if response.status_code != 200:
---> 10     raise ValueError('Server error')
     12 if return_type == 'simplify':
     13     # return as html
     14     return response.text

ValueError: Server error
I think the issue is with the parse_filing() function.
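The traceback shows `parse_filing()` ending in a `requests.get` call inside `datamule_api.py`, which raises `ValueError('Server error')` whenever the response status is not 200. Because the call sits outside the `try` block, one failed request stops the whole loop. The sketch below is one possible workaround, not a fix for the underlying server response. `parse_with_retry` is a hypothetical helper, and retrying only helps if the failure is transient.

```python
import time


def parse_with_retry(parse_fn, retries=3, delay=1.0):
    """Call parse_fn(); on ValueError wait and retry. Return None if every attempt fails."""
    for attempt in range(1, retries + 1):
        try:
            return parse_fn()
        except ValueError as err:
            print(f'attempt {attempt} failed: {err}')
            if attempt < retries:
                time.sleep(delay)
    return None


# Hypothetical usage inside the loop, replacing the direct call:
# parsed = parse_with_retry(filing.parse_filing)
# if parsed is None:
#     item1c_list.append('')
#     continue
```

With this in place a filing that keeps failing is recorded as an empty string and the loop moves on to the next file.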
Hi Sharif, did you see the previous comment? You need to swap out the old lookup on parsed with parsed['document']['part1']['item1c'].
I changed the syntax to make it more intuitive, sorry for the confusion. Next time I'll edit the original comment directly.
Hi John: I talked to you about the same issue while discussing sec-parsers. I am wondering how I can get item 1C of 10-K for all companies since December 14, 2023 using datamule. I tried different code from datamule, but was unsuccessful. Thanks