ryansmccoy / py-sec-edgar

Python application used to download, parse, and extract structured/unstructured data from filings in the SEC Edgar Database (including 10-K, 10-Q, 13-D, S-1, 8-K, etc.)

Some questions about the program #13

Open robinzixuan opened 2 years ago

robinzixuan commented 2 years ago
  1. Can it handle 13F data?
  2. If I only have the CIK number, can I search by CIK rather than by ticker?
ryansmccoy commented 2 years ago

Good questions! lol I need to look.

1) It should be able to get the 13F forms, and I think I was able to get the data out at some point, but I'd have to look.

2) I need to check filtering by CIK...

robinzixuan commented 2 years ago

Thanks. I think that to get the 13 forms we would need to use the CIK, so it depends heavily on CIK filtering.

ryansmccoy commented 2 years ago

Gotcha, that makes sense because a lot of the funds don't have tickers...

From what I see there isn't a way to filter by CIK, but it wouldn't be that tough to add... Do you want to give it a shot?

You could follow the same pattern I used for tickers and make one for CIK...

https://github.com/ryansmccoy/py-sec-edgar/blob/127166b8a27dbd80f52fb8b73a19a9aa942bbb62/py_sec_edgar/__main__.py#L35

You could add something like:


@click.command()
@click.option('--ticker-list', default=CONFIG.TICKER_LIST_FILEPATH)
@click.option('--cik-ticker-list', default=CONFIG.CIK_LIST_FILEPATH)
@click.option('--form-list', default=True)
def main(ticker_list, form_list, cik_ticker_list):
      ...

      # cik_ticker_list is the path to a CSV file with one CIK per row
      if cik_ticker_list:
              # read the CIKs to keep from the first column of that file
              cik_filter = pd.read_csv(cik_ticker_list, header=None).iloc[:, 0].tolist()
              # keep only the rows whose CIK is in the filter list
              df_cik_tickers = df_cik_tickers[df_cik_tickers['CIK'].isin(cik_filter)]
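
The filtering step above can be tried on its own, outside the CLI. This is only a rough sketch, not code from the project: `filter_by_cik` is a hypothetical helper, the table and the CIK file are made-up stand-ins, and normalizing CIKs to strings without leading zeros is an assumption added here so that `0000861439` and `861439` match.

```python
import io

import pandas as pd


def filter_by_cik(df_cik_tickers, cik_csv):
    """Keep only the rows whose CIK appears in the first column of cik_csv."""
    # Hypothetical helper. dtype=str keeps the CIKs as text so that
    # zero-padded values such as "0000861439" survive the read.
    wanted = pd.read_csv(cik_csv, header=None, dtype=str).iloc[:, 0].str.strip()
    # Drop leading zeros on both sides so padded and unpadded CIKs compare equal
    wanted = set(wanted.str.lstrip("0"))
    ciks = df_cik_tickers["CIK"].astype(str).str.lstrip("0")
    return df_cik_tickers[ciks.isin(wanted)]


# Made-up data standing in for the project's CIK/ticker table
df = pd.DataFrame({"CIK": [861439, 320193, 789019], "TICKER": [None, "AAPL", "MSFT"]})
# Made-up CIK list file: one CIK per row, no header
cik_file = io.StringIO("0000861439\n789019\n")

filtered = filter_by_cik(df, cik_file)
print(filtered["CIK"].tolist())
```

Filtering on the CIK column means rows without a ticker, like the first one here, can still be selected.
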
robinzixuan commented 2 years ago

Thanks, I fixed it. One more problem: I found that the SEC form structure changed after 2011. Does that mean forms from before 2012 cannot be extracted?

ryansmccoy commented 2 years ago

If you submit a pull request, I'll add the code to the project and you can be a contributor (if you want).

Regarding the 2011 version, can you share an example so I can see what you mean?

robinzixuan commented 2 years ago

`2022-04-06 04:48:00,691 INFO py_sec_edgar.extract: extracting documents to /sec_gov/Archives/edgar/data/861439/000091205794003991
/root/py-sec-edgar/py_sec_edgar/parse/header.py:50: FutureWarning: The pandas.np module is deprecated and will be removed from pandas in a future version. Import numpy directly instead.
  header_dict = header_dict.replace('', pd.np.nan)
Traceback (most recent call last):
  File "/root/anaconda3/envs/py-sec-edgar/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3621, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 136, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 163, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 1

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/anaconda3/envs/py-sec-edgar/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/anaconda3/envs/py-sec-edgar/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/root/py-sec-edgar/py_sec_edgar/__main__.py", line 87, in <module>
    main()
  File "/root/anaconda3/envs/py-sec-edgar/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/root/anaconda3/envs/py-sec-edgar/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/root/anaconda3/envs/py-sec-edgar/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/root/anaconda3/envs/py-sec-edgar/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/root/py-sec-edgar/py_sec_edgar/__main__.py", line 81, in main
    filing_broker.process(sec_filing)
  File "/root/py-sec-edgar/py_sec_edgar/process.py", line 51, in process
    filing_content = self.extract(filing_filepaths)
  File "/root/py-sec-edgar/py_sec_edgar/extract.py", line 28, in extract
    filing_contents = extract_complete_submission_filing(filing_json['filing_filepath'], output_directory=filing_json['extracted_filing_directory'])
  File "/root/py-sec-edgar/py_sec_edgar/extract.py", line 74, in extract_complete_submission_filing
    filing_header = header_parser(raw_text)
  File "/root/py-sec-edgar/py_sec_edgar/parse/header.py", line 52, in header_parser
    header_dict[1] = header_dict[1].ffill().bfill().tolist()
  File "/root/anaconda3/envs/py-sec-edgar/lib/python3.8/site-packages/pandas/core/frame.py", line 3505, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/root/anaconda3/envs/py-sec-edgar/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3623, in get_loc
    raise KeyError(key) from err
KeyError: 1`

This happens when I run it with CIK 861439, which is AMERICAN MEDICAL HOLDINGS INC.
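
For what it's worth, `KeyError: 1` from `header_dict[1]` means the DataFrame has no column labeled `1` at that point. Why the older filing produces such a frame is not clear from the traceback alone, so the following is only a guess at a defensive guard, not the project's fix: `fill_value_column` is a hypothetical helper and the two small frames are made up. It also uses `numpy.nan` directly, as the FutureWarning suggests.

```python
import numpy as np
import pandas as pd


def fill_value_column(header_df):
    """Forward/back-fill column 1 if it exists; otherwise leave the frame as is."""
    # Hypothetical helper. numpy.nan replaces the deprecated pd.np.nan alias.
    header_df = header_df.replace("", np.nan)
    # Guard the lookup: only touch column 1 when the frame actually has it
    if 1 in header_df.columns:
        header_df[1] = header_df[1].ffill().bfill()
    return header_df


# Made-up two-column frame: column 1 exists, so the empty cell is filled
ok = fill_value_column(pd.DataFrame([["a", "x"], ["b", ""]]))

# Made-up one-column frame: indexing column 1 raises KeyError, as in the log
one_col = pd.DataFrame([["a"], ["b"]])
try:
    one_col[1]
    raised = False
except KeyError:
    raised = True

# With the guard, the same frame passes through unchanged instead of crashing
safe = fill_value_column(one_col)
```

A guard like this only avoids the crash; whether the header of such a filing can still be parsed properly would need a look at the raw text.
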