miha42-github / company_dns

An open source microservice that provides company data from EDGAR and Wikipedia, plus SIC lookup.
https://miha42-github.github.io/company_dns/
Apache License 2.0

Change the dictionary structure in edgar.get_all_details() #33

Open miha42-github opened 1 year ago

miha42-github commented 1 year ago

Introduction

In an effort to make filings from companies with the same CIK report back under a single entry, the company name was reformatted to uppercase with all punctuation removed. This is a weak implementation: a company could change its name slightly, which would force another change to the formatting code. A better approach is needed.

Proposed approach

For EDGAR, the durable identifier is the Central Index Key (CIK). It should be used as the key instead of the name, because the name can change even for public companies. The present code for temporarily tracking a company is:

            # If we've seen this company before then add the form, otherwise include both firmographics and the initial form definition
            if tmp_companies.get(company_name) == None:
                tmp_companies[company_name] = company_info
                tmp_companies[company_name]['forms'] = {accession_key: form}
            else:
                tmp_companies[company_name]['forms'][accession_key] = form

The proposed change could look something like this:

            # If we've seen this company before then add the form, otherwise include both firmographics and the initial form definition
            if tmp_companies.get(cik_no) is None:
                tmp_companies[cik_no] = company_info
                tmp_companies[cik_no]['forms'] = {accession_key: form}
            else:
                tmp_companies[cik_no]['forms'][accession_key] = form
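
To illustrate why keying on the CIK is more robust, here is a minimal self-contained sketch of the accumulation loop. It is not the code in edgar.get_all_details(): the function name group_by_cik, the row layout, and every field name other than companyName are assumptions made for the example, and the rows are made-up placeholder data.

```python
# Hypothetical filing rows. Only companyName comes from the issue text;
# 'cik', 'accession' and 'form' are assumed field names for illustration.
rows = [
    {'cik': '0000000001', 'companyName': 'Example Corp',
     'accession': 'A-1', 'form': {'formType': '10-K'}},
    {'cik': '0000000001', 'companyName': 'EXAMPLE CORP.',
     'accession': 'A-2', 'form': {'formType': '10-Q'}},
    {'cik': '0000000002', 'companyName': 'Other Inc',
     'accession': 'B-1', 'form': {'formType': '8-K'}},
]


def group_by_cik(rows):
    """Collect filings per company, keyed on CIK rather than on name."""
    tmp_companies = {}
    for row in rows:
        cik_no = row['cik']
        # First time we see this CIK: store the firmographics and an
        # empty forms dict. The first name encountered is the one kept.
        if tmp_companies.get(cik_no) is None:
            tmp_companies[cik_no] = {
                'cik': cik_no,
                'companyName': row['companyName'],
                'forms': {},
            }
        # In both cases, attach the form under its accession key.
        tmp_companies[cik_no]['forms'][row['accession']] = row['form']
    return tmp_companies


companies = group_by_cik(rows)
```

In this sketch the two spellings of the first company's name collapse into one entry because they share a CIK, so no name normalization is needed.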

Since company_info is a dict() that also keeps the companyName attribute, the name is still tracked there. Because modules that make use of these data require a dict() keyed on companyName, a function that rekeys on companyName is needed. This function would loop over all cik_no keys, replace each with the corresponding companyName, and return a new dict(). The exact details of this change are left to the time of implementation.
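
One possible shape for the rekeying function is sketched below. The name rekey_by_company_name is hypothetical, the sample data is made up, and raising an error when two CIKs share a name is an assumption of this sketch rather than a decided behavior.

```python
def rekey_by_company_name(companies_by_cik):
    """Return a new dict keyed on companyName instead of cik_no.

    The input dict is left unmodified; the company_info values are
    shared with the new dict, not copied.
    """
    companies_by_name = {}
    for cik_no, company_info in companies_by_cik.items():
        name = company_info['companyName']
        # Assumption: two different CIKs with an identical name is
        # treated as an error instead of silently overwriting an entry.
        if name in companies_by_name:
            raise ValueError(
                f'Duplicate companyName {name!r} for CIK {cik_no}'
            )
        companies_by_name[name] = company_info
    return companies_by_name


# Made-up example data keyed on CIK.
tmp_companies = {
    '0000000001': {'companyName': 'Example Corp',
                   'forms': {'A-1': {'formType': '10-K'}}},
    '0000000002': {'companyName': 'Other Inc',
                   'forms': {'B-1': {'formType': '8-K'}}},
}

by_name = rekey_by_company_name(tmp_companies)
```

Failing loudly on a duplicate name is only one option; merging the entries or appending the CIK to the name are alternatives to weigh at implementation time.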