Updated Parsing Data with Form 13 data and graph data model
Shows the use of Text Bison for parsing management and filing info, then demonstrates how to load the result into Neo4j.
TODO:
Move data from the local directory to Cloud Storage, and rework the example code to pull from there
Why is the Python 3.8 kernel necessary? (See the top of the parsing notebook. Should we change the text directions here, or do we actually need to use that kernel?)
Data model alignment:
Filtering: Currently not filtering to common stocks and holdings over $10M
Aggregation: Currently not aggregating holdings per report quarter. The result will be many parallel relationships (hundreds of times more), particularly for larger institutions like BlackRock and Fidelity.
Field selection and Address entities
Why address? Can we really support that? Is it worth it?
Do we need the extra shares type and investment discretion fields?
Data typing: Currently not storing report quarters as dates.
CIK IDs for managers?: In more recent models I have used the CIK as the uniqueness constraint. For natural language queries, it may be useful to use a full-text index on manager name instead of a range index anyway.
NER with Other Filings: Switch to loading Form 13 data programmatically, and use unstructured data, such as 10-K filings, with an LLM for knowledge graph construction. This would show the power of a Neo4j KG with NER and demonstrate best-practice usage patterns. Will likely be easier and less complex once implemented.
Aligning Rest of Part 3 to parse-data notebook
Build on and clean up markdown
Work through story around Sampling?: i.e. are we loading all data at the end or just sticking with a sample?
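Sketch for the Filtering, Aggregation, and Data typing items under Data model alignment: a minimal, plain-Python example of what the pre-load step could look like. Everything named here is an assumption for illustration, not the notebook's actual code: the row keys (managerName, cusip, titleOfClass, value, shares, reportCalendarOrQuarter), the MM-DD-YYYY quarter format, values being in whole dollars, and the "class starts with COM" heuristic for common stock. The $10M threshold is applied after summing, which is one possible reading of the TODO.

```python
from collections import defaultdict
from datetime import date, datetime

# Threshold from the TODO; assumes `value` is reported in whole dollars.
MIN_VALUE_USD = 10_000_000


def parse_report_quarter(text: str) -> date:
    """Parse a quarter-end string such as '03-31-2023' (assumed MM-DD-YYYY)."""
    return datetime.strptime(text, "%m-%d-%Y").date()


def is_common_stock(title_of_class: str) -> bool:
    """Heuristic (assumption): classes starting with 'COM' are common stock."""
    return title_of_class.strip().upper().startswith("COM")


def aggregate_holdings(rows):
    """Sum value and shares per (manager, cusip, quarter), then keep
    only the aggregated holdings above the threshold."""
    totals = defaultdict(lambda: {"value": 0, "shares": 0})
    for row in rows:
        if not is_common_stock(row["titleOfClass"]):
            continue
        key = (
            row["managerName"],
            row["cusip"],
            parse_report_quarter(row["reportCalendarOrQuarter"]),
        )
        totals[key]["value"] += row["value"]
        totals[key]["shares"] += row["shares"]
    return [
        {"managerName": m, "cusip": c, "reportCalendarOrQuarter": q, **t}
        for (m, c, q), t in sorted(totals.items())
        if t["value"] > MIN_VALUE_USD
    ]


# Made-up sample rows (hypothetical manager and CUSIPs).
sample_rows = [
    {"managerName": "Example Manager", "cusip": "000000001", "titleOfClass": "COM",
     "value": 6_000_000, "shares": 100, "reportCalendarOrQuarter": "03-31-2023"},
    {"managerName": "Example Manager", "cusip": "000000001", "titleOfClass": "COM",
     "value": 7_000_000, "shares": 150, "reportCalendarOrQuarter": "03-31-2023"},
    {"managerName": "Example Manager", "cusip": "000000002", "titleOfClass": "NOTE",
     "value": 50_000_000, "shares": 10, "reportCalendarOrQuarter": "03-31-2023"},
    {"managerName": "Example Manager", "cusip": "000000003", "titleOfClass": "COM",
     "value": 1_000_000, "shares": 20, "reportCalendarOrQuarter": "03-31-2023"},
]

holdings = aggregate_holdings(sample_rows)
```

With one output row per (manager, cusip, quarter), the load step would create a single relationship per group instead of one per filing line, and the quarter is available as a date rather than a string.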
Work Completed
TODO: