Open india-kerle opened 1 year ago
Cool, I've addressed your comments @georgerichardson and run the flow in production for the years 2007 and 2017. Let me know if you think I should make any final changes.
do let me know if it's ok to merge @georgerichardson !
Description
This PR adds a patent citations collection pipeline. The output (i.e. a .json file where each key is a patent id and each value is a list of patent ids) should be in the same format as the OpenAlex output, so that the consolidation-disruption (CD) index can be calculated.
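To make the expected output shape concrete, here is a minimal sketch of a check a reviewer could run on a loaded file. The patent ids and the `is_valid_citation_map` helper are hypothetical and are not part of this PR.

```python
import json

# Hypothetical patent ids, for illustration only.
citations = {
    "US-0000001-A": ["US-0000002-A", "US-0000003-A"],
    "US-0000002-A": [],
}


def is_valid_citation_map(obj) -> bool:
    """Return True if obj maps string ids to lists of string ids."""
    return isinstance(obj, dict) and all(
        isinstance(key, str)
        and isinstance(value, list)
        and all(isinstance(item, str) for item in value)
        for key, value in obj.items()
    )


# The structure survives a JSON round trip unchanged.
round_tripped = json.loads(json.dumps(citations))
print(is_valid_citation_map(round_tripped))
```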
I've also gotten rid of the temporary AI genomics patents getters that aren't being used anymore.
Fixes # (issue)
This PR closes #39
Instructions for Reviewer
The main script to run that collects forward and backward citations is:
python dap_aria_mapping/pipeline/data_collection/patents_citations.py run --production=False
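For reviewers less familiar with the terminology: a patent's backward citations are the patents it cites, and its forward citations are the patents that cite it, so within a closed set one map is the inverse of the other. The sketch below illustrates that relationship with a hypothetical `invert_citations` helper and made-up ids. It is not how the flow in this PR collects the data, which may fetch both directions directly from the source.

```python
from collections import defaultdict


def invert_citations(backward: dict[str, list[str]]) -> dict[str, list[str]]:
    """Build a forward-citation map (cited id -> citing ids) from a
    backward-citation map (citing id -> cited ids)."""
    forward = defaultdict(list)
    for citing_id, cited_ids in backward.items():
        for cited_id in cited_ids:
            forward[cited_id].append(citing_id)
    return dict(forward)


# Hypothetical ids: P3 cites P1 and P2, and P2 cites P1.
backward = {"P3": ["P1", "P2"], "P2": ["P1"]}
forward = invert_citations(backward)
print(forward)
```

Patents that are never cited do not appear as keys in the inverted map, so a consumer would need to treat a missing key as an empty list.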
To test the flow:
pytest dap_aria_mapping/pipeline/data_collection/tests/test_patents_citations.py
I've ultimately decided to threshold based on years (2007 and 2017) rather than citation types, because I think thresholding on types would really require talking to a lawyer or patents expert. I googled around for fleshed-out definitions (a bit more context here - https://docs.google.com/spreadsheets/d/1LtfjECVI5pqqwE7oMw1JbwFcUWhUoHgJH_mJ0flw9Fw/edit#gid=1193216467) and also reached out to Edward, but he doesn't know too much about citation types.
Please pay special attention to ...
Checklist:
notebooks/
pre-commit and addressed any issues not automatically fixed
dev
READMEs