nestauk / dap_aria_mapping

Mapping technology innovation to support The Advanced Research and Innovation Agency (ARIA)
MIT License

39 collect patent citations #48

Open india-kerle opened 1 year ago

india-kerle commented 1 year ago

Description

This PR adds a patent citations collection pipeline. The output (a .json file where each key is a patent ID and each value is a list of patent IDs) should be in the same format as OpenAlex, so that it can be used to calculate the consolidation-disruption (CD) index.
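As a rough illustration of that output shape, here is a minimal sketch. The patent IDs, the `invert_citations` helper, and the assumption that each value lists the patents a key cites are all hypothetical and are not taken from the pipeline itself.

```python
import json
from collections import defaultdict

# Hypothetical backward citations: patent ID -> IDs of the patents it cites.
backward_citations = {
    "US-1111111-A": ["US-0000001-A", "US-0000002-A"],
    "US-2222222-A": ["US-0000001-A"],
}


def invert_citations(citations):
    """Turn a 'cites' mapping into a 'cited by' mapping (illustrative helper)."""
    inverted = defaultdict(list)
    for patent_id, cited_ids in citations.items():
        for cited_id in cited_ids:
            inverted[cited_id].append(patent_id)
    return dict(inverted)


# Forward citations keep the same shape: patent ID -> list of patent IDs.
forward_citations = invert_citations(backward_citations)

# Both mappings serialise to the flat .json structure described above.
serialised = json.dumps(forward_citations, indent=2)
```

Keeping both directions in the same key-to-list shape means the same downstream code can read either file.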

I've also removed the temporary AI genomics patents getters that are no longer used.

Fixes # (issue)

This PR closes #39

Instructions for Reviewer

The main script to run that collects forward and backward citations is:

python dap_aria_mapping/pipeline/data_collection/patents_citations.py run --production=False

To test the flow:

pytest dap_aria_mapping/pipeline/data_collection/tests/test_patents_citations.py

I've ultimately decided to threshold based on years (2007 and 2017) rather than citation types, because I think thresholding on types would really require talking to a lawyer or patents expert. I searched for fleshed-out definitions (a bit more context here: https://docs.google.com/spreadsheets/d/1LtfjECVI5pqqwE7oMw1JbwFcUWhUoHgJH_mJ0flw9Fw/edit#gid=1193216467) and also reached out to Edward, but he doesn't know much about citation types.
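For reference, one reading of the year-based threshold is a simple filter on focal years, sketched below. The record fields (`publication_number`, `publication_year`) and the `filter_by_year` helper are assumptions for illustration and may not match the names used in the flow.

```python
# Hypothetical patent records; the field names are assumptions, not the flow's schema.
patents = [
    {"publication_number": "US-1111111-A", "publication_year": 2007},
    {"publication_number": "US-2222222-A", "publication_year": 2012},
    {"publication_number": "US-3333333-A", "publication_year": 2017},
]

# Focal years mentioned in this PR.
FOCAL_YEARS = {2007, 2017}


def filter_by_year(records, years):
    """Keep only the records whose publication year is one of the focal years."""
    return [record for record in records if record["publication_year"] in years]


focal_patents = filter_by_year(patents, FOCAL_YEARS)
focal_ids = [record["publication_number"] for record in focal_patents]
```

Filtering on year needs no judgement about what each citation type means, which is the main reason for preferring it here.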

Please pay special attention to ...

Checklist:

india-kerle commented 1 year ago

Cool, I've addressed your comments @georgerichardson and run the flow in production for the years 2007 and 2017. Let me know if you think I should make any final changes.

india-kerle commented 1 year ago

Do let me know if it's OK to merge, @georgerichardson!