nestauk / dap_aria_mapping

Mapping technology innovation to support The Advanced Research and Innovation Agency (ARIA)
MIT License

[5] Reformat Entities pipeline #26

Closed Jack-Vines closed 1 year ago

Jack-Vines commented 1 year ago

Description

Adds the pipeline that strips the unnecessary variables from the annotated patents and publications. Includes tests for the added functions. I have purposefully not included getters, as I'm not sure these files are useful for analysis (but the post-processed files will be).
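For reviewers who want a picture of what "stripping unnecessary variables" means, here is a minimal sketch. It is not the code in reformat_entities.py. The field names (entity, confidence, offset, types), the input shape (document ID mapped to a list of annotation dicts), and the function names strip_entity and reformat_entities are all assumptions made for illustration.

```python
from typing import Any, Dict, List, Sequence

# Hypothetical whitelist: the PR description does not say which fields are kept.
KEEP_FIELDS = ("entity", "confidence")


def strip_entity(
    entity: Dict[str, Any], keep: Sequence[str] = KEEP_FIELDS
) -> Dict[str, Any]:
    """Return a copy of one annotation containing only the fields in `keep`."""
    return {key: entity[key] for key in keep if key in entity}


def reformat_entities(
    annotated: Dict[str, List[Dict[str, Any]]],
    keep: Sequence[str] = KEEP_FIELDS,
) -> Dict[str, List[Dict[str, Any]]]:
    """Apply strip_entity to every annotation of every document."""
    return {
        doc_id: [strip_entity(entity, keep) for entity in entities]
        for doc_id, entities in annotated.items()
    }


# Made-up sample record, only to show the shape of the transformation.
sample = {
    "doc_1": [
        {"entity": "Machine learning", "confidence": 0.9, "offset": 12, "types": "..."}
    ]
}
print(reformat_entities(sample))
# {'doc_1': [{'entity': 'Machine learning', 'confidence': 0.9}]}
```

Using a whitelist rather than deleting named fields means that any new field added upstream is dropped by default, which keeps the output files small.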

Instructions for Reviewer

Run python dap_aria_mapping/pipeline/entity_processing/reformat_entities.py --datastore=s3 run to run the flow in test mode.

Run pytest on the test file.

Checklist:

THE ONLY CHANGED FILES ARE dap_aria_mapping/pipeline/entity_processing/reformat_entities.py AND dap_aria_mapping/pipeline/entity_processing/tests/test_reformat_entities.py