beingkk commented 1 year ago

Closes #64

The main addition is the pipeline for calculating a topic-level novelty score, characterising the "uncommonness" of the research related to a given topic (in a given year).

The usage is as follows:

python dap_aria_mapping/pipeline/novelty/openalex_topic_novelty.py

This will output five tables (one per taxonomy level), with two alternative novelty scores per topic, per year.

At the moment, this is based only on the OpenAlex data.

In a forthcoming issue/PR, I will apply the same analysis to patent data, to generate novelty scores from patents as well.

Checklist:

[x] I have refactored my code out from notebooks/
[x] I have checked the code runs
[x] I have tested the code
[x] I have run pre-commit and addressed any issues not automatically fixed
[x] I have merged any new changes from dev
[x] I have documented the code
- [x] Major functions have docstrings
- [ ] Appropriate information has been added to READMEs
[x] I have explained this PR above
[x] I have requested a code review

beingkk commented 1 year ago

Thanks a lot @ampudia19, will implement your suggestions!

beingkk commented 1 year ago

Thanks again @ampudia19, I fixed the issues highlighted above, namely:

[x] Indicated “optional” for all optional variables in getters.novelty.py and added defaults to the docstrings
[x] Improved docstrings throughout novelty_utils.py, calculate_openalex_novelty.py and openalex_topic_novelty.py
[x] Fixed the duplicated “work_id” issue in the OpenAlex data further upstream, so that no duplicated papers are saved when calculating novelty.
[x] Added an upload_to_s3 variable to the pipelines (defaults to True).
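
As a rough illustration of the last two items (a minimal sketch, not the actual code in this PR): the helper names `drop_duplicate_works` and `run_pipeline`, the record layout, and the returned dictionary are all hypothetical. Only the `work_id` field and the `upload_to_s3` variable with its default of True come from the changes described above.

```python
from typing import Dict, Iterable, List


def drop_duplicate_works(records: Iterable[Dict]) -> List[Dict]:
    """Keep the first record seen for each work_id (hypothetical helper)."""
    seen = set()
    unique = []
    for record in records:
        if record["work_id"] not in seen:
            seen.add(record["work_id"])
            unique.append(record)
    return unique


def run_pipeline(records: Iterable[Dict], upload_to_s3: bool = True) -> Dict:
    """Deduplicate upstream, then report whether outputs would be uploaded.

    The novelty calculation itself is deliberately omitted from this sketch.
    """
    works = drop_duplicate_works(records)
    # Assumption: the real pipeline would write its tables to S3 here
    # when upload_to_s3 is True; this sketch only records the choice.
    return {"n_works": len(works), "would_upload": upload_to_s3}


if __name__ == "__main__":
    sample = [
        {"work_id": "W1", "year": 2020},
        {"work_id": "W1", "year": 2020},  # duplicated paper
        {"work_id": "W2", "year": 2021},
    ]
    print(run_pipeline(sample, upload_to_s3=False))
```

Deduplicating before any scores are computed means each paper is counted once, and switching `upload_to_s3` off makes a local test run possible without writing to the bucket.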

beingkk commented 1 year ago

Hope this is OK to merge, @ampudia19?

Re topic names: I'm happy to add an adjustment to use the ChatGPT topic names via a separate issue, #71 (perhaps once you've merged the corresponding PR). Hope that's alright?

ampudia19 commented 1 year ago

All is looking good, Karlis :)

nestauk / dap_aria_mapping

[64] Pipeline script for topic-level novelty #66
