Closed Jack-0-0 closed 5 months ago
Good stuff! A few comments, some on style, some on kedro functionalities.
Thank you for the review. I've made all the requested changes now 🙂
Thanks for your help in fixing the issue I had with S3 credentials. I've added a related note to the main README of the repo.
@ampudia19 I have resolved conflicts between this PR and main (after #16 was merged into it).
I've updated the data catalog to use the format `f"{source}_{stage}_{description}"`.
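As a rough illustration of that naming format, the sketch below builds a catalog entry name from its three parts. The helper `catalog_name` is hypothetical and not part of this PR, and the split of `oa_raw_works_for_concepts_and_years` into source, stage and description is an assumption inferred from the format string.

```python
def catalog_name(source: str, stage: str, description: str) -> str:
    """Build a data catalog entry name (hypothetical helper, not in this PR)."""
    return f"{source}_{stage}_{description}"


# Assumed split of the dataset name that this PR adds to the catalog.
name = catalog_name("oa", "raw", "works_for_concepts_and_years")
print(name)
```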
Closes #13.
This pull request adds:
- `data_collection_oa`, with functionality to retrieve OpenAlex works for given concept IDs and years from the OpenAlex API and save the data to S3. The `data_collection_oa` dir contains `nodes.py` and `pipeline.py` files.
- `oa_raw_works_for_concepts_and_years` to the kedro data catalog (`conf/base/catalog.yml`).
- `test_concept_ids` and `test_publication_years` parameters to the `conf/base/parameters_data_collection_oa.yml` file, which are used in the pipeline. We can update this file with the real values when we have agreed on the full list of concepts and years that we need.
- A `data_collection_oa` README explaining how to use this Kedro pipeline.
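For reviewers unfamiliar with the OpenAlex API, here is a minimal sketch of how one request per concept and year could be parameterised. It is not the code in `nodes.py`: the function name `build_works_queries`, the one-request-per-pair design, and the example concept ID are all assumptions made for illustration. The `filter`, `per-page` and `cursor` query parameters follow the public OpenAlex works endpoint as I understand it.

```python
from itertools import product
from typing import Dict, Iterator, List

OPENALEX_WORKS_URL = "https://api.openalex.org/works"


def build_works_queries(
    concept_ids: List[str], years: List[int], per_page: int = 200
) -> Iterator[Dict[str, str]]:
    """Yield query parameters for each (concept, year) pair (hypothetical helper)."""
    for concept_id, year in product(concept_ids, years):
        yield {
            # A comma between filters means AND in OpenAlex filter syntax.
            "filter": f"concepts.id:{concept_id},publication_year:{year}",
            "per-page": str(per_page),
            # "*" starts cursor paging from the first page of results.
            "cursor": "*",
        }


# Placeholder inputs standing in for test_concept_ids / test_publication_years.
queries = list(build_works_queries(["C41008148"], [2021, 2022]))
```

Each dictionary could then be passed as `params` to an HTTP GET against `OPENALEX_WORKS_URL`, repeating the call with the cursor returned in the response until no further cursor is given.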