The plan is eventually to run this Python code in this Lambda function. Write a Python script that performs the following:
Download the sample CORE data dump file from here. Eventually this will need to use the full data dump, but it is probably easiest to start with the sample, since the full dump is ~400 GB compressed and 1.8 TB once fully extracted. For more information about the format of these dump files, see the CORE documentation.
Extract the contents of the downloaded file.
Process all the individual .json files, each of which describes a single CORE article, and assign each one a randomly selected label from the Biomimicry Functions list in GDrive. You can just store the functions list in a Python data structure; you don't have to write code to extract it from the Google Sheet. This code will go away once the labeller is working.
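As a starting point, the extract and label steps could be sketched roughly as below. This is a minimal sketch under several assumptions, not a finished implementation: it assumes the sample dump has already been downloaded to a local path and is a tar archive that Python's tarfile module can open, it does not handle any archives nested inside the dump, and the entries in BIOMIMICRY_FUNCTIONS are placeholders standing in for the real list from GDrive. The function names extract_archive and label_articles are made up for illustration.

```python
import json
import random
import tarfile
from pathlib import Path

# Placeholder labels (assumption): replace with the real Biomimicry Functions list.
BIOMIMICRY_FUNCTIONS = [
    "placeholder_function_1",
    "placeholder_function_2",
    "placeholder_function_3",
]


def extract_archive(archive_path, dest_dir):
    """Extract a tar archive (compression auto-detected) into dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:*") as tar:
        # Use the safer "data" extraction filter where this Python version has it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return dest


def label_articles(root_dir, labels=BIOMIMICRY_FUNCTIONS, seed=None):
    """Return {json file path: randomly chosen label} for every .json file under root_dir."""
    rng = random.Random(seed)  # pass a seed for reproducible labels
    results = {}
    for json_path in sorted(Path(root_dir).rglob("*.json")):
        try:
            with json_path.open(encoding="utf-8") as fh:
                json.load(fh)  # parse only to confirm the file is valid JSON
        except (json.JSONDecodeError, UnicodeDecodeError):
            continue  # skip files that cannot be parsed
        results[str(json_path)] = rng.choice(labels)
    return results
```

A typical call would be label_articles(extract_archive(path_to_dump, output_dir)), where both paths are whatever locations are chosen locally. Keeping the random choice in its own function should make it easy to swap in the real labeller later.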