Introduces a pipeline for retrieving NSF award data and a ZipDataset class for handling ZIP archives. The pipeline fetches award data through the NSF API, while ZipDataset opens ZIP files in S3 and lists every filename they contain. These filenames correspond to award IDs and are used to call the API.
Key Points
NSF Data Collection: Fetches and processes NSF award information.
ZIP Dataset Handling: Reads file names directly from ZIP archives, supporting both local and cloud storage. The custom dataset can easily be modified to read file contents rather than just the names.
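As a rough illustration of the idea behind the key points above, the sketch below lists the file names in a ZIP archive without extracting it and turns each name into an award ID. It is not the ZipDataset implementation from this change: `list_award_ids`, `award_url`, and `make_demo_zip` are hypothetical helpers, the assumption that each file is named `<award_id>.xml` is ours, and the URL layout is an assumed form of the NSF awards endpoint that should be checked against the NSF API documentation.

```python
import io
import zipfile
from pathlib import PurePosixPath


def list_award_ids(zip_source):
    """Return the file stems in a ZIP archive without extracting any member.

    zip_source may be a local path or any seekable binary file object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        return [
            PurePosixPath(name).stem  # "2024/2401234.xml" -> "2401234"
            for name in archive.namelist()
            if not name.endswith("/")  # skip directory entries
        ]


def award_url(award_id, base="https://api.nsf.gov/services/v1/awards"):
    # Assumed endpoint layout, used here only to show how IDs feed the API call.
    return f"{base}/{award_id}.json"


def make_demo_zip():
    """Build a small in-memory archive so the sketch runs without S3 access."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("2024/", "")  # directory entry
        archive.writestr("2024/2401234.xml", "<award/>")
        archive.writestr("2024/2405678.xml", "<award/>")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for award_id in list_award_ids(make_demo_zip()):
        print(award_url(award_id))
```

Because `zipfile.ZipFile` accepts any seekable binary file object, the same function could in principle be handed a file handle opened on an S3 object (for example through a library such as fsspec), which is one way local and cloud storage can share a single code path.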
NSF data fetching and ZIP dataset integration
Description
Introduces a pipeline for retrieving NSF award data and a ZipDataset class for handling ZIP archives. The pipeline fetches award data through the NSF API, while ZipDataset opens ZIP files in S3 and lists every filename they contain. These filenames correspond to award IDs and are used to call the API.
Key Points