This PR contains preliminary functionality for writing feed items as complete documents to S3, closing #5.
Done
[x] Created a new organization in AWS with various IAM identities for app development
[x] Created a bucket for baleen ingestion and set up all the credentials as local environment vars
[x] Added a store module that has structs for AWS credentials and for documents to be written to S3
[x] Handled irregular feeds, including those with malformed XML and those without a correct date (we use the date to determine filenames in S3)
[x] Implemented preliminary functionality inside the fetch module to retrieve the full text of articles from the XML URL provided by the feed (resolves #7)
[x] Added a LevelDB manifest (resolves #8) so we don't re-fetch documents we already have
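
For reviewers, here is a rough sketch of how the date-based filenames and the manifest check described above could fit together. It is not the code in this PR: `Item`, `Manifest`, `memManifest`, `objectKey`, and `shouldFetch` are hypothetical names, the `YYYY/MM/DD/<hash>.json` key layout is an assumption, and an in-memory map stands in for LevelDB so the example needs only the standard library.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"time"
)

// Item is a hypothetical, trimmed-down feed entry; the real structs in the
// store module may look different.
type Item struct {
	Link      string // URL of the full article
	Published string // raw date string from the feed; may be empty or malformed
}

// Manifest records which documents have already been stored.
// Assumption: the PR backs this with LevelDB; a map stands in for it here.
type Manifest interface {
	Has(key string) bool
	Put(key string)
}

type memManifest map[string]struct{}

func (m memManifest) Has(key string) bool { _, ok := m[key]; return ok }
func (m memManifest) Put(key string)      { m[key] = struct{}{} }

// Date layouts commonly seen in RSS and Atom feeds.
var dateLayouts = []string{time.RFC1123Z, time.RFC1123, time.RFC3339}

// parseDate tries each known layout and reports whether any of them matched.
func parseDate(raw string) (time.Time, bool) {
	for _, layout := range dateLayouts {
		if t, err := time.Parse(layout, raw); err == nil {
			return t.UTC(), true
		}
	}
	return time.Time{}, false
}

// objectKey derives an S3 key from the item's publication date. If the feed
// date is missing or unparseable, it falls back to the fetch time so that
// irregular feeds still get a usable filename.
func objectKey(item Item, fetchedAt time.Time) string {
	date, ok := parseDate(item.Published)
	if !ok {
		date = fetchedAt.UTC()
	}
	// A short hash of the link keeps keys unique within a single day.
	sum := sha256.Sum256([]byte(item.Link))
	return fmt.Sprintf("%s/%s.json", date.Format("2006/01/02"), hex.EncodeToString(sum[:8]))
}

// shouldFetch reports whether the item is absent from the manifest.
// Assumption: the article link is used as the manifest key.
func shouldFetch(m Manifest, item Item) bool {
	return !m.Has(item.Link)
}
```

A caller would check `shouldFetch` before downloading an article, write the document to S3 under `objectKey(item, time.Now())`, and then call `Put` so that later runs skip it.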
Proposed follow-on tasks
validate encoding (see #9)
upgrade configuration methodology (more like what we're doing for the messaging backend?)
put script on a schedule (currently requires a person to kick it off) & deploy as a lambda?