rotationalio / baleen

An automated ingestion service of RSS feeds to construct a corpus for NLP research.
https://ensign.rotational.dev/examples/data_scientists/
GNU Affero General Public License v3.0

Writing feeds as documents to s3 #12

Closed: rebeccabilbro closed 5 years ago

rebeccabilbro commented 5 years ago

This PR contains preliminary functionality for writing feed items as complete documents to S3, closing #5.
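For readers who want a concrete picture of "writing feed items as complete documents", here is a minimal sketch in Python. It is not the code from this PR: the `FeedItem` fields, the `document_key` layout, and the `InMemoryStore` stand-in are all hypothetical, chosen so the example runs without AWS credentials. A real implementation would swap the store for an S3 client call (for example, boto3's `put_object` with `Bucket`, `Key`, and `Body`).

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass
class FeedItem:
    """Hypothetical feed item; field names are illustrative, not from the PR."""
    feed_url: str
    link: str
    title: str
    content: str
    published: str  # ISO 8601 timestamp string


class InMemoryStore:
    """Stand-in for an S3 bucket so the sketch runs without credentials."""

    def __init__(self):
        self.objects = {}

    def put(self, key: str, body: bytes) -> None:
        # A real store would upload `body` to the bucket under `key`.
        self.objects[key] = body


def document_key(item: FeedItem, prefix: str = "documents") -> str:
    """Derive a stable object key from the item's link (assumed layout)."""
    digest = hashlib.sha256(item.link.encode("utf-8")).hexdigest()
    # Group objects by publication date (first 10 chars of the timestamp).
    return f"{prefix}/{item.published[:10]}/{digest}.json"


def write_document(store, item: FeedItem) -> str:
    """Serialize one feed item as a complete JSON document and store it."""
    body = json.dumps(asdict(item), sort_keys=True).encode("utf-8")
    key = document_key(item)
    store.put(key, body)
    return key
```

Because the key is derived from a hash of the item's link, writing the same item twice overwrites the same object rather than creating a duplicate, which keeps the object count tied to the number of distinct items.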

Done

Proposed follow-on tasks

rebeccabilbro commented 5 years ago

FYI, current stats in S3: ~~Total size: 167.2 MB, Total objects: 1455~~ Total size: 304.4 MB, Total objects: 2719