rivernews / review-scraper-java-development-environment

An environment to develop review scraper

Data Pipeline: aggregation work as cronjob across all orgs #25

Open rivernews opened 4 years ago

rivernews commented 4 years ago

The way we store reviews, as separate, fragmented individual files in S3, is optimized only for the scraper.

However, data-dense analytics needs the data to come in larger chunks in order to scale. We can serve both needs by adding a cronjob that goes across all orgs and aggregates their review objects into a single TSV file, which is optimized for data-dense operations.

This cronjob will likely rely on support from the Slack middleware service.
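
A rough sketch of the aggregation step, to make the idea concrete. It is written in Python only for brevity and leaves out S3 access entirely: `reviews_by_org` stands in for the review objects already loaded per org. The column names, the `aggregate_reviews_to_tsv` function, and the choice of one file covering all orgs are assumptions for illustration, not the actual review schema or a settled design.

```python
import csv
import io

# Hypothetical column set; the real review schema is not specified in this issue.
COLUMNS = ["org_id", "review_id", "rating", "text"]


def _flatten(value):
    """Collapse tabs and newlines so each review stays on one TSV row."""
    return " ".join(str(value).split())


def aggregate_reviews_to_tsv(reviews_by_org):
    """Merge per-org review objects into a single TSV string.

    reviews_by_org: dict mapping an org id to a list of review dicts,
    standing in for the individual review files stored in S3.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    # Sort org ids so repeated cron runs produce rows in a stable order.
    for org_id in sorted(reviews_by_org):
        for review in reviews_by_org[org_id]:
            writer.writerow([
                _flatten(org_id),
                _flatten(review.get("review_id", "")),
                _flatten(review.get("rating", "")),
                _flatten(review.get("text", "")),
            ])
    return buffer.getvalue()
```

The cronjob would call something like this once per run and upload the result as one object, so analytics jobs read a single large file instead of many small ones.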