nelsonic / github-scraper

🕷 🕸 crawl GitHub web pages for insights we can't GET from the API ... 💡

Deployment? #13

Closed nelsonic closed 7 months ago

nelsonic commented 9 years ago

While deploying this to Heroku is feasible for test/staging purposes, the value of tracking GitHub Trends over time is in the _Data_ and the subsequent _insights_... so we need to be able to deploy this somewhere we can store millions of records ...

nelsonic commented 9 years ago

Crawlers need to be on AWS because AWS has no limit on inbound bandwidth (i.e. we can suck in Terabytes and keep only a few Gigs' worth of data in our Redis/ES cluster). See: https://aws.amazon.com/blogs/aws/aws-lowers-its-pricing-again-free-inbound-data-transfer-and-lower-outbound-data-transfer-for-all-ser/

DigitalOcean has bandwidth caps (which we will reach rather quickly!)
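
For illustration only (this is not the project's actual scraper code), here is a minimal sketch of that crawl-and-keep-a-subset pattern: pull down a full GitHub page (a large inbound transfer, which AWS does not charge for) but persist only a tiny extracted record to Redis. It assumes Node 18+ (global `fetch`) and the `ioredis` client; the key name and regex are hypothetical.

```ts
// Hypothetical sketch, not the github-scraper API: crawl a profile page,
// extract a few fields, and store a small JSON record in Redis.
import Redis from "ioredis";

const redis = new Redis(); // defaults to localhost:6379

async function crawlProfile(username: string): Promise<void> {
  // Inbound transfer: the full HTML page (often hundreds of KB)...
  const html = await (await fetch(`https://github.com/${username}`)).text();

  // ...but we only persist a few bytes of extracted data.
  const followers = html.match(/([\d.,k]+)\s*followers/i)?.[1] ?? null;
  const record = { username, followers, crawled_at: Date.now() };

  await redis.set(`profile:${username}`, JSON.stringify(record));
}

crawlProfile("nelsonic")
  .then(() => redis.quit())
  .catch(console.error);
```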

nelsonic commented 7 months ago

GOTO: #128