[x] Install Go, but first install a version manager. GVM has ~6K stars but was last maintained in 2019. g looks neat: ~0.3K stars, yet actively maintained. There are also gobrew and goup, at around 0.1K stars each.
[x] Use the built-in Go modules (go mod) to manage dependencies, and follow best practice when naming the module.
[x] Allow sfn (Step Functions) to invoke Lambda
[x] SQS: while Go can easily spin up thousands of goroutines, we need to rate-limit the requests to avoid bombarding the website. The maximum SQS long-polling wait time is 20s, which is more than enough.
[x] Consumer Lambda: concurrency control. Turns out we don't need this, at least for the scraping part.
[x] Clean up the original sfn start-up logic
[x] SQS long-polling troubleshooting: it looks like the consumer did not wait. When sending 3 requests in a row, the consumer was invoked 3 times almost simultaneously, ignoring the long-polling wait time of 3.
Even with concurrency limiting and a long-polling wait time, the consumer Lambda was still invoked too quickly: 10 times within 4s.
But at least a sleep in the consumer Lambda can slow down the flow, right? Given concurrency = 1.
[x] Message not deleted after the consumer Lambda finishes: because the default Lambda timeout is 3s! Increased the timeout.
It's important to modularize the above, since that makes testing, reuse, and parallel execution easier.
Module: scraper (Golang)
Module: parser & formatter
Module: word cloud reducer
Module: time series scraper
Module: publisher
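One way the module split above could look in Go, as a sketch: each module is an interface (`Scraper`, `Parser`, `Publisher` are hypothetical names), `run` depends only on those interfaces, and the fakes let the pipeline be tested without network access. The word cloud reducer is a plain word count here, and the time series scraper is left out for brevity.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical module boundaries; the names are illustrative only.
type Scraper interface {
	Fetch(url string) (string, error) // returns raw HTML
}

type Parser interface {
	Parse(html string) []string // returns the words to analyse
}

type Publisher interface {
	Publish(counts map[string]int) error
}

// reduceWordCloud is the "word cloud reducer": it turns words into counts.
func reduceWordCloud(words []string) map[string]int {
	counts := make(map[string]int)
	for _, w := range words {
		counts[strings.ToLower(w)]++
	}
	return counts
}

// run wires the modules together. Because it only sees interfaces, each
// module can be swapped for a fake in tests or deployed separately.
func run(url string, s Scraper, p Parser, pub Publisher) error {
	html, err := s.Fetch(url)
	if err != nil {
		return fmt.Errorf("scrape %s: %w", url, err)
	}
	return pub.Publish(reduceWordCloud(p.Parse(html)))
}

// Fakes for testing without network access.
type fakeScraper struct{ html string }

func (f fakeScraper) Fetch(string) (string, error) { return f.html, nil }

type fieldsParser struct{}

func (fieldsParser) Parse(html string) []string { return strings.Fields(html) }

type capturePublisher struct{ got map[string]int }

func (c *capturePublisher) Publish(counts map[string]int) error {
	c.got = counts
	return nil
}

func main() {
	pub := &capturePublisher{}
	if err := run("https://example.com", fakeScraper{html: "Go go lambda"}, fieldsParser{}, pub); err != nil {
		panic(err)
	}
	fmt.Println(pub.got) // map[go:2 lambda:1]
}
```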
Scraping - Golang runtime environment
Hierarchy for word cloud
Hierarchy for news stream
SNS ideas: fan in (URLs); fan out 1) to generate the word cloud, 2) to port to fb
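An in-process analogue of the SNS idea, for illustration only (Go channels, not the SNS API): `fanIn` merges several URL producers into one stream, and `fanOut` delivers every URL to each subscriber, here standing in for the word cloud generator and the fb porter. All function names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// from turns a list of items into a closed, pre-filled channel (a URL producer).
func from(items ...string) <-chan string {
	ch := make(chan string, len(items))
	for _, it := range items {
		ch <- it
	}
	close(ch)
	return ch
}

// fanIn merges several producers into one channel (the "fan in (URLs)" step).
func fanIn(sources ...<-chan string) <-chan string {
	out := make(chan string)
	var wg sync.WaitGroup
	for _, src := range sources {
		wg.Add(1)
		go func(src <-chan string) {
			defer wg.Done()
			for msg := range src {
				out <- msg
			}
		}(src)
	}
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

// fanOut copies every message to each subscriber channel, the way a topic
// delivers each published message to every subscription.
func fanOut(in <-chan string, subscribers int) []<-chan string {
	outs := make([]chan string, subscribers)
	result := make([]<-chan string, subscribers)
	for i := range outs {
		outs[i] = make(chan string, 16)
		result[i] = outs[i]
	}
	go func() {
		for msg := range in {
			for _, out := range outs {
				out <- msg
			}
		}
		for _, out := range outs {
			close(out)
		}
	}()
	return result
}

// drainAll reads every subscriber concurrently and returns what each received, sorted.
func drainAll(subs []<-chan string) [][]string {
	got := make([][]string, len(subs))
	var wg sync.WaitGroup
	for i, sub := range subs {
		wg.Add(1)
		go func(i int, sub <-chan string) {
			defer wg.Done()
			for msg := range sub {
				got[i] = append(got[i], msg)
			}
			sort.Strings(got[i])
		}(i, sub)
	}
	wg.Wait()
	return got
}

func main() {
	urls := fanIn(from("url-a", "url-b"), from("url-c"))
	// Two subscriptions: 1) word cloud, 2) porting to fb.
	fmt.Println(drainAll(fanOut(urls, 2))) // [[url-a url-b url-c] [url-a url-b url-c]]
}
```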
Let's split it into two parts: 1) fetch the HTML, 2) the rest of the processing.