Insufficient solution to handle core business logic

Issue

There are some limitations in the architecture of the core business logic (the scraping process).

It is handled synchronously, i.e. the keywords are processed sequentially: https://github.com/jounng23/scraping-keyword-web/blob/b71b469cfc46aa6122e80984df56808e2b477011/backend/pkg/keyword/repository.go#L80-L88

This has some drawbacks:

It is hard to scale, as we can't process multiple keywords at the same time.
It is error-prone: if the crawlKeywordResult() function crashes while processing one of the keywords, the remaining ones will not be processed.
The function CrawlKeywordResults() itself blocks the application from showing the uploaded keywords to the user, as it must wait for the scraping process to complete. If the user uploads a CSV file with a long list of keywords (e.g. 150), the request is likely to time out.

A better solution could be:

Parse and save the keywords to the DB, each with a status, e.g. initialized, processing, completed.
Show the list of uploaded keywords and their statuses to the user.
Enqueue each keyword as a background job.
Have one (or multiple) worker(s) pick up the jobs and process them.

Also, using Chromedp to drive the search is inefficient, as launching a Chrome browser is slow and resource-intensive. A simpler solution could be to use an HTTP client to send the search request directly and then parse the response. A simple curl command demonstrates the idea:

curl -A "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3" https://www.google.com/search\?q\=nodejs
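
The same request could be made from Go with the standard `net/http` package. The sketch below is only an illustration: `BuildSearchURL` and `FetchSearchPage` are hypothetical helper names, the 10-second timeout is an arbitrary choice, and parsing the returned HTML is left out. Whether the search engine serves a usable page to a plain HTTP client is also an assumption that would need to be verified.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"time"
)

// Same User-Agent string as in the curl command.
const userAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"

// BuildSearchURL returns the search URL for a keyword, with the query escaped.
func BuildSearchURL(keyword string) string {
	q := url.Values{}
	q.Set("q", keyword)
	return "https://www.google.com/search?" + q.Encode()
}

// FetchSearchPage downloads the raw HTML of the result page for a keyword.
func FetchSearchPage(client *http.Client, keyword string) (string, error) {
	req, err := http.NewRequest(http.MethodGet, BuildSearchURL(keyword), nil)
	if err != nil {
		return "", err
	}
	req.Header.Set("User-Agent", userAgent)

	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status: %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

func main() {
	// A timeout keeps a slow response from blocking a worker indefinitely.
	client := &http.Client{Timeout: 10 * time.Second}

	html, err := FetchSearchPage(client, "nodejs")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Printf("fetched %d bytes of HTML\n", len(html))
}
```

The returned HTML would still need to be parsed to extract the fields the application stores for each keyword.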
