ltenzil / scrape

Scrape builds reports from user-entered keywords and their Google search results.

Keywords are processed synchronously #10

Open · longnd opened 2 years ago

longnd commented 2 years ago

Hi Lourdino 👋 Thank you for the effort spent on this code challenge. I'm Long Nguyen, a software developer from Nimble; we already had the chance to meet during the interview, and I would like to share the feedback on your code challenge via GitHub Issues.

Keep in mind that this is a bidirectional process, so I would love to hear back from you as well; please do not hesitate to raise your questions or corrections in case I missed anything. If we are aligned on an issue and you would like to correct it, please go ahead and open a Pull Request per fix, merge it when you feel confident, and I will follow up on these fixes 😇

I hope you find the process enjoyable. Good luck and happy coding 🤘


The first issue I notice is the synchronous keyword processing: while extracting the logic that searches each keyword and parses the results into another class (the Keyword model) is the right move, processing the keywords synchronously in a loop isn't ideal: https://github.com/ltenzil/scrape/blob/6eb163261a9102f748fc789147840f7856584ef8/app/models/keyword.rb#L42-L51

This approach has several issues; a sketch of an alternative follows the list.

  1. Since this process is called directly in the controller, the processing will be synchronous and very slow if the user uploads 100 keywords.
  2. There is no error handling mechanism: if processing one keyword fails, the remaining keywords will be lost and the user will never get search reports for them.
  3. It violates the single responsibility principle by doing many things at once, so there is an opportunity to split the business logic.

The tricky part of the code challenge is the actual processing of the keywords. The upload, saving the keywords, and their processing are three distinct actions that must be handled separately, as any of them can succeed or fail for different reasons. From an end-user perspective, the file upload must complete fast, and the results for each keyword can then be made available later on.
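To make that separation concrete, here is a minimal sketch of what the upload action could look like. The `KeywordsController`, `SearchKeywordJob`, and the `value`/`status` attributes on `Keyword` are illustrative assumptions, not names taken from the repository: the controller only parses the file, persists the keywords, and enqueues the searches before responding.

```ruby
# app/controllers/keywords_controller.rb -- hypothetical sketch, names and attributes assumed
require "csv"

class KeywordsController < ApplicationController
  def create
    # 1. Parse the uploaded file and persist the keywords quickly.
    keywords = CSV.read(params[:file].path).flatten.compact.map do |text|
      Keyword.create!(value: text, status: :pending)
    end

    # 2. Enqueue one background job per keyword; the request returns immediately
    #    instead of waiting for every Google search to finish.
    keywords.each { |keyword| SearchKeywordJob.perform_later(keyword.id) }

    redirect_to keywords_path, notice: "#{keywords.size} keywords queued for processing"
  end
end
```

The heavy work, the actual Google searches, then happens outside the request cycle, so a 100-keyword upload responds just as fast as a single-keyword one.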

ltenzil commented 2 years ago

I never thought of async processing, will try it. It's insightful.

longnd commented 2 years ago

While the solution to use a thread for each search made in #26 would improve the processing time, it still isn't truly asynchronous and doesn't solve the other issues raised in the comment. The controller still calls the service to search all the keywords and has to wait for them. Ideally, the application can use a queue and a background job to do the searches:
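For instance, a minimal ActiveJob sketch of such a background job could look like the following; the job name, the `GoogleSearchService`, and the `Keyword` attributes are assumptions for illustration, not the project's actual code.

```ruby
# app/jobs/search_keyword_job.rb -- illustrative sketch; the job, the service, and
# the Keyword attributes (value, status, result) are assumed names
class SearchKeywordJob < ApplicationJob
  queue_as :default

  # Retry transient scraping failures without blocking the other keywords.
  retry_on StandardError, wait: 5.seconds, attempts: 3

  def perform(keyword_id)
    keyword = Keyword.find(keyword_id)
    keyword.update!(status: :processing)

    # Each keyword is searched and parsed independently, so one failure no
    # longer loses the remaining keywords.
    result = GoogleSearchService.new(keyword.value).call
    keyword.update!(status: :completed, result: result)
  rescue ActiveRecord::RecordNotFound
    # The keyword was deleted before the job ran; nothing to do.
  end
end
```

With a queue adapter such as Sidekiq or GoodJob configured, the controller only enqueues the jobs, and each keyword's report becomes available as its job finishes.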