nghiavt2906 / Web-Scraping-BE


[Feature] Asynchronous processing of keywords #7

Open olivierobert opened 1 year ago

olivierobert commented 1 year ago

Issue

When keywords are uploaded, the scraping is performed synchronously in the controller:

https://github.com/nghiavt2906/Web-Scraping-BE/blob/57c918214c75686543944fa0368f7b94fed78368/src/controllers/report.controller.js#L9-L13

https://github.com/nghiavt2906/Web-Scraping-BE/blob/57c918214c75686543944fa0368f7b94fed78368/src/services/report.js#L7-L28

The request can take a long time to complete and could exceed the HTTP request timeout. In addition, if an error occurs for any keyword, the processing of all remaining keywords stops.

Warning: Disabling a feature in the test environment should be avoided. Instead, a mock should be used in unit tests.

Expected

The benefits are:

nghiavt2906 commented 1 year ago

Also, I have noticed that uploading the file through my deployment link is slow due to the network connection. The hosting platform has some issues with Chromium as well, so it cannot run the scraping properly. I think it's best to test the application on a local machine.

olivierobert commented 1 year ago

Thank you for your clarification. However, there is no observability in this critical process 😅 That's where a queue system such as Bull would be needed.

In addition, the keywords should not be processed sequentially in a single loop, so that a failure on one keyword does not stop the processing of the other keywords. There needs to be a separate asynchronous process for each keyword.
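
As a rough illustration of the per-keyword isolation described above, here is a minimal sketch using plain promises rather than a real queue. `scrapeKeyword` and `processKeywords` are hypothetical names, not functions from this repository, and the scraper body is only a placeholder. With a queue such as Bull, each keyword would instead become its own job, which would also provide the observability and retries that this sketch lacks.

```javascript
// Hypothetical stand-in for the real per-keyword scraping logic.
// It rejects on an empty keyword to simulate a scraping failure.
async function scrapeKeyword(keyword) {
  if (keyword.trim() === '') {
    throw new Error('empty keyword');
  }
  return { keyword, length: keyword.length }; // placeholder result
}

// Start one independent asynchronous task per keyword and wait for all of
// them to settle. A rejected task is recorded as "failed" and does not
// prevent the other keywords from completing.
async function processKeywords(keywords, scrape = scrapeKeyword) {
  const settled = await Promise.allSettled(
    // Promise.resolve().then(...) also captures synchronous throws.
    keywords.map((keyword) => Promise.resolve().then(() => scrape(keyword)))
  );

  return settled.map((outcome, index) => {
    if (outcome.status === 'fulfilled') {
      return { keyword: keywords[index], status: 'done', result: outcome.value, error: null };
    }
    const reason = outcome.reason;
    return {
      keyword: keywords[index],
      status: 'failed',
      result: null,
      error: reason instanceof Error ? reason.message : String(reason),
    };
  });
}
```

The controller could call `processKeywords` without awaiting it and respond to the upload immediately, storing the per-keyword status so the client can poll for progress. This is an assumption about one possible design, not a description of the existing code.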