macropusgiganteus / scrappy-web

Project for Technical Test

[Feature][UI] As a user, my file upload is fast #12

Open malparty opened 3 weeks ago

malparty commented 3 weeks ago

Issue

Currently, when a user uploads a CSV file, the response is only returned once ALL the keywords have been processed (which can take some time). Users do not like to wait, and they probably don't need to wait to see the first keywords 🙂

Expected

Once the keywords are persisted in the DB, the scraping work should run in the background. The user could receive a message like File uploaded with success, the scraping has started. to be informed about the situation, without locking the application.
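For illustration only (the actual stack and job queue are up to you), here is a minimal Python sketch where a thread pool stands in for a real background worker; `parse_csv`, `persist_keywords`, `scrape_keyword`, and `persist_result` are hypothetical helpers:

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a real job queue / background worker (hypothetical setup).
background_pool = ThreadPoolExecutor(max_workers=4)

def scrape_all(keyword_ids):
    """Runs outside the request: scrape each keyword and persist its result."""
    for keyword_id in keyword_ids:
        result = scrape_keyword(keyword_id)   # hypothetical scraping helper
        persist_result(keyword_id, result)    # hypothetical DB helper

def upload_csv(file):
    """Controller action: persist keywords, start scraping, respond immediately."""
    keyword_ids = persist_keywords(parse_csv(file))  # hypothetical helpers
    background_pool.submit(scrape_all, keyword_ids)  # do not wait for completion
    return "File uploaded with success, the scraping has started."
```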

The next step (though not required) is to either send an email or show a UI notification once the parsing of a file is completed.

macropusgiganteus commented 3 weeks ago

Thank you for your feedback.

So, the expected behavior should be the following?

```mermaid
sequenceDiagram
    participant view
    participant controller
    participant db
    participant google
    view ->> controller: submit CSV file
    controller ->> controller: read CSV file and get keywords
    controller ->> db: persist keywords in the DB
    controller ->> view: notify user with message "File uploaded with success, the scraping has started."
    loop every keyword
        controller -->> google: scrape keyword search result
        activate google
        google -->> controller: search result
        deactivate google
        controller ->> db: persist search result
        controller ->> view: update UI with new keyword
    end
    controller ->> view: notify user that the file parsing process is completed.
```

** I think we could scrape multiple search results simultaneously (using threads) to make the upload process faster.
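A rough sketch of that idea, assuming Python and a plain thread pool (the real Google request is replaced with a stub so the snippet is self-contained):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def scrape_keyword(keyword):
    # Stub standing in for the real Google request.
    return {"keyword": keyword, "top_result": f"https://example.com/{keyword}"}

keywords = ["coffee", "tea", "juice"]

# Scrape several keywords at once instead of one by one.
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = {pool.submit(scrape_keyword, kw): kw for kw in keywords}
    for future in as_completed(futures):
        result = future.result()
        print(result["keyword"], "->", result["top_result"])
```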

If so, I wonder: what if we can't get the search result from Google (e.g. getting blocked) or another error happens? What should we do about persisted keywords that don't have search result data? Should I add a delete button or a refresh button for each keyword?

[screenshot]

Right now I scrape all the search results and persist each of them together with its keyword at the same time, because I'm not sure how to handle a keyword without a search result.

malparty commented 3 weeks ago

> So, the expected behavior should be the following? [diagram]

Yes, that's a good recap. I would say that the last 2 lines (about updating the UI & notifying) are open to many alternatives: from just refreshing the page, to a partial UI update, to sending a notification with a link, etc. The "trigger" especially has many different possibilities (long polling, interval query, user refresh, WebSocket, etc.).
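For example, with the "interval query" option the client could simply poll a status endpoint until all keywords are done. A tiny Python sketch using requests, where the /keyword_files/<id>/status endpoint and its JSON shape are made up for the example:

```python
import time
import requests

def wait_for_completion(file_id, interval=5):
    # Poll a hypothetical status endpoint until scraping is finished.
    while True:
        url = f"http://localhost:3000/keyword_files/{file_id}/status"
        status = requests.get(url).json()  # e.g. {"done": 7, "total": 10}
        print(f'{status["done"]}/{status["total"]} keywords scraped')
        if status["done"] >= status["total"]:
            return status
        time.sleep(interval)  # wait before asking again
```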

> If so, what if we can't get the search result from Google (getting blocked) or another error happens? What should we do about persisted keywords that don't have search result data?

Maybe displaying the status as In progress would make it clear to the user why the result data is still blank? 💭 But that's just one suggestion among other possibilities ;-)

> Should I add a delete button or some refresh button for each keyword?

Well, as a user, what would you expect? :) Assuming you went to sleep and came back the day after, you might expect the application to have tried again for you. Only after several retries and failures might we want to change the keyword status from Retrying to Failed.

macropusgiganteus commented 3 weeks ago

Thank you for your feedback.

I think adding the statuses In-progress, Retrying, Success, and Failed is a great idea! That way, each status can have a different behavior. Thanks for the suggestion! :)
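A minimal illustration of the "different behavior per status" idea, assuming Python (the real app may model this differently):

```python
from enum import Enum

class KeywordStatus(Enum):
    IN_PROGRESS = "in_progress"
    RETRYING = "retrying"
    SUCCESS = "success"
    FAILED = "failed"

def next_action(status: KeywordStatus) -> str:
    # Each status maps to a different behavior, as discussed above.
    return {
        KeywordStatus.IN_PROGRESS: "wait; the result is still being scraped",
        KeywordStatus.RETRYING: "leave it alone; the worker will try again",
        KeywordStatus.SUCCESS: "show the search result",
        KeywordStatus.FAILED: "allow re-upload or a manual refresh",
    }[status]

print(next_action(KeywordStatus.FAILED))  # -> allow re-upload or a manual refresh
```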

> Q1: What should we do about persisted keywords that don't have search result data?
>
> Q2: Well, as a user, what would you expect?

I would expect that if a keyword doesn't have search result data, it should show the status Failed, and we could re-upload the file to get the search result from Google again. This way we don't have to click a refresh button for each keyword with Failed status, of which there could be a lot.

> maybe you might expect the application to have tried again for you.

That makes sense. Currently, persisted keywords are skipped while uploading the file, regardless of their search result status. We could instead re-process persisted keywords during a new upload if they have the Failed status. For the retry strategy, maybe using exponential backoff can help reduce the error rate.
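A self-contained Python sketch of that retry loop, where the delay grows exponentially and the keyword is marked Failed only after all attempts are used up (the limits are arbitrary examples):

```python
import random
import time

MAX_ATTEMPTS = 5
BASE_DELAY = 2  # seconds

def scrape_with_backoff(keyword, scrape):
    """Retry scrape(keyword) with exponential backoff; give up after MAX_ATTEMPTS."""
    for attempt in range(MAX_ATTEMPTS):
        try:
            return "Success", scrape(keyword)
        except Exception:
            # Exponential backoff with a little jitter: ~2s, 4s, 8s, 16s, ...
            time.sleep(BASE_DELAY * (2 ** attempt) + random.uniform(0, 1))
    return "Failed", None  # only now does the status become Failed

# Example usage with a scraper that always succeeds:
status, result = scrape_with_backoff("coffee", lambda kw: {"keyword": kw})
print(status, result)
```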