satyamtg closed this issue 4 years ago
@satyamtg If the backend no longer responds, then the scraper should stop; that seems OK to me. The real question is why the backend does not respond in the first place.
@kelson42 The backend server didn't respond for a particular URL, which can happen for multiple reasons. The problem is that the scraper didn't even try to move on: it failed in the step that checks whether a resource exists. Moreover, the URL it was trying to access didn't exist at all, which led me to inspect the URL-generation code. There I found that when the URL list filtered against the rsync data was empty, the generator returned every generated combination to try instead. That fallback was perhaps only meant for development (I didn't touch it when I refactored). It also makes the resource-existence check unnecessary outside of a development scenario, since the rsync step always gives us consistent data. I have commented those checks out.
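To make the bug concrete, here is a minimal sketch (not the scraper's actual code; the function names, URL patterns, and mirror are illustrative assumptions) of the fallback described above, with the fix applied:

```python
def build_urls(book_id, mirror="http://aleph.gutenberg.org"):
    """Generate candidate URLs for a book (illustrative patterns only)."""
    return [
        f"{mirror}/{book_id}/{book_id}.txt",
        f"{mirror}/{book_id}/{book_id}-0.txt",
        f"{mirror}/{book_id}/{book_id}-8.txt",
    ]

def filter_urls(candidates, rsync_files):
    """Keep only URLs whose filename appears in the rsync file list."""
    filtered = [u for u in candidates if u.rsplit("/", 1)[-1] in rsync_files]
    # Buggy behaviour: falling back to *all* candidates when nothing matched,
    # which later triggered existence checks against URLs that never existed:
    #   return filtered if filtered else candidates
    return filtered  # fixed: an empty match means the book has no such file
```

With the fix, an empty rsync match yields an empty URL list, so no nonexistent URL is ever probed.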
The last gutenberg run (https://farm.openzim.org/pipeline/5ef4da41443d22424a730e05/debug) failed, most likely due to a connection error while downloading a book. The scraper shouldn't crash on such errors, but it should definitely log them.
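A minimal sketch of that behaviour, assuming nothing about the scraper's internals (the function name, retry count, and stdlib `urllib` usage are my own choices): catch the network error, log it, and report failure to the caller instead of letting the exception kill the run.

```python
import logging
import urllib.error
import urllib.request

logger = logging.getLogger("gutenberg")

def download_book(url, dest, retries=3, timeout=30):
    """Fetch url to dest; on network errors, log and retry, never raise."""
    for attempt in range(1, retries + 1):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                data = resp.read()
            with open(dest, "wb") as fh:
                fh.write(data)
            return True
        except (urllib.error.URLError, OSError) as exc:
            logger.error("Attempt %d/%d failed for %s: %s",
                         attempt, retries, url, exc)
    return False  # caller skips this book; the scraper keeps going
```

The caller can then skip the failed book (or record it for a later retry) while the rest of the run continues.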
Here is the last log from the scraper, showing the error message -