Autosave triggered by single thread and not global.

rivermont commented 6 years ago

Checklist

[x] Same issue has not been opened before.

Expected Behavior

All threads should pause while the crawler prints info and saves files.

Actual Behavior

Once one thread reaches SAVE_COUNT links crawled, it saves while the other threads continue. This results in [CRAWL] logs in between [INFO] logs.
It seems like this is inefficient and could result in some saving errors.
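
To illustrate the expected behavior, here is a minimal sketch of a global autosave, assuming all threads share one counter and one lock. `CrawlState`, `record`, `worker` and `run_demo` are hypothetical names for this example, not spidy's actual code, and the in-memory `log` list stands in for real log output. Because the save runs while the lock is held, no [CRAWL] line can land between the [INFO] lines of a save.

```python
import threading

SAVE_COUNT = 5  # hypothetical value for the demo; the real crawler reads its own setting


class CrawlState:
    """Shared crawl state guarded by a single lock (illustrative only)."""

    def __init__(self, save_count):
        self.save_count = save_count
        self.lock = threading.Lock()
        self.since_save = 0   # links crawled by ALL threads since the last save
        self.done = []        # links waiting to be written out
        self.log = []         # stand-in for the crawler's log output

    def record(self, link):
        """Log one crawled link; save if the global count hits the cap."""
        with self.lock:
            self.log.append("[CRAWL] " + link)
            self.done.append(link)
            self.since_save += 1
            if self.since_save >= self.save_count:
                self._save_locked()

    def _save_locked(self):
        # Called with self.lock held, so other threads block in record()
        # until the save has finished.
        self.log.append("[INFO] Saving %d links..." % len(self.done))
        # Real file writes would go here.
        self.log.append("[INFO] Saved.")
        self.done.clear()
        self.since_save = 0


def worker(state, links):
    for link in links:
        state.record(link)


def run_demo(n_threads=4, links_per_thread=10):
    state = CrawlState(SAVE_COUNT)
    threads = [
        threading.Thread(
            target=worker,
            args=(state, ["t%d/%d" % (t, i) for i in range(links_per_thread)]),
        )
        for t in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state.log
```

In this sketch the other threads only block when they try to record a link, so network fetches already in flight are not interrupted. That is a design choice of the example, not a claim about how the crawler should be fixed.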

Steps to Reproduce the Problem

1. Run the crawler.
2. Wait for the autosave cap to be hit.

Specifications

Crawler Version: 1.6.2
Platform: Ubuntu (16.04 LTS)
Python version: 3.5.2
Dependency Versions: All latest.

Hrily commented 4 years ago

> It seems like this is inefficient and could result in some saving errors.

@rivermont Can you please elaborate on this?

I would like to understand all the cases where this will result in errors.

Hrily commented 4 years ago

@rivermont

I found a few errors that can occur while autosaving and opened a PR to fix them.

Also, I couldn't find a way to fix the logging with only minimal changes. The logging logic may need to be revamped so that crawl logging is paused while saving.
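
One possible way to pause crawl logging during a save is sketched below. It assumes the crawler logs through Python's standard `logging` module, which may not match the project's actual logging code. `ListHandler`, `logging_paused_for_others` and `demo` are hypothetical names for this example. The sketch relies on the fact that every `logging.Handler` owns a re-entrant lock: the saving thread holds it via `Handler.acquire()`, so it can still emit its own [INFO] lines while other threads block inside their logging calls.

```python
import logging
import threading
import time
from contextlib import contextmanager


class ListHandler(logging.Handler):
    """Collects messages in memory (stand-in for a file or console handler)."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


@contextmanager
def logging_paused_for_others(handler):
    """Hold the handler's re-entrant lock for the duration of the block.

    The holding thread can keep logging; other threads wait until release.
    """
    handler.acquire()
    try:
        yield
    finally:
        handler.release()


def demo():
    handler = ListHandler()
    logger = logging.getLogger("crawler_sketch")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(handler)

    stop = threading.Event()

    def crawl():
        n = 0
        while True:
            logger.info("[CRAWL] link %d", n)
            n += 1
            if stop.is_set():
                break
            time.sleep(0.001)

    workers = [threading.Thread(target=crawl) for _ in range(3)]
    for w in workers:
        w.start()
    time.sleep(0.02)

    with logging_paused_for_others(handler):
        logger.info("[INFO] Saving files...")
        time.sleep(0.02)  # stand-in for writing files to disk
        logger.info("[INFO] Saved.")

    time.sleep(0.02)
    stop.set()
    for w in workers:
        w.join()
    logger.removeHandler(handler)
    return handler.messages
```

This only keeps the log output contiguous. It does not by itself protect the data being saved, so any shared lists or sets would still need their own synchronization.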

rivermont / spidy