yawik / SimpleImport

Simple Job Import Module. Imports job openings into YAWIK

MIT License

Lock running crawlers #19

Closed: TiSiE closed this issue 4 years ago

TiSiE commented 5 years ago

A crawler must not be able to run if it is already running in another process.
For example, if a crawler is run through a cron job and also started from the terminal, that will lead to duplicate job entities in the database.
So we need a mechanism that locks a running crawler against other processes.

A flag in the crawler entity should be enough, although that means the entity must be flushed to the database before the crawling loop starts.

cbleek commented 4 years ago

Maybe we should use a library that offers a locking feature?

https://github.com/php-lock/lock

kilip commented 4 years ago

@cbleek @TiSiE

The simplest way to do this is to create a lock file like var/simple-import.lck. Then we delete this file when the crawler finishes running or throws an error.
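
A minimal sketch of the lock file idea, written in Python only to illustrate the pattern; it is not the module's actual implementation. The names `crawler_lock` and `CrawlerAlreadyRunning` are hypothetical, and writing the process ID into the file is an assumption added for debugging. The important part is that the file is created atomically, so two processes cannot both pass an "exists?" check at the same time. In PHP the equivalent is `fopen()` with mode `'x'`, which fails if the file already exists.

```python
import os
from contextlib import contextmanager


class CrawlerAlreadyRunning(RuntimeError):
    """Raised when the lock file already exists (hypothetical name)."""


@contextmanager
def crawler_lock(path="var/simple-import.lck"):
    # O_CREAT | O_EXCL creates the file atomically and fails if it exists,
    # which avoids a race between checking for the file and creating it.
    # The parent directory of `path` is assumed to exist already.
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise CrawlerAlreadyRunning(path) from None
    try:
        # Assumption: store the PID so a leftover lock can be traced.
        with os.fdopen(fd, "w") as fh:
            fh.write(str(os.getpid()))
        yield path
    finally:
        # Runs when the crawler finishes and also when it raises an error.
        os.remove(path)
```

The crawling loop would then run inside `with crawler_lock(): ...`. One caveat of this approach: if the process is killed hard, the cleanup never runs and the lock file stays behind, so it would have to be removed manually.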

cbleek commented 4 years ago

Yes, a simple lock file is enough. @kilip, can you do it?

kilip commented 4 years ago

@cbleek Yes, I can work on it.