Closed · brenogazzola closed this issue 5 years ago
You need to either run the update process (directly through code or via the rake command) before starting the worker processes, or specify a different webdrivers install dir for each process.
This is a duplicate of #77
If that solution won't work for your use case, or you have a suggestion for an alternate approach, let us know.
I've taken a look at #77 now that you pointed it out, and the solution there is to set a different file_path for each process using the environment variable ENV['TEST_ENV_NUMBER'].
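For reference, the #77 approach can be sketched like this. The `Webdrivers.install_dir=` setter is part of the webdrivers gem; TEST_ENV_NUMBER is set by parallel test runners such as parallel_tests (blank for the first process), and the directory naming here is just an illustration:

```ruby
# Sketch of a per-process install dir, assuming a parallel test runner
# sets TEST_ENV_NUMBER ("" or unset for the first process, "2", "3", ...).
suffix = ENV["TEST_ENV_NUMBER"].to_s
install_dir = File.join(Dir.home, ".webdrivers#{suffix}")

# Requires the webdrivers gem; commented out to keep the sketch standalone:
# Webdrivers.install_dir = install_dir
```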
I'm not sure how I'd do that in a production environment with Sidekiq, though. Each worker dyno will have 20 Sidekiq processes running, which all share the same env variables, so I can't rely on that exact solution.
I could try generating a random directory when the job runs, but then I'd get a different folder every time and would always have to re-download the driver. I'm going to check whether each Sidekiq process can have its own custom variables, but that seems a bit hacky to me, and prone to problems.
There was a mention of locking the file during download, so wouldn't it be possible to lock when writing the final binary instead? A process that tried to write and found the file locked would automatically fall back to another directory (avoiding deadlocks) and use that binary for the time being, before reverting to the original binary (or maybe just keeping its alternate binary).
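A minimal sketch of that lock-with-fallback idea in plain Ruby, for illustration only. This is not how the webdrivers gem actually installs drivers; the `install_driver` method, the lockfile name, and the placeholder binary contents are all hypothetical:

```ruby
require "fileutils"

# Hypothetical sketch: take an exclusive, non-blocking advisory lock before
# writing the binary. A process that loses the race writes into its own
# pid-suffixed directory instead of blocking, so there is no deadlock.
def install_driver(install_dir)
  FileUtils.mkdir_p(install_dir)
  lock_path = File.join(install_dir, "chromedriver.lock")
  File.open(lock_path, File::RDWR | File::CREAT) do |lock|
    if lock.flock(File::LOCK_EX | File::LOCK_NB)
      # Won the race: write the shared binary (placeholder contents here).
      File.write(File.join(install_dir, "chromedriver"), "fake binary")
      install_dir
    else
      # Lost the race: fall back to a private per-process directory.
      fallback = File.join(install_dir, "pid-#{Process.pid}")
      FileUtils.mkdir_p(fallback)
      File.write(File.join(fallback, "chromedriver"), "fake binary")
      fallback
    end
  end
end
```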
If you are going to use a different install dir for each process, you can use the pid to generate unique directories. However, is there a specific reason you can't just run Webdrivers::Chromedriver.update before forking the processes?
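The pid-based suggestion can be sketched as follows. The `Webdrivers.install_dir=` setter exists in the gem, but the directory layout here is just an illustration:

```ruby
require "tmpdir"

# Sketch of per-process install dirs keyed by pid: every Sidekiq process
# downloads into its own directory, so concurrent writes never collide.
install_dir = File.join(Dir.tmpdir, "webdrivers-#{Process.pid}")

# Requires the webdrivers gem; commented out to keep the sketch standalone:
# Webdrivers.install_dir = install_dir
```

The trade-off, as noted above, is that each new process downloads its own copy of the driver.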
No, there isn't... I went through the source code but didn't understand how calling `update` in the jobs would help with the race condition. But your comment made me stop and actually think about it: I could just put the call in an initializer, which would force a download while Sidekiq is booting, before it has a chance to start its workers.
I'll try that. Thanks.
Putting the call to `update` in an initializer breaks Heroku's build process, since ENV['GOOGLE_CHROME_BIN'] is not yet set at that point. Putting it in Sidekiq's `configure_server` block solved the problem:
```ruby
Sidekiq.configure_server do |config|
  Webdrivers::Chromedriver.update
end
```
If anyone needs to use webdrivers with Sidekiq in production, this should solve the problem of every deploy raising multiple exceptions as all Sidekiq processes try to download the driver at the same time.
Summary
I'm using Selenium in production to generate screenshots from websites. Every time I deploy a new release on Heroku, the worker dynos are shut down and new ones are brought online. Each dyno has 10 Sidekiq processes running.
Since there is no driver cached when the dynos come online, all 10 Sidekiq processes attempt to download it. This causes three different errors, depending on the timing of each job:
ChildProcess::LaunchError - Text file busy - /app/.webdrivers/chromedriver
RuntimeError - Could not decompress chromedriver_linux64.zip to get /app/.webdrivers/chromedriver
Errno::ENOENT - No such file or directory @ apply2files - /app/.webdrivers/chromedriver
Debug Info
Expected Behavior
No exceptions should happen.
Actual Behavior
Three different exceptions are thrown, depending on the exact timing when each process attempts to download/decompress/read the driver file.