internetarchive / umbra

A queue-controlled browser automation tool for improving web crawl quality
Apache License 2.0

More stable handling of started browsers #74

Open blekinge opened 5 years ago

blekinge commented 5 years ago

Reworked the starting and stopping of browsers to hopefully make the code more resistant to browser errors.

The major change is in
def browse_thread_run_then_cleanup():

where I open the resources (the browser) the thread needs and then close and release them. By relying on the try...finally construct, I can ensure that the browser is stopped and released when the thread completes, even if browsing raises an error.
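
As a rough illustration of the pattern described above (not the actual code from this pull request), here is a minimal sketch. `FakeBrowser`, `FakeBrowserPool`, and the `pool`/`browse` parameters are hypothetical stand-ins made up for the example; only the function name comes from the description, and its real signature may differ.

```python
import threading


class FakeBrowser:
    """Hypothetical stand-in for a real browser process."""

    def __init__(self):
        self.running = False

    def start(self):
        self.running = True

    def stop(self):
        # Safe to call even if start() never succeeded.
        self.running = False


class FakeBrowserPool:
    """Hypothetical pool that hands out a fixed number of browsers."""

    def __init__(self, size):
        self._lock = threading.Lock()
        self._available = [FakeBrowser() for _ in range(size)]

    def acquire(self):
        with self._lock:
            return self._available.pop()

    def release(self, browser):
        with self._lock:
            self._available.append(browser)

    def available(self):
        with self._lock:
            return len(self._available)


def browse_thread_run_then_cleanup(pool, browse):
    # Acquire outside the try block: if this fails there is nothing to clean up.
    browser = pool.acquire()
    try:
        browser.start()
        browse(browser)
    finally:
        # Runs whether browse() returned normally or raised.
        browser.stop()
        pool.release(browser)
```

Run as a thread target, e.g. `threading.Thread(target=browse_thread_run_then_cleanup, args=(pool, browse)).start()`, the browser goes back to the pool even when `browse` raises, so a crashing page should not leak a browser slot.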

nlevitt commented 5 years ago

Sorry for the delay getting to these pull requests by the way.

Can you talk about the impetus for this change? What went wrong? Do you have a stack trace? How are you using umbra?