openwpm / OpenWPM

A web privacy measurement framework
https://openwpm.readthedocs.io
Other
1.33k stars 313 forks source link

Hanging when excecuting webdriver-related commands in some cases #947

Open GorgMish opened 3 years ago

GorgMish commented 3 years ago

During crawling some websites, the OpenWPM will kinda freeze when reaching a webdriver-related command, e.g. running "https://www.marriott.com" in official demo.py, the script will hang in this line of custom_command.py link_count = len(webdriver.find_elements(By.TAG_NAME, "a")). I think this could be related to the DB recording process since this issue usually happens for websites that relatively set big amounts of Cookies.

GorgMish commented 3 years ago

browser_param.extension_enabled = False will fix the issue temporarily. (normally setting that to False will raise an error so some other modifications are needed) edited: of course, this would disable the whole OpenWPM itself

vringar commented 3 years ago

There is no DB related activity happening in the link counting command however. https://github.com/mozilla/OpenWPM/blob/491262e9a9f1a9397abba47bc500f2495971bce4/custom_command.py#L42-L43 :thinking: Is this while using the modified GetCommand, so it doesn't stop the command sequence if there are errors?

vringar commented 3 years ago

Also

browser_param.extension_enabled = False will fix the issue temporarily. (normally setting that to False will raise an error so some other modifications are needed)

shouldn't affect how any of this works. This is quite curious.

vringar commented 3 years ago

However this command also doesn't do anything useful for larger crawls and maybe we should encourage removing it then. It's mostly there to show off how easily one can write their own commands.

GorgMish commented 3 years ago

Yes, there is no DB related command in the costum_command.py, but I thought maybe the process of recording in the database is going on in another thread or process and that causes the problem.

GorgMish commented 3 years ago

This issue would happen also with non-modified GetCommand, I used this scenario as the MWE so you can reproduce it. Not just for the webdriver-related command in LinkCountingCommand, this would also be the case for the custom commands I have written myself.

ogasser commented 3 years ago

I can reproduce this issue for http://www.marriott.com. The FinalizeCommand times out consistently, every time you run demo.py with GetCommand only (i.e. no LinkCounting) and only http://www.marriott.com in the target list.

Relevant log output:

browser_manager      - INFO     - BROWSER 2876282174: EXECUTING COMMAND: InitializeCommand()
browser_manager      - INFO     - BROWSER 2876282174: EXECUTING COMMAND: GetCommand(http://www.marriott.com,3)
browser_manager      - INFO     - BROWSER 2876282174: EXECUTING COMMAND: FinalizeCommand(5)
browser_manager      - INFO     - BROWSER 2876282174: Timeout while executing command, FinalizeCommand(5), killing browser manager
browser_manager      - INFO     - Finished working on CommandSequence with visit_id 5822921016699154 on browser with id 2876282174
storage_controller   - INFO     - Awaiting all tasks for visit_id 5822921016699154
sql_provider         - WARNING  - Visit with visit_id 5822921016699154 got interrupted
CommandSequence for http://www.marriott.com ran unsuccessfully
storage_controller   - INFO     - Terminating handler, because the underlying socket closed
storage_controller   - INFO     - Terminating handler, because the underlying socket closed
storage_controller   - INFO     - Received shutdown signal!

@vringar can you reproduce this?

vringar commented 3 years ago

I can reproduce it but don't have any idea, why this might be happening. I'll need to investigate this further.

vringar commented 3 years ago

Just me brainstorming:

Is it really the webdriver becoming unresponsive and if so, how could that happen? Like how could a website crash the webdriver?? It shouldn't have that power. Need to check the website console logs to see if there is anything going on there. Also get instrumentation in the web extension, so we can see if it's something there.

It's unlikely to be anything in OpenWPM, since there was no change to that part of the code base recently. Need to check for GeckoDriver updates. Also need to check if the GeckoDriver bug that boolean5 filed got fixed by now.

GorgMish commented 3 years ago

Disabling openwpm.xpi will fix this issue; since openwpm.xpi is responsible for communication between DB and browser instances, I suggest looking into those parts of the project to find the problem, i.e the instrumentation parts.

vringar commented 3 years ago

Could either of you add another webdriver action right after the last action in the GetCommand and see if it starts hanging there instead of in the Finalize Command?

I'll likely only get to work on this next Friday so any information gathered until then would reduce the delay in getting this fixed.

GorgMish commented 3 years ago

I already did; yes it will hang!

vringar commented 3 years ago

Thanks! Okay, then I'm going to see when we last updated geckodriver, see if we need to update our copy of their prefs and see if I need to file an upstream bug.

vringar commented 3 years ago

Nope, GeckoDriver hasn't been updated in forever. https://github.com/mozilla/OpenWPM/blame/master/environment.yaml#L11

vringar commented 3 years ago

Hmm, I'm very confused. Need to set up a minimal code sample just using selenium and visiting http://www.marriott.com to see if that breaks as well. But GorgMish, you also said that disabling the extension fixes this, so that must have something do with it?

GorgMish commented 3 years ago

Yes, it should have. Executing <Task pending name='Task-9' coro=<StorageController._handler() running at /home/alivm/Desktop/Intern/Proc/geographical-cookies/src/openwpm/openwpm/storage/storage_controller.py:94> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x7f91e92a5f70>()] created at /home/alivm/miniconda3/envs/openwpm/lib/python3.9/asyncio/base_events.py:424> created at /home/alivm/miniconda3/envs/openwpm/lib/python3.9/asyncio/streams.py:263> took 0.156 seconds This log might be kinda related to the problem.

vringar commented 3 years ago

I don't think this is related to this issue, or if it is then it's related via a common root cause. This log message only says that processing the request took longer than asyncio thinks a single async task should be running for (iirc 100ms). But the task still successfully continued, otherwise there would be an error message from _handler in the logs.

GorgMish commented 3 years ago

Just guessing, because this log would be prompted several times (between 1 till 10, it varies time to time) exactly before the timeout log.

GorgMish commented 3 years ago

@vringar It doesn't need to disable the whole openwpm.xpi; just disabling js_instrument (and accordingly callstack_instrument #557 ) will fix the problem. It is clear there is a huge amount of js_instrument recorded just for this single website, so as I already mentioned, the problem is likely related to the burst of the DB IO process.

vringar commented 3 years ago

Could you try just disabling the callstack Instrument and see if that also resolves the problem?

The callstack Instrument messes with Firefox internals, so I'd think it's more likely to cause breakage.

GorgMish commented 3 years ago

I disabled each of the instruments one by one and it just worked when I did it for js_instrument.

vringar commented 3 years ago

Thank you for doing this investigation!