Closed turban1988 closed 3 years ago
Thanks for the report! This seems to be a known issue related to #161.
Is there anything known I can do to avoid this (e.g., fewer browsers or more resources)?
Unfortunately we haven't investigated beyond what you see in the bugs, so I don't have advice for avoiding the issue.
I am happy to provide feedback and pointers to the relevant code if you'd like to investigate further. I suspect this is caused by a race condition in how geckodriver handles Firefox profiles. This was introduced after we moved over to geckodriver.
Hi, I would be happy if you could give me some pointers on where to find the cause of the problem.
I ran a measurement with just one browser and it still appeared.
@turban1988, hey there, as far as I can see, you have several issues. May I ask how many URLs you are crawling? Are you using a modified version of demo.py or files you configured yourself? Are you using classic domains or more specific ones?
I ran into the same problem after a scan of more than 1000 URLs. Reducing my sample set to 500 URLs helped to reduce those issues, and I thought it might have something to do with cache errors or something like that. I reduced the sample set again and used around 20 random URLs like twitter, facebook, yahoo, and some others, while just using the "demo.py" setup.
As you say, the issue shows up only for a few domains. In my last example it was the case for the following URLs: http://qq.com http://sina.com.cn http://hao123.com http://yandex.ru
Because the error appears after a "Time Out" message, I changed the following settings: sleep=0 -> sleep=15; timeout=60 -> timeout=360, and I used command_sequence.dump_profile_cookies(360)
I tested it again and again and no longer receive any errors.
I am new to this topic and I am not sure whether this solution is suitable for you as well.
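The settings change described above can be sketched roughly as follows. This assumes the `CommandSequence` interface used by demo.py at the time, with `get(sleep=..., timeout=...)` and `dump_profile_cookies(timeout)`. The names `apply_relaxed_timeouts` and `RecordingSequence` are hypothetical and exist only so the snippet runs without OpenWPM installed.

```python
def apply_relaxed_timeouts(command_sequence, sleep=15, timeout=360, dump_timeout=360):
    """Queue a page visit and a cookie dump using the longer timeouts from this comment.

    Assumption: `command_sequence` offers get(sleep=, timeout=) and
    dump_profile_cookies(timeout), as in the demo.py of that period.
    """
    command_sequence.get(sleep=sleep, timeout=timeout)
    command_sequence.dump_profile_cookies(dump_timeout)
    return command_sequence


class RecordingSequence:
    """Hypothetical stand-in that only records calls; not part of OpenWPM."""

    def __init__(self):
        self.calls = []

    def get(self, sleep, timeout):
        self.calls.append(("get", sleep, timeout))

    def dump_profile_cookies(self, timeout):
        self.calls.append(("dump_profile_cookies", timeout))


if __name__ == "__main__":
    seq = apply_relaxed_timeouts(RecordingSequence())
    print(seq.calls)
```

In a real crawl the stand-in would be replaced by the actual command sequence object created for each site; whether the longer timeouts remove the error or only make it rarer is not established in this thread.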
Hi @felix4webscience, I am currently using a modified OpenWPM version, but the error also occurs when I use demo.py.
I am not quite sure what you mean by "classic" domains, but I am using subsites rather than TLD+1 (e.g., I use https://www.google.com/search?q=openWPM and not https://www.google.com).
I am using a timeout of 120. I will try larger timeouts if I get the time to do so.
Hi @turban1988,
sorry, by "classic" domains I meant homepages (TLD+1). I have a few ideas why this error occurs:
However, I would be interested to know whether increasing the timeout parameter helped you, at least for the following error type:
/tmp/rust_mozprofile.ouql8cAxDcO7/webapps
Hi,
2) is possible but I guess unlikely in my case since I only visit subsites that are linked (i.e., there is a hyperlink to that page) on the frontpage (TLD+1) or linked on subpages.
I set timeout=500 and the crawl did not crash due to the reported error.
Hey,
in your report above I can see different issues:
profile commands: (Warning) 1.) /tmp/rust_mozprofile.ouql8cAxDcO7/webapps NOT FOUND IN profile folder, skipping.
OSError: 1.) /tmp/rust_mozprofile.xxx/cookies.sqlite NOT FOUND IN profile folder, skipping.
IOError: 1.) /tmp/rust_mozprofile.dXVx1eQzDRJf/extension_port.txt
Browser Manager: (Error) 1.) BrowserManager - ERROR - BROWSER 7: Spawn unsuccessful | Proxy Ready: False | Profile Created: True | Profile Tar: True | Display: True | Launch Attempted: True | Browser Launched: True | Browser Ready: False
Following up on your answer, do you know whether the number of profile command warnings has gone down? I have no immediate solution, but for warnings related to "rust_mozprofile" I would search for solutions within the Rust community, or for known issues with geckodriver and maybe Selenium.
You will probably have to make some test runs, such as reducing the number of URLs, switching your IP address, or deleting your cache (just in case). Then you could use the Python debugger package https://docs.python.org/2/library/pdb.html to test whether this issue happens only for certain URLs and whether you are able to reproduce the error. If you are using PyCharm, you can run the internal debugger; in case of version or library dependency problems, PyCharm detects the incompatibility.
It would be great if you could share your solution, if you find one in the end. Greetings
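To check whether the number of warnings changes between runs, the log can be tallied by message type. The following is a minimal sketch that uses only the message fragments quoted in this thread; the category names and the function names `tally_log_lines` and `tally_log_file` are made up for illustration, and the patterns may need adjusting to the actual log format.

```python
import re
from collections import Counter

# Fragments copied from the messages quoted in this thread.
# The category names on the left are illustrative, not OpenWPM terms.
PATTERNS = {
    "profile_file_missing": re.compile(r"NOT FOUND IN profile folder"),
    "spawn_unsuccessful": re.compile(r"Spawn unsuccessful"),
    "extension_port_missing": re.compile(r"extension_port\.txt"),
}


def tally_log_lines(lines):
    """Count how many lines match each known message fragment."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def tally_log_file(path):
    """Tally a log file on disk, tolerating undecodable bytes."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return tally_log_lines(handle)


if __name__ == "__main__":
    sample = [
        "/tmp/rust_mozprofile.xxx/cookies.sqlite NOT FOUND IN profile folder, skipping.",
        "BrowserManager - ERROR - BROWSER 7: Spawn unsuccessful | Proxy Ready: False",
    ]
    print(dict(tally_log_lines(sample)))
```

Running `tally_log_file` on the log of a crawl before and after a settings change gives a rough comparison; it says nothing about the cause of the warnings.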
@felix4webscience, are you still working on this issue? I am a newbie trying to investigate and write a paper using OpenWPM, but I am not really a programmer. I have very limited knowledge of what is going on in demo.py. I am struggling to run the file and I keep getting a "selenium module not found" error, even though I have tried to set the path for it. It also says the browser keeps failing to launch. Help would be appreciated, thanks!
Hi, well, no, I am just following the progress of OpenWPM but not actively working with it anymore. Maybe I can try to help you out. If you are new to programming, I would suggest that whenever you have technical questions, you add your relevant system information. Which OS are you running? If Ubuntu, which version? Are you using a Mac with Docker? Which IDE are you using? Which Python version? Are you using a virtual machine (e.g. VirtualBox, VMware)? This is important information for any programmer trying to replicate your issue. However, if Selenium has not been found, it seems your library has not been installed correctly. Note that the import can differ slightly depending on the OS you are using. Maybe it is worth watching a tutorial about Selenium, getting used to the basics, and running some easy projects first before starting with OpenWPM, which is quite complex? In any case, read the Selenium manual; it's very enlightening when starting out. I was a newbie as well when I started, and it took me a bunch of Python tutorials to catch up🙃. But in the end it's a great project and worth it.
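A quick way to gather some of the information asked for above is to check which interpreter is running and whether it can see the module at all. This is a minimal sketch using only the Python standard library; `describe_module` is a hypothetical helper name.

```python
import importlib.util
import sys


def describe_module(name):
    """Report whether a top-level module is importable from this interpreter."""
    spec = importlib.util.find_spec(name)
    return {
        "module": name,
        "found": spec is not None,
        # Path of the module file, if any; stays None when the module is missing.
        "location": getattr(spec, "origin", None),
        "interpreter": sys.executable,
        "python_version": sys.version.split()[0],
    }


if __name__ == "__main__":
    for key, value in describe_module("selenium").items():
        print(f"{key}: {value}")
```

If `found` is False, the package is not installed for the interpreter shown under `interpreter`, which often differs from the one the package was installed into (for example a system Python versus a virtual environment).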
@vp01020 Do you want to come into the Matrix chat and we'll talk about your setup problems?
> But finally its a great project and worth it.
@felix4webscience thanks for the kind words. 😊
Hi, from time to time I get the following error, and the crawl crashes when the dump_profile command is executed (see below). It happens after several site visits.
openwpm.log error.txt