privacy-tech-lab / privacy-pioneer-web-crawler

Web crawler for detecting websites' data collection and sharing practices at scale using Privacy Pioneer
https://privacytechlab.org/
MIT License

Update crawler set up instructions in readme #4

Closed by jjeancharles 7 months ago

jjeancharles commented 8 months ago

As we discussed in today's meeting, @danielgoldelman still needs to update the readme with instructions on how to set up the crawler. @JoeChampeau and I will review these instructions and update or add to them where necessary.

JoeChampeau commented 8 months ago

Alright, I was able to get the crawler working. Not sure what @jjeancharles and @danielgoldelman had to do, but for me, I also had to:

  1. Install Firefox Nightly and modify the binary in local-crawler.js to reference the Windows path. Running the crawler on regular Firefox doesn't throw any errors, but it seems to block installing unsigned extensions (which I didn't realize initially). There's a commented-out line that sets the binary to "firefox.Channel.NIGHTLY", which, I assume, makes the program check the default installation path for Nightly. If we could get that to work instead of setting the path manually, that'd be nice.
  2. Super minor thing, but npm install needs to be run in both the 'selenium-crawler' directory and the 'rest-api' directory. Scripts could probably be added to the parent package.json to streamline this into a single command.
  3. Regarding the MySQL setup, the user that gets created needs to be assigned the authentication plugin 'mysql_native_password' and then has to be granted all privileges on the table 'entries'.
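
For point 1, one way to avoid hard-coding the Windows path might look like the sketch below. It is only an illustration, not what local-crawler.js currently does: the `nightlyBinaryPath` helper and the `NIGHTLY_BINARY` environment variable are hypothetical names, and the default install locations are assumptions that may not match every machine.

```javascript
// Sketch: pick a Firefox Nightly binary path per platform instead of
// hard-coding one OS-specific path in local-crawler.js.

// ASSUMPTION: typical default install locations; adjust for the local machine.
const DEFAULT_NIGHTLY_PATHS = {
  win32: "C:\\Program Files\\Firefox Nightly\\firefox.exe",
  darwin: "/Applications/Firefox Nightly.app/Contents/MacOS/firefox",
};

// NIGHTLY_BINARY is a hypothetical override variable, not something the repo defines.
function nightlyBinaryPath(platform = process.platform, env = process.env) {
  // An explicit override always wins over the platform default.
  if (env.NIGHTLY_BINARY) {
    return env.NIGHTLY_BINARY;
  }
  const known = DEFAULT_NIGHTLY_PATHS[platform];
  if (!known) {
    throw new Error(
      `No default Nightly path for platform "${platform}"; set NIGHTLY_BINARY`
    );
  }
  return known;
}

// In local-crawler.js the result could then be handed to selenium-webdriver:
//   const firefox = require("selenium-webdriver/firefox");
//   const options = new firefox.Options().setBinary(nightlyBinaryPath());
```

If the commented-out `firefox.Channel.NIGHTLY` line turns out to work, it would make this helper unnecessary, since the channel is meant to locate the binary itself.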