scrapinghub / frontera

A scalable frontier for web crawlers
BSD 3-Clause "New" or "Revised" License

Windows support for Frontera #282

Closed 11rohans closed 5 years ago

11rohans commented 7 years ago

Hi, I'm trying to use the distributed mode of Frontera for a broad crawl, but I'm running into issues testing it on my local Windows machine. When I start a DB worker I get an import error (cannot import name SIGUSR1), because SIGUSR1 is not available on Windows. It would be great if Frontera could be made to work on Windows and get around this issue.

sibiryakov commented 7 years ago

Yes, Frontera hasn't been tested or developed on Windows yet, but it could be. If you can, please submit a PR fixing this.

ghost commented 5 years ago

Hello, does Frontera 0.8.1 support Windows 10 now? What is the best platform for running Frontera?

sibiryakov commented 5 years ago

Linux!

ghost commented 5 years ago

I am a student trying to build a distributed crawler. I have no experience with Linux, but I am willing to learn it.

I have been a Windows user for a long time. Could you suggest a Linux distro that suits Frontera and is easy to use?

Lastly, would you recommend installing Linux in VirtualBox or alongside Windows for development purposes?

Thanks in advance!

sibiryakov commented 5 years ago

Ubuntu is usually a default choice in such cases.

Prometheus3375 commented 4 years ago

If you want to use Frontera on Windows, follow these steps.

  1. Install Visual Studio Installer.
  2. Go to Control Panel\Programs and Features\Uninstall a program and remove Windows SDK AddOn, if present.
  3. Use VS Installer to install the latest VS C++ x64/x86 build tools and Windows SDK (<MSVC v142 - VS 2019 C++ x64/x86 build tools (v14.26)> and <Windows 10 SDK (10.0.19041.0)> for now).
  4. Open PowerShell and do not close it till the end.
  5. (Optional) Create a virtual environment and activate it according to the tutorial.
  6. Update pip.
  7. Run pip install --upgrade setuptools wheel. This will update setuptools and wheel packages.
  8. Clone the Python binding for CityHash for Windows.
  9. Open its setup.py and change the version to the latest (0.2.3.post9 now).
  10. Change current working directory to the root of that repository.
  11. Run python setup.py install. This will install CityHash.
  12. Now you can install Frontera.
  13. After installation, open the Frontera sources.
  14. Open frontera/worker/db.py and comment out the signal package import and usage (lines 8 and 129 in v0.8.1).
  15. Open frontera/worker/strategy.py and comment out the signal package import and usage (lines 10 and 256 in v0.8.1).
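Before starting a worker, it can help to confirm that the steps above took effect. The following is only a rough sketch: the module names `twisted`, `cityhash` and `frontera` are assumptions about what the installed packages expose, and `missing_modules` is a hypothetical helper, not part of Frontera.

```python
import importlib.util
import signal


def missing_modules(names):
    """Return the top-level module names that cannot be found on this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed module names; adjust them to what your installation actually provides.
    missing = missing_modules(["twisted", "cityhash", "frontera"])
    print("missing modules:", missing or "none")
    # On Windows this is expected to be False, which is why steps 13-15 are needed.
    print("SIGUSR1 available:", hasattr(signal, "SIGUSR1"))
```

`find_spec` only looks the modules up and does not import them, so the check is safe to run even when one of the packages is broken.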

Points 1-3 resolve issues with installing the Twisted package; the same build tools are then used to build CityHash.

Points 8-11 install the Python binding of CityHash for Windows. CityHash from pip does not compile on Windows (#367), so it cannot be installed through pip. Another way is to replace the CityHash usage in frontera/contrib/backends/partitioners.py with some other library.
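For the second option, here is a minimal sketch of what a replacement could look like using only the standard library. It assumes the partitioner just needs a stable 64-bit integer per key to pick a partition; `stable_hash64` and `pick_partition` are hypothetical names, not Frontera functions, and the values differ from CityHash, so every process in the crawl would have to use the same replacement.

```python
import hashlib


def stable_hash64(key):
    """Return a 64-bit integer hash of key that is stable across runs and machines."""
    if isinstance(key, str):
        key = key.encode("utf-8")
    # blake2b with an 8-byte digest yields exactly 64 bits.
    digest = hashlib.blake2b(key, digest_size=8).digest()
    return int.from_bytes(digest, "big")


def pick_partition(key, partitions):
    """Map key to one of the given partition ids by hash modulo (an assumed scheme)."""
    return partitions[stable_hash64(key) % len(partitions)]


if __name__ == "__main__":
    print(pick_partition("example-fingerprint", [0, 1, 2, 3]))
```

Python's built-in `hash()` is not a good substitute here, because string hashing is randomized per process unless `PYTHONHASHSEED` is fixed.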

Points 13-15 resolve issues caused by the use of the Unix-specific signal SIGUSR1. It is used for debugging: a user can send it to print the current stack trace. This signal cannot be used on Windows.
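Instead of commenting the lines out, the registration could be guarded so the same code runs on both platforms. This is a sketch of that idea, not Frontera's actual code: `dump_stack` and `install_debug_handler` are hypothetical names, and the only assumption is that the handler exists purely to print a stack trace.

```python
import signal
import sys
import traceback


def dump_stack(signum, frame):
    """Signal handler: print the stack of the interrupted frame to stderr."""
    traceback.print_stack(frame, file=sys.stderr)


def install_debug_handler(handler=dump_stack):
    """Register handler for SIGUSR1 if the platform has it; return True on success."""
    sig = getattr(signal, "SIGUSR1", None)  # absent on Windows
    if sig is None:
        return False
    try:
        signal.signal(sig, handler)
    except ValueError:
        # signal.signal() may only be called from the main thread.
        return False
    return True


if __name__ == "__main__":
    print("debug handler installed:", install_debug_handler())
```

On Windows the function simply returns False and the worker would start without the debugging hook; on Unix, `kill -USR1 <pid>` would print the stack trace as before.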