unclecode / crawl4ai

🔥🕷️ Crawl4AI: Open-source LLM Friendly Web Crawler & Scrapper
Apache License 2.0
2.74k stars, 225 forks

Error While installing #41

Closed: Sabakhupenia closed this issue 2 months ago

Sabakhupenia commented 2 months ago

Hey, while installing this I got the following error. Do you know why it happens? How can I solve it?

      raise CalledProcessError(retcode, cmd)
  subprocess.CalledProcessError: Command '['C:\\Users\\99559\\AppData\\Local\\Programs\\Python\\Python311\\python.exe', '-m', 'pip', 'install', 'spacy', '--no-deps']' returned non-zero exit status 1.
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for crawl4ai
Failed to build crawl4ai
ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (crawl4ai)
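The traceback ends in `raise CalledProcessError(retcode, cmd)`, which is the line Python's `subprocess.check_call` uses when a child process exits with a non-zero status. Assuming the old setup.py shelled out to pip that way (its exact code isn't shown in this thread), here is a minimal sketch of why one failing "pip install spacy --no-deps" takes the whole wheel build down. The `run_step` helper and the stand-in command are hypothetical: the child simply exits with status 1 instead of running pip.

```python
import subprocess
import sys

# Same shape as the traceback's command (the current interpreter running a child
# process), but the child is a harmless stand-in that just exits with status 1,
# playing the role of the failed "pip install spacy --no-deps".
failing_cmd = [sys.executable, "-c", "import sys; sys.exit(1)"]


def run_step(cmd):
    """Run one build step like check_call does, but report the failure instead of crashing."""
    try:
        subprocess.check_call(cmd)
    except subprocess.CalledProcessError as exc:
        # exc.returncode and exc.cmd are the two values the traceback printed
        print(f"step failed with exit status {exc.returncode}: {exc.cmd}")
        return exc.returncode
    return 0


if __name__ == "__main__":
    run_step(failing_cmd)
```

In a real setup.py nothing catches the exception, so it propagates out of the build and pip reports "Failed building wheel".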

arcontechnologies commented 2 months ago

Hi, downgrade your pip version (to 22.2, for instance) and that should do the job. It seems that newer pip versions are not compatible with this setup.py.

Sabakhupenia commented 2 months ago

OK, I will try.

Sabakhupenia commented 2 months ago

Hi, now I got this error:

  PS C:\Users\99559\Documents\work\crawlParseurlAlternative> pip install "crawl4ai @ git+https://github.com/unclecode/crawl4ai.git"
  Collecting crawl4ai@ git+https://github.com/unclecode/crawl4ai.git
    Cloning https://github.com/unclecode/crawl4ai.git to c:\users\99559\appdata\local\temp\pip-install-hjn77nc3\crawl4ai_69a035ac485d4ca09f2dcd6ac246db9d
    Running command git clone --filter=blob:none --quiet https://github.com/unclecode/crawl4ai.git 'C:\Users\99559\AppData\Local\Temp\pip-install-hjn77nc3\crawl4ai_69a035ac485d4ca09f2dcd6ac246db9d'
    Resolved https://github.com/unclecode/crawl4ai.git to commit 3abaa82501d33626440d6ee65f83919e42bb36c4
    Preparing metadata (setup.py) ... done
  ERROR: No .egg-info directory found in C:\Users\99559\AppData\Local\Temp\pip-pip-egg-info-qg4l0oq6

arcontechnologies commented 2 months ago

Did you downgrade pip to 22.1.2?
python -m pip install pip==22.1.2
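To confirm the downgrade actually applied in the active environment (the same check as `python -m pip --version`), here is a small sketch using the standard library's `importlib.metadata`. The `parse_version` and `is_pinned` helpers are hypothetical, and the parser only handles plain numeric versions such as "24.0" or "22.1.2".

```python
from importlib.metadata import PackageNotFoundError, version

TARGET = (22, 1, 2)  # the pip version suggested in this thread


def parse_version(text):
    """Turn a plain release string like '24.0' or '22.1.2' into a comparable tuple of ints.

    Pre-release or local suffixes (e.g. '23.0b1') are not handled; this is only a rough check.
    """
    parts = [int(p) for p in text.split(".")]
    # Pad to three components so '24.0' compares as (24, 0, 0)
    return tuple(parts + [0] * (3 - len(parts)))


def is_pinned(installed, target=TARGET):
    """True if the installed version string equals the target release."""
    return parse_version(installed) == target


if __name__ == "__main__":
    try:
        installed = version("pip")
    except PackageNotFoundError:
        print("pip is not installed in this environment")
    else:
        status = "matches" if is_pinned(installed) else "does not match"
        print(f"pip {installed} {status} the suggested 22.1.2")
```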

Sabakhupenia commented 2 months ago

Yes.

arcontechnologies commented 2 months ago

It's better to create a new, clean virtual environment, install pip 22.1.2 in it, and then proceed with the Crawl4AI installation from there. That should do the job.
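A minimal sketch of that advice in Python, using the standard library's `venv` module. The environment name `crawl4ai_env` is an assumption, the install command is the GitHub one used earlier in this thread, and `venv_python`, `bootstrap_commands` and `bootstrap` are hypothetical helpers. Calling the environment's own interpreter directly avoids activating it, which also sidesteps the difference between Windows (`Scripts\python.exe`) and Linux/macOS (`bin/python`).

```python
import os
import subprocess
import venv
from pathlib import Path


def venv_python(env_dir, windows=None):
    """Path of the interpreter inside a venv: Scripts/python.exe on Windows, bin/python elsewhere."""
    if windows is None:
        windows = os.name == "nt"
    env_dir = Path(env_dir)
    return env_dir / "Scripts" / "python.exe" if windows else env_dir / "bin" / "python"


def bootstrap_commands(env_dir, pip_pin="22.1.2", windows=None):
    """Commands to run inside the venv: pin pip first, then install Crawl4AI from GitHub."""
    py = str(venv_python(env_dir, windows))
    return [
        [py, "-m", "pip", "install", f"pip=={pip_pin}"],
        [py, "-m", "pip", "install", "crawl4ai @ git+https://github.com/unclecode/crawl4ai.git"],
    ]


def bootstrap(env_dir="crawl4ai_env", pip_pin="22.1.2"):
    """Create a fresh venv with pip (clear=True wipes an existing one), then run the commands."""
    venv.EnvBuilder(with_pip=True, clear=True).create(env_dir)
    for cmd in bootstrap_commands(env_dir, pip_pin):
        subprocess.check_call(cmd)  # raises CalledProcessError if a step fails
```

Calling `bootstrap()` from a Python prompt creates `crawl4ai_env` in the current folder and stops at the first failing step, so any remaining error surfaces immediately.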

unclecode commented 2 months ago

@Sabakhupenia Sorry for the issues you faced during installation. This issue has been resolved in the new update, which I will push soon. In the meantime, please try the following and let me know how it goes.

  1. Virtual Environment: First, create a new, clean virtual environment. This isolates your project dependencies and can help avoid conflicts.
python -m venv crawl4ai_env
source crawl4ai_env/bin/activate
(On Windows PowerShell, activate it with crawl4ai_env\Scripts\Activate.ps1 instead.)
  2. Downgrade pip (I don't believe you need this, but you can try it): in the new environment, downgrade pip to version 22.1.2:
python -m pip install pip==22.1.2
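  3. Modify setup.py: the current setup.py file tries to install spacy without dependencies (the "pip install spacy --no-deps" command in the traceback above), which might be causing the issue. Here is a modified version without that step: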
from setuptools import setup, find_packages
import os
from pathlib import Path

# No custom install command here, so spacy is no longer installed
# with "pip install spacy --no-deps" while the package is being built.

# Create the .crawl4ai folder and its cache subfolder in the user's home directory if they don't exist
crawl4ai_folder = os.path.join(Path.home(), ".crawl4ai")
os.makedirs(crawl4ai_folder, exist_ok=True)
os.makedirs(os.path.join(crawl4ai_folder, "cache"), exist_ok=True)

# Read the requirements from requirements.txt, skipping blank lines and comments
with open("requirements.txt", encoding="utf-8") as f:
    requirements = [
        line.strip()
        for line in f
        if line.strip() and not line.strip().startswith("#")
    ]

# Read README.md as UTF-8 explicitly; the default encoding on Windows is often not UTF-8
with open("README.md", encoding="utf-8") as f:
    long_description = f.read()

# Define the requirements for different environments:
# heavy ML packages stay out of the default install and are offered as extras
default_requirements = [req for req in requirements if not req.startswith(("torch", "transformers", "onnxruntime", "nltk", "spacy", "tokenizers", "scikit-learn", "numpy"))]
torch_requirements = [req for req in requirements if req.startswith(("torch", "nltk", "spacy", "scikit-learn", "numpy"))]
transformer_requirements = [req for req in requirements if req.startswith(("transformers", "tokenizers", "onnxruntime"))]

setup(
    name="Crawl4AI",
    version="0.2.72",
    description="🔥🕷️ Crawl4AI: Open-source LLM Friendly Web Crawler & Scrapper",
    long_description=long_description,
    long_description_content_type="text/markdown",
    url="https://github.com/unclecode/crawl4ai",
    author="Unclecode",
    author_email="unclecode@kidocode.com",
    license="Apache-2.0",  # consistent with the Apache classifier below
    packages=find_packages(),
    install_requires=default_requirements,
    extras_require={
        "torch": torch_requirements,
        "transformer": transformer_requirements,
        "all": requirements,
    },
    entry_points={
        "console_scripts": [
            "crawl4ai-download-models=crawl4ai.model_loader:main",
        ],
    },
    classifiers=[
        "Development Status :: 3 - Alpha",
        "Intended Audience :: Developers",
        "License :: OSI Approved :: Apache Software License",
        "Programming Language :: Python :: 3",
        "Programming Language :: Python :: 3.7",
        "Programming Language :: Python :: 3.8",
        "Programming Language :: Python :: 3.9",
        "Programming Language :: Python :: 3.10",
    ],
    python_requires=">=3.7",
)
  4. Installation: now try installing the package from the root of the cloned repository (the folder containing setup.py). The default installation doesn't need spacy.
pip install -e .
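To see what the default install leaves out, here is a sketch that isolates the prefix filtering from the modified setup.py. The `split_requirements` helper and the sample pins are hypothetical; they are not the project's actual requirements.txt.

```python
# Same prefix lists as in the setup.py above
HEAVY = ("torch", "transformers", "onnxruntime", "nltk", "spacy", "tokenizers", "scikit-learn", "numpy")
TORCH = ("torch", "nltk", "spacy", "scikit-learn", "numpy")
TRANSFORMER = ("transformers", "tokenizers", "onnxruntime")


def split_requirements(requirements):
    """Partition requirement lines by package-name prefix, mirroring the setup.py logic."""
    return {
        "default": [r for r in requirements if not r.startswith(HEAVY)],
        "torch": [r for r in requirements if r.startswith(TORCH)],
        "transformer": [r for r in requirements if r.startswith(TRANSFORMER)],
    }


if __name__ == "__main__":
    # Hypothetical requirements.txt contents, for illustration only
    sample = ["requests==2.31.0", "spacy==3.7.4", "torch==2.2.0", "transformers==4.39.0", "numpy==1.26.4"]
    for group, reqs in split_requirements(sample).items():
        print(f"{group}: {reqs}")
```

With these groups, a plain "pip install -e ." skips spacy entirely, while "pip install -e .[torch]", ".[transformer]" or ".[all]" pulls in the optional groups from extras_require (quote the argument in shells that expand brackets). Because matching is by prefix, a package such as torchvision would also land in the torch group.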

If you need spacy and still encounter issues, you might need to install some dependencies manually:

pip install spacy --no-deps
python -m spacy download en_core_web_sm
pip install torch
pip install transformers
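Before installing these packages by hand, it can help to check which ones are already importable in the active environment. Here is a small sketch using `importlib.util.find_spec`, which locates a module without importing it. The `missing_modules` helper and the package list are assumptions for illustration; `en_core_web_sm` is included because "python -m spacy download" installs the model as an importable package.

```python
import importlib.util

# Optional packages from the manual-install commands above, plus the spaCy model
OPTIONAL = ["spacy", "en_core_web_sm", "torch", "transformers"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(OPTIONAL)
    print("missing:", ", ".join(missing) if missing else "nothing")
```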

Let me know if you encounter any specific errors after trying these steps, and I'll be happy to help further.

Sabakhupenia commented 2 months ago

Hey, this bug is now solved! I installed it successfully. Thank you for your kind help! Nice!

unclecode commented 2 months ago

@Sabakhupenia You're welcome! Now I feel good too; this installation was a real issue, haha. Have fun and happy coding!