jaypyles / Scraperr

Self-hosted web scraper.
https://scraperr-docs.pages.dev/
MIT License
1.2k stars · 50 forks

Not able to install using the provided docker-compose.yaml, would you mind providing a bit more detail on how to install? Thank you #41

Closed arkilis closed 2 weeks ago

arkilis commented 2 weeks ago

Not able to install using the provided docker-compose.yaml, would you mind providing a bit more detail on how to install? Thank you

jaypyles commented 2 weeks ago

You are going to need to leave more context than “didn’t work”.

opicron commented 2 weeks ago

Chiming in: I'm trying to run the docker compose on Synology. I got "image not found" errors, so I added an image line to each service. That seemed to fix that issue here.

version: '3' # Specify the version to avoid compatibility issues

services:
  scraperr:
    image: jpyles0524/scraperr
    ports:
      - "3030:3000" # Maps port 3030 on the host to port 3000 in the container
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.scraperr.rule=Host(`localhost`)" # change this to your domain, if not running on localhost
      - "traefik.http.routers.scraperr.entrypoints=web" # websecure if using https
      - "traefik.http.services.scraperr.loadbalancer.server.port=3000"

  scraperr_api:
    image: jpyles0524/scraperr_api
    ports:
      - "8033:8000" # Maps port 8033 on the host to port 8000 in the container
    environment:
      - LOG_LEVEL=INFO
      - MONGODB_URI=mongodb://root:example@mongo:27017 # used to access MongoDB
      - SECRET_KEY=your_secret_key # used to encode authentication tokens (can be a random string)
      - ALGORITHM=HS256 # authentication encoding algorithm
      - ACCESS_TOKEN_EXPIRE_MINUTES=600 # access token expire minutes
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.scraperr_api.rule=Host(`localhost`) && PathPrefix(`/api`)" # change this to your domain, if not running on localhost
      - "traefik.http.routers.scraperr_api.entrypoints=web" # websecure if using https
      - "traefik.http.middlewares.api-stripprefix.stripprefix.prefixes=/api"
      - "traefik.http.routers.scraperr_api.middlewares=api-stripprefix"
      - "traefik.http.services.scraperr_api.loadbalancer.server.port=8000"

  mongo:
    image: mongo:latest # Uses the latest MongoDB official image
    ports:
      - "27017:27017" # Maps port 27017 on the host to port 27017 in the container (default MongoDB port)
    environment:
      MONGO_INITDB_ROOT_USERNAME: root
      MONGO_INITDB_ROOT_PASSWORD: example

Only now I notice that the scraperr container is not starting. @jaypyles any idea what that could be? If I had to guess, it's probably that it cannot connect to the API. But I do not see any logs generated. Or maybe I messed up the ports.
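In case it helps others debugging the same thing, docker compose can usually surface why a service died even when the app itself logged nothing (service names here assume the compose file above):

```shell
# List all services, including ones that exited immediately
docker compose ps -a

# Show the logs for one service
docker compose logs scraperr

# Follow the API logs live while restarting it
docker compose logs -f scraperr_api
```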

arkilis commented 2 weeks ago

@opicron Hey mate, thanks for the compose file. I tried it, but 2 out of 3 containers are not running.

[screenshot: container status]

opicron commented 2 weeks ago

Yeah, for me the scraperr container doesn't start either. Checking what I did wrong. Edit: giving up for now. Maybe somebody can help us out :).

opicron commented 2 weeks ago

Maybe use the docker compose config that's in the project?

services:
  scraperr:
    image: jpyles0524/scraperr:latest
    build:
      context: .
      dockerfile: docker/frontend/Dockerfile
    container_name: scraperr
    command: ["npm", "run", "start"]
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.scraperr.rule=Host(`localhost`)" # change this to your domain, if not running on localhost
      - "traefik.http.routers.scraperr.entrypoints=web" # websecure if using https
      - "traefik.http.services.scraperr.loadbalancer.server.port=3000"
    networks:
      - web
  scraperr_api:
    init: true
    image: jpyles0524/scraperr_api:latest
    build:
      context: .
      dockerfile: docker/api/Dockerfile
    environment:
      - LOG_LEVEL=INFO
      - OLLAMA_URL=http://ollama:11434
      - OLLAMA_MODEL=phi3
      - MONGODB_URI=mongodb://root:example@webscrape-mongo:27017 # used to access MongoDB
      - SECRET_KEY=your_secret_key # used to encode authentication tokens (can be a random string)
      - ALGORITHM=HS256 # authentication encoding algorithm
      - ACCESS_TOKEN_EXPIRE_MINUTES=600 # access token expire minutes
    container_name: scraperr_api
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.scraperr_api.rule=Host(`localhost`) && PathPrefix(`/api`)" # change this to your domain, if not running on localhost
      - "traefik.http.routers.scraperr_api.entrypoints=web" # websecure if using https
      - "traefik.http.middlewares.api-stripprefix.stripprefix.prefixes=/api"
      - "traefik.http.routers.scraperr_api.middlewares=api-stripprefix"
      - "traefik.http.services.scraperr_api.loadbalancer.server.port=8000"
    networks:
      - web
  traefik:
    image: traefik:latest
    container_name: traefik
    command:
      - "--providers.docker=true"
      - "--entrypoints.web.address=:80"
      - "--entrypoints.websecure.address=:443"
    ports:
      - 80:80
      - 443:443
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    networks:
      - web
  mongo:
    container_name: webscrape-mongo
    image: mongo
    restart: always
    environment:
      MONGO_INITDB_ROOT_USERNAME: root
      MONGO_INITDB_ROOT_PASSWORD: example
    networks:
      - web
networks:
  web:

Haven't tried it, but with Traefik it might work. Just make sure to adjust the ports for Traefik. And you'll need a volume for mongo data:

volumes:
      - /volume1/docker/scraperr/mongo-data:/data/db # Adjust path to your Synology volume

jaypyles commented 2 weeks ago

Yes, you will need a reverse proxy, such as the Traefik config provided in the repo's docker-compose.yaml, to proxy requests to the API. Run make up to start everything. If you are having errors, please check the logs and send them here.
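For anyone without make available (it usually isn't on Synology), the target presumably just wraps compose — an assumption on my part, so check the repo's Makefile for the exact flags:

```shell
# Build (if needed) and start everything in the background
docker compose -f docker-compose.yaml up -d --build

# Tear it back down
docker compose down
```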

opicron commented 2 weeks ago

When building with the above YAML file, I get the error: could not find /volume1/docker/scraperr/docker/api
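For what it's worth, that path error comes from the build: sections, which resolve docker/frontend and docker/api relative to the compose file and only exist in a full clone of the repo. If you only want the prebuilt images, one workaround (my sketch, not an official config) is to delete the build: blocks so compose pulls instead of building:

```yaml
services:
  scraperr:
    image: jpyles0524/scraperr:latest
    # build: section removed — pull the published image instead of building locally
  scraperr_api:
    image: jpyles0524/scraperr_api:latest
    # build: section removed
```

The other fix is to clone the whole repository into /volume1/docker/scraperr so the build contexts resolve.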

ItsNoted commented 2 weeks ago

Can we get a compose file without Traefik? I think a majority of us use other proxy methods like NPM, Caddy, or even Cloudflare.

jaypyles commented 2 weeks ago

I'll get something written up, but it should be as easy as copying the file to something like docker-compose.no-traefik.yaml, deleting the Traefik labels, and adding routes to Scraperr in your own reverse proxy config.
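In the meantime, a minimal no-Traefik sketch along those lines might look like the following. The host ports and credentials are placeholders, and note that the repo's Traefik labels strip the /api prefix, so your external proxy (NPM, Caddy, etc.) would need to route / to port 3000 and /api (with the prefix stripped) to port 8000:

```yaml
services:
  scraperr:
    image: jpyles0524/scraperr:latest
    ports:
      - "3000:3000"

  scraperr_api:
    image: jpyles0524/scraperr_api:latest
    ports:
      - "8000:8000"
    environment:
      - LOG_LEVEL=INFO
      - MONGODB_URI=mongodb://root:example@mongo:27017
      - SECRET_KEY=your_secret_key # replace with a random string
      - ALGORITHM=HS256
      - ACCESS_TOKEN_EXPIRE_MINUTES=600

  mongo:
    image: mongo
    restart: always
    environment:
      MONGO_INITDB_ROOT_USERNAME: root
      MONGO_INITDB_ROOT_PASSWORD: example
    volumes:
      - mongo-data:/data/db # named volume so Mongo data survives restarts

volumes:
  mongo-data:
```

Untested on my end — treat it as a starting point rather than a supported config.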