Describe the Bug
Note: I should mention that I've deployed via Coolify and docker-compose, so my setup might be a little wonky. That said, if there's anything to check, I'd love some direction. When calling /crawl:
[2024-09-28T20:57:32.113Z]DEBUG - Fetching sitemap links from https://mendable.ai
[2024-09-28T21:01:07.403Z]WARN - You're bypassing authentication
[2024-09-28T21:01:07.403Z]WARN - You're bypassing authentication
[2024-09-28T21:01:07.525Z]DEBUG - [Crawl] Failed to get robots.txt (this is probably fine!): {"message":"Request failed with status code 404","name":"AxiosError","stack":"AxiosError: Request failed with status code 404\n at settle (/app/node_modules/.pnpm/axios@1.7.2/node_modules/axios/dist/node/axios.cjs:1983:12)\n at BrotliDecompress.handleStreamEnd (/app/node_modules/.pnpm/axios@1.7.2/node_modules/axios/dist/node/axios.cjs:3085:11)\n at BrotliDecompress.emit (node:events:531:35)\n at endReadableNT (node:internal/streams/readable:1696:12)\n at process.processTicksAndRejections (node:internal/process/task_queues:82:21)\n at Axios.request (/app/node_modules/.pnpm/axios@1.7.2/node_modules/axios/dist/node/axios.cjs:4224:41)\n at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n at async WebCrawler.getRobotsTxt (/app/dist/src/scraper/WebScraper/crawler.js:120:26)\n at async crawlController (/app/dist/src/controllers/v1/crawl.js:52:21)","config":{"transitional":{"silentJSONParsing":true,"forcedJSONParsing":true,"clarifyTimeoutError":false},"adapter":["xhr","http","fetch"],"transformRequest":[null],"transformResponse":[null],"timeout":3000,"xsrfCookieName":"XSRF-TOKEN","xsrfHeaderName":"X-XSRF-TOKEN","maxContentLength":-1,"maxBodyLength":-1,"env":{},"headers":{"Accept":"application/json, text/plain, */*","User-Agent":"axios/1.7.2","Accept-Encoding":"gzip, compress, deflate, br"},"method":"get","url":"https://mendable.ai/robots.txt","axios-retry":{"retries":3,"shouldResetTimeout":false,"validateResponse":null,"retryCount":0,"lastRequestTime":1727557267405}},"code":"ERR_BAD_REQUEST","status":404}
As far as I can tell it just hangs forever. That said, the requests that are getting returned seem to be succeeding:
anon@pop-os:~$ curl -X POST http://api-firecrawl.x-ware.online:3002/v1/crawl -H 'Content-Type: application/json' -d '{ "url": "https://mendable.ai" }'
{"success":true,"id":"35d7987d-e160-4a07-836f-0c776c3736ae","url":"https://api-firecrawl.x-ware.online:3002/v1/crawl/35d7987d-e160-4a07-836f-0c776c3736ae"}
And I can visit the corresponding job page:
{"success":true,"status":"scraping","completed":0,"total":1,"creditsUsed":1,"expiresAt":"2024-09-29T21:01:07.000Z","next":"https://api-firecrawl.x-ware.online:3002/v1/crawl/9f34da99-1022-490b-988b-65c4f2d9c8d2?skip=0","data":[]}
(This response is from a different job whose tab I happened to have open. All the mendable.ai attempts return a response like this; I haven't tested much else.)
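For anyone trying to confirm the "hangs forever" behaviour, here is a minimal sketch that polls a job's status URL and checks whether the `completed` count ever moves. It only assumes the response shape shown above (`status`, `completed`, `total`); the function names, attempt count, and delay are illustrative and not part of Firecrawl.

```python
import json
import time
import urllib.request


def describe_status(payload: dict) -> str:
    """Summarize a crawl status payload shaped like the one shown above."""
    status = payload.get("status", "unknown")
    completed = payload.get("completed", 0)
    total = payload.get("total", 0)
    return f"{status}: {completed}/{total} pages completed"


def is_stalled(history: list[dict]) -> bool:
    """True if every snapshot is still 'scraping' with an unchanged 'completed' count."""
    if len(history) < 2:
        return False  # a single snapshot cannot show a lack of progress
    first = history[0].get("completed", 0)
    return all(
        snap.get("status") == "scraping" and snap.get("completed", 0) == first
        for snap in history
    )


def poll(status_url: str, attempts: int = 5, delay_s: float = 10.0) -> list[dict]:
    """Fetch the status URL a few times and return the decoded snapshots."""
    history = []
    for _ in range(attempts):
        with urllib.request.urlopen(status_url, timeout=15) as resp:
            history.append(json.load(resp))
        time.sleep(delay_s)
    return history
```

Calling `is_stalled(poll(url))` with the job URL returned by the crawl request would report whether the job made any progress over the polling window. A stalled result only shows that no pages completed in that window; it does not say which service is at fault.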
When calling /scrape, I get a timeout. When I visit api-firecrawl.x-ware.online (the domain I'm directing API traffic to) on port 3000, I do see the following simple HTML page:
SCRAPERS-JS: Hello, world! Fly.io
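To make the /scrape timeout easier to reproduce, the request can be sent with an explicit client-side timeout so that it fails with a clear message instead of hanging. This is only a sketch: the `/v1/scrape` path and the `{"url": ...}` body are assumed by analogy with the `/v1/crawl` call above, and the helper names and the 30-second default are hypothetical.

```python
import json
import socket
import urllib.error
import urllib.request


def build_scrape_request(base_url: str, target_url: str) -> urllib.request.Request:
    """Build a POST to <base_url>/v1/scrape (path assumed by analogy with /v1/crawl)."""
    body = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/scrape",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def try_scrape(base_url: str, target_url: str, timeout_s: float = 30.0) -> str:
    """Return a one-line outcome instead of waiting indefinitely."""
    req = build_scrape_request(base_url, target_url)
    try:
        with urllib.request.urlopen(req, timeout=timeout_s) as resp:
            # Only the first 200 bytes are read, enough to identify the response.
            return f"HTTP {resp.status}: {resp.read(200)!r}"
    except urllib.error.HTTPError as exc:
        return f"HTTP error {exc.code}"
    except (urllib.error.URLError, socket.timeout) as exc:
        return f"no response within {timeout_s}s or connection failed: {exc}"
```

For example, `try_scrape("http://api-firecrawl.x-ware.online:3002", "https://mendable.ai")` would return either the start of the response body or a message describing the timeout.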
To Reproduce
Steps to reproduce the issue:
Deploy via Coolify through the 'repo' option with 'docker-compose' as the build utility. Set the following env vars:
BLOCK_MEDIA=
BULL_AUTH_KEY=
HOST=0.0.0.0
LLAMAPARSE_API_KEY=
LOGGING_LEVEL=
LOGTAIL_KEY=
MODEL_NAME=gpt-4o
NUM_WORKERS_PER_QUEUE=
OPENAI_API_KEY=
OPENAI_BASE_URL=
PLAYWRIGHT_MICROSERVICE_URL=http://playwright-service:3000
PORT=3002
POSTHOG_API_KEY=
POSTHOG_HOST=
PROXY_PASSWORD=
PROXY_SERVER=
PROXY_USERNAME=
REDIS_URL=redis://redis:6379
SCRAPING_BEE_API_KEY=
SELF_HOSTED_WEBHOOK_URL=
SLACK_WEBHOOK_URL=
SUPABASE_ANON_TOKEN=[redacted]
SUPABASE_SERVICE_TOKEN=[redacted]
SUPABASE_URL=https://supabasekong.x-ware.online/
TEST_API_KEY=
USE_DB_AUTHENTICATION=false
Run the API calls described above and observe the error messages and container logs.
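Since most of the variables in the list above are empty, a small sketch like the following can separate the values that are actually set from the ones left blank, which may make it easier to compare this configuration with a known-good one. The helper names are hypothetical, and whether a blank value matters to Firecrawl is not something this check can tell.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines; blank lines and '#' comments are skipped."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values such as URLs stay intact.
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def unset_keys(env: dict[str, str]) -> list[str]:
    """Return the keys whose value is empty, in sorted order."""
    return sorted(k for k, v in env.items() if v == "")


# A few lines copied from the list above, used as sample input.
SAMPLE = """\
HOST=0.0.0.0
PORT=3002
REDIS_URL=redis://redis:6379
PLAYWRIGHT_MICROSERVICE_URL=http://playwright-service:3000
NUM_WORKERS_PER_QUEUE=
USE_DB_AUTHENTICATION=false
"""

if __name__ == "__main__":
    env = parse_env(SAMPLE)
    print("set:", {k: v for k, v in env.items() if v})
    print("empty:", unset_keys(env))
```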
Expected Behavior
Crawl and scrape function normally.
Logs
The logs are included above.
Additional Context
Networking is handled by Traefik via Coolify.