mendableai / firecrawl

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
https://firecrawl.dev
GNU Affero General Public License v3.0

[Self-Host] /map endpoint not working #908

Open · lauridskern opened this issue 1 week ago

lauridskern commented 1 week ago

The /map endpoint does not work on my self-hosted instance. I tried several different websites and /map fails on every one of them, while /scrape and /crawl both work. I included some examples of the errors I am getting from the logs.
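A minimal TypeScript sketch of the failing call follows. It can be used to compare /map against /scrape and /crawl on the same deployment.

It rests on some assumptions that are not details from this report:

- The self-hosted API listens on `http://localhost:3002`.
- `POST /v1/map` accepts a JSON body of the form `{ url }`.

`buildMapRequest` and `mapSite` are hypothetical helper names.

```typescript
// Assumed address of a self-hosted Firecrawl API (default docker-compose setup).
const API_BASE = "http://localhost:3002";

// Hypothetical helper: builds the POST /v1/map request for a target site.
// The { url } body shape is an assumption, not taken from this issue.
function buildMapRequest(base: string, target: string) {
  return {
    endpoint: new URL("/v1/map", base).href,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ url: target }),
    },
  };
}

// Hypothetical helper: calls /v1/map and throws on a non-2xx status,
// including the response body so the server-side error is visible.
async function mapSite(target: string, base: string = API_BASE): Promise<unknown> {
  const { endpoint, init } = buildMapRequest(base, target);
  const res = await fetch(endpoint, init);
  if (!res.ok) {
    throw new Error(`/v1/map returned HTTP ${res.status}: ${await res.text()}`);
  }
  return res.json();
}
```

For example, `mapSite("https://www.spielraum.co.at").then(console.log, console.error)` should produce log lines like the ones below on an affected instance.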

api-1                 | 2024-11-18 12:44:50 debug [:]: Fetching sitemap links from https://www.playingcardshop.eu
api-1                 | 2024-11-18 12:44:51 debug [:]: Failed to fetch sitemap with axios from https://www.playingcardshop.eu/sitemap.xml: AxiosError: Request failed with status code 404
api-1                 | 2024-11-18 12:44:51 debug [:]: Failed to fetch sitemap from https://www.playingcardshop.eu/sitemap.xml: AxiosError: Request failed with status code 404
api-1                 | 2024-11-18 13:06:51 debug [:]: Fetching sitemap links from https://www.pcgamer.com
api-1                 | 2024-11-18 13:06:51 error [:]: Request failed for https://www.pcgamer.com/sitemap-article-types.xml: Request failed with status code 403 {}
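Both failures above happen while /map is collecting sitemap URLs. The sitemaps.org protocol defines two document types:

- a `<urlset>`, which lists page URLs;
- a `<sitemapindex>`, which points at further sitemap files.

A sitemap index is presumably how a file like `sitemap-article-types.xml` gets requested. That is an assumption: the report does not show pcgamer.com's sitemap.

The sketch below is a generic, regex-based illustration of that distinction. It is not Firecrawl's `getLinksFromSitemap`, and `parseSitemap` is a hypothetical name. A real implementation should use an XML parser.

```typescript
// Result of parsing one sitemap document (hypothetical shape for illustration).
interface SitemapParseResult {
  kind: "index" | "urlset" | "unknown";
  // For an index: child sitemap URLs to fetch next. For a urlset: page URLs.
  locs: string[];
}

// Hypothetical helper, not Firecrawl's getLinksFromSitemap. Regex extraction is a
// simplification: it ignores CDATA and decodes only the &amp; entity.
function parseSitemap(xml: string): SitemapParseResult {
  const locPattern = /<loc>\s*([^<]+?)\s*<\/loc>/g; // fresh RegExp per call, so lastIndex starts at 0
  const locs: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = locPattern.exec(xml)) !== null) {
    locs.push(match[1].replace(/&amp;/g, "&"));
  }
  // <sitemapindex> lists more sitemap files; <urlset> lists page URLs.
  if (/<sitemapindex[\s>]/.test(xml)) return { kind: "index", locs };
  if (/<urlset[\s>]/.test(xml)) return { kind: "urlset", locs };
  return { kind: "unknown", locs: [] };
}
```

In a crawler structured this way, each child sitemap from an `index` result is fetched separately. One blocked child (such as a 403) can then fail on its own without invalidating the URLs already collected.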

api-1                 | 2024-11-18 13:09:37 debug [:]: Failed to fetch sitemap from https://www.spielraum.co.at/sitemap.xml: AxiosError: Request failed with status code 403
api-1                 | 2024-11-18 13:09:37 info [ScrapeURL:]: Scraping URL "https://www.spielraum.co.at/sitemap.xml"...
api-1                 | 2024-11-18 13:09:37 debug [ScrapeURL:]: Engine fire-engine;tlsclient meets feature priority threshold
api-1                 | 2024-11-18 13:09:37 info [ScrapeURL:]: Scraping via fire-engine;tlsclient...
api-1                 | 2024-11-18 13:09:37 debug [ScrapeURL:fireEngineScrape/robustFetch]: Request failed, trying 2 more times
api-1                 | 2024-11-18 13:09:37 debug [ScrapeURL:fireEngineScrape/robustFetch]: Request failed, trying 1 more times
api-1                 | 2024-11-18 13:09:37 debug [ScrapeURL:fireEngineScrape/robustFetch]: Request failed
api-1                 | 2024-11-18 13:09:37 info [ScrapeURL:]: An unexpected error happened while scraping with fire-engine;tlsclient.
api-1                 | 2024-11-18 13:09:37 warn [ScrapeURL:]: scrapeURL: All scraping engines failed! {"module":"ScrapeURL","scrapeId":"sitemap","scrapeURL":"https://www.spielraum.co.at/sitemap.xml","error":{"fallbackList":["fire-engine;tlsclient"],"results":{"fire-engine;tlsclient":{"state":"error","error":{"name":"Error","message":"Request failed","stack":"Error: Request failed\n    at robustFetch (/app/dist/src/scraper/scrapeURL/lib/fetch.js:64:23)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async robustFetch (/app/dist/src/scraper/scrapeURL/lib/fetch.js:56:24)\n    at async robustFetch (/app/dist/src/scraper/scrapeURL/lib/fetch.js:56:24)\n    at async /app/dist/src/scraper/scrapeURL/engines/fire-engine/scrape.js:43:16\n    at async fireEngineScrape (/app/dist/src/scraper/scrapeURL/engines/fire-engine/scrape.js:37:27)\n    at async performFireEngineScrape (/app/dist/src/scraper/scrapeURL/engines/fire-engine/index.js:37:20)\n    at async scrapeURLWithFireEngineTLSClient (/app/dist/src/scraper/scrapeURL/engines/fire-engine/index.js:174:20)\n    at async scrapeURLWithEngine (/app/dist/src/scraper/scrapeURL/engines/index.js:240:12)\n    at async scrapeURLLoop (/app/dist/src/scraper/scrapeURL/index.js:114:35)\n    at async scrapeURL (/app/dist/src/scraper/scrapeURL/index.js:224:24)\n    at async getLinksFromSitemap (/app/dist/src/scraper/WebScraper/sitemap.js:23:34)\n    at async WebCrawler.tryFetchSitemapLinks (/app/dist/src/scraper/WebScraper/crawler.js:364:36)\n    at async WebCrawler.tryGetSitemap (/app/dist/src/scraper/WebScraper/crawler.js:153:30)\n    at async Promise.all (index 0)\n    at async mapController (/app/dist/src/controllers/v1/map.js:90:45)","cause":{"params":{"url":"undefined/scrape","logger":{},"method":"POST","body":{"url":"https://www.spielraum.co.at/sitemap.xml","engine":"tlsclient","instantReturn":true,"disableJsDom":true,"timeout":300000},"headers":{},"schema":{"_def":{"unknownKeys":"strip","catchall":{"_def":{"typeName":"ZodNever"}},"typeName":"ZodObject"},"_cached":null},"ignoreResponse":false,"ignoreFailure":false,"tryCount":1},"requestId":"b80d48a2-8884-4d54-9c09-3c1b10e35d5c","error":{"name":"TypeError","message":"Failed to parse URL from undefined/scrape","stack":"TypeError: Failed to parse URL from undefined/scrape\n    at node:internal/deps/undici/undici:13185:13\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async robustFetch (/app/dist/src/scraper/scrapeURL/lib/fetch.js:34:19)\n    at async robustFetch (/app/dist/src/scraper/scrapeURL/lib/fetch.js:56:24)\n    at async robustFetch (/app/dist/src/scraper/scrapeURL/lib/fetch.js:56:24)\n    at async /app/dist/src/scraper/scrapeURL/engines/fire-engine/scrape.js:43:16\n    at async fireEngineScrape (/app/dist/src/scraper/scrapeURL/engines/fire-engine/scrape.js:37:27)\n    at async performFireEngineScrape (/app/dist/src/scraper/scrapeURL/engines/fire-engine/index.js:37:20)\n    at async scrapeURLWithFireEngineTLSClient (/app/dist/src/scraper/scrapeURL/engines/fire-engine/index.js:174:20)\n    at async scrapeURLWithEngine (/app/dist/src/scraper/scrapeURL/engines/index.js:240:12)\n    at async scrapeURLLoop (/app/dist/src/scraper/scrapeURL/index.js:114:35)\n    at async scrapeURL (/app/dist/src/scraper/scrapeURL/index.js:224:24)\n    at async getLinksFromSitemap (/app/dist/src/scraper/WebScraper/sitemap.js:23:34)\n    at async WebCrawler.tryFetchSitemapLinks (/app/dist/src/scraper/WebScraper/crawler.js:364:36)\n    at async WebCrawler.tryGetSitemap (/app/dist/src/scraper/WebScraper/crawler.js:153:30)\n    at async Promise.all (index 0)","cause":{"code":"ERR_INVALID_URL","input":"undefined/scrape","name":"TypeError","message":"Invalid URL","stack":"TypeError: Invalid URL\n    at new URL (node:internal/url:806:29)\n    at new Request (node:internal/deps/undici/undici:9276:25)\n    at fetch (node:internal/deps/undici/undici:10005:25)\n    at fetch (node:internal/deps/undici/undici:13183:10)\n    at fetch (node:internal/bootstrap/web/exposed-window-or-worker:72:12)\n    at robustFetch (/app/dist/src/scraper/scrapeURL/lib/fetch.js:34:25)\n    at robustFetch (/app/dist/src/scraper/scrapeURL/lib/fetch.js:56:30)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async robustFetch (/app/dist/src/scraper/scrapeURL/lib/fetch.js:56:24)\n    at async /app/dist/src/scraper/scrapeURL/engines/fire-engine/scrape.js:43:16\n    at async fireEngineScrape (/app/dist/src/scraper/scrapeURL/engines/fire-engine/scrape.js:37:27)\n    at async performFireEngineScrape (/app/dist/src/scraper/scrapeURL/engines/fire-engine/index.js:37:20)\n    at async scrapeURLWithFireEngineTLSClient (/app/dist/src/scraper/scrapeURL/engines/fire-engine/index.js:174:20)\n    at async scrapeURLWithEngine (/app/dist/src/scraper/scrapeURL/engines/index.js:240:12)\n    at async scrapeURLLoop (/app/dist/src/scraper/scrapeURL/index.js:114:35)\n    at async scrapeURL (/app/dist/src/scraper/scrapeURL/index.js:224:24)"}}}},"unexpected":true,"startedAt":1731935377633,"finishedAt":1731935377638}},"name":"Error","message":"All scraping engines failed! -- Double check the URL to make sure it's not broken. If the issue persists, contact us at help@firecrawl.com.","stack":"Error: All scraping engines failed! -- Double check the URL to make sure it's not broken. If the issue persists, contact us at help@firecrawl.com.\n    at scrapeURLLoop (/app/dist/src/scraper/scrapeURL/index.js:192:15)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async scrapeURL (/app/dist/src/scraper/scrapeURL/index.js:224:24)\n    at async getLinksFromSitemap (/app/dist/src/scraper/WebScraper/sitemap.js:23:34)\n    at async WebCrawler.tryFetchSitemapLinks (/app/dist/src/scraper/WebScraper/crawler.js:364:36)\n    at async WebCrawler.tryGetSitemap (/app/dist/src/scraper/WebScraper/crawler.js:153:30)\n    at async Promise.all (index 0)\n    at async mapController (/app/dist/src/controllers/v1/map.js:90:45)"}}
api-1                 | 2024-11-18 13:09:37 error [:]: Request failed for https://www.spielraum.co.at/sitemap.xml: All scraping engines failed! -- Double check the URL to make sure it's not broken. If the issue persists, contact us at help@firecrawl.com. {}
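The nested `cause` in the spielraum.co.at log is the most specific clue. The sequence is:

1. The direct request for `sitemap.xml` returned 403.
2. The fallback scrape through `fire-engine;tlsclient` sent its POST to `"undefined/scrape"`.
3. Node's `fetch` rejected that string with `ERR_INVALID_URL` before any network request was made.
4. The retries all failed within the same second.

This is consistent with a fire-engine base URL that is not configured on this self-hosted instance. The report does not confirm which setting is involved.

The TypeScript sketch below shows two things:

- How interpolating an undefined base produces exactly that string.
- One hypothetical way to skip the fallback instead of failing.

`naiveEngineEndpoint`, `fireEngineEndpoint` and `fetchSitemapWithFallback` are illustrative names, not Firecrawl functions.

```typescript
// Reproduces the request URL seen in the log: interpolating an unset base URL
// yields the literal string "undefined/scrape", which new URL() rejects.
function naiveEngineEndpoint(base: string | undefined): string {
  return `${base}/scrape`;
}

// Hypothetical guard: returns null when no engine base URL is configured,
// and validates the joined URL eagerly when one is.
function fireEngineEndpoint(base: string | undefined): string | null {
  if (!base) return null;
  const trimmed = base.replace(/\/+$/, ""); // avoid "//scrape" for a trailing slash
  return new URL(`${trimmed}/scrape`).href;
}

type DirectFetch = (url: string) => Promise<string>;
type EngineFetch = (endpoint: string, url: string) => Promise<string>;

// Hypothetical sketch of the fallback order visible in the log: a direct request
// for the sitemap first, then an engine-backed scrape only if an engine exists.
async function fetchSitemapWithFallback(
  sitemapUrl: string,
  direct: DirectFetch,
  viaEngine: EngineFetch,
  engineBase: string | undefined,
): Promise<string | null> {
  try {
    return await direct(sitemapUrl);
  } catch {
    const endpoint = fireEngineEndpoint(engineBase);
    // No engine configured: treat this sitemap as unavailable instead of erroring.
    if (endpoint === null) return null;
    return viaEngine(endpoint, sitemapUrl);
  }
}
```

Returning `null` here is a design choice for the sketch: it lets a map request continue with whatever other link sources it has, rather than surfacing "All scraping engines failed!" for a sitemap that was simply blocked.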