[Bug] excludePaths format isn't documented anywhere

mendableai / firecrawl

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

https://firecrawl.dev

GNU Affero General Public License v3.0

[Bug] excludePaths format isn't documented anywhere #764

Closed mdagost closed 1 week ago

mdagost commented 1 week ago

Describe the Bug

I'm having trouble using excludePaths because I can't find documentation anywhere on how it works. Is it a regex? Is it a glob pattern? What is it?

rafaelsideguide commented 1 week ago

Hey @mdagost, I just PR'd a better description for includePaths and excludePaths to our docs. This is how it works:

excludePaths (string[]): Specifies URL patterns to exclude from the crawl by comparing website paths against the provided regex patterns. For example, if you set "excludePaths": ["blog/*"] for the base URL firecrawl.dev, any results matching that pattern will be excluded, such as https://www.firecrawl.dev/blog/firecrawl-launch-week-1-recap.

nickscamara commented 1 week ago

It follows a glob pattern!

mdagost commented 1 week ago

Which is it? Regex or glob pattern?

mdagost commented 1 week ago

I've seen both referenced. When I try glob patterns that involve the ** pattern I get an error about a regex, which made me think that it's actually a regex.
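
To make the regex-versus-glob question concrete, here is a minimal Python sketch of how the same pattern string behaves under each interpretation. It is not Firecrawl's implementation: the helper names (is_valid_regex, matches_as_regex, matches_as_glob) are hypothetical, and Python's re and fnmatch modules stand in for whatever matcher the crawler actually uses. It only illustrates why a pattern containing ** could trigger a regex error if the string is compiled as a regular expression.

```python
import fnmatch
import re


def is_valid_regex(pattern: str) -> bool:
    """Return True if the string compiles as a regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def matches_as_regex(pattern: str, path: str) -> bool:
    """Treat the pattern as a regex and search for it anywhere in the path."""
    return re.search(pattern, path) is not None


def matches_as_glob(pattern: str, path: str) -> bool:
    """Treat the pattern as a shell-style glob that must match the whole path."""
    return fnmatch.fnmatchcase(path, pattern)


# "blog/*" is both a valid regex and a valid glob, but means different things:
# as a regex it is "blog" followed by zero or more "/" characters.
print(is_valid_regex("blog/*"))                        # valid regex
print(matches_as_regex("blog/*", "blogger/post"))      # regex reading also hits "blogger"
print(matches_as_glob("blog/*", "blogger/post"))       # glob reading does not

# "blog/**" is a common glob idiom, but as a regex "**" repeats a quantifier,
# so compiling it fails.
print(is_valid_regex("blog/**"))
```

If the two readings disagree for a pattern, as they do for blog/* against blogger/post, that pattern is a quick way to test which interpretation a given service applies.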