mendableai / firecrawl

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
https://firecrawl.dev
GNU Affero General Public License v3.0
19.14k stars, 1.49k forks

[Feat] Better parse tables #782

Open: rafaelsideguide opened this issue 1 month ago

rafaelsideguide commented 1 month ago

To reproduce, scrape this URL: https://docs.datadoghq.com/api/latest/containers/

The Query Strings table comes out with every cell as its own paragraph, so the row and column structure is lost:

"#### Query Strings

Name

Type

Description

filter\[tags\]

string

Comma-separated list of tags to filter containers by.

group\_by

string

Comma-separated list of tags to group containers by.

sort

string

Attribute to sort containers by.

page\[size\]

integer

Maximum number of results returned.

page\[cursor\]

string

String to query the next page of results.
This key is provided with each valid response from the API in `meta.pagination.next_cursor`."

txrp0x9 commented 4 weeks ago

I suspect any fix for this would be site-specific. The HTML parser could be given custom rules for the example above, but another site may lay out its tables differently. I don't see a general pattern yet, so a fix would be too narrow; it's probably best left to custom forks for anyone who needs it. Let me know if I'm wrong.
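For anyone who takes the custom-fork route, one option is a site-specific post-processing step on the markdown rather than a change to the HTML parser. The sketch below is hypothetical: `rebuildTables`, `TABLE_RULES` and the hostname-keyed lookup are illustrative names, not Firecrawl APIs. It assumes every cell arrives as its own blank-line-separated paragraph, as in the output above, and that the site's header cells are known in advance.

```typescript
// Hypothetical post-processing rule, NOT part of Firecrawl's API.
// Assumes each table cell was emitted as its own blank-line-separated
// paragraph, and that the header cells for the site are known up front.

interface TableRule {
  header: string[]; // exact header cells that mark the start of a table
}

// Hypothetical per-site rules, keyed by hostname because table layouts
// differ from site to site.
const TABLE_RULES: Record<string, TableRule> = {
  "docs.datadoghq.com": { header: ["Name", "Type", "Description"] },
};

// Escape pipes and collapse internal line breaks so a cell fits on one row.
function toCell(text: string): string {
  return text.replace(/\|/g, "\\|").replace(/\s*\n\s*/g, " ").trim();
}

function rebuildTables(url: string, markdown: string): string {
  const rule = TABLE_RULES[new URL(url).hostname];
  if (!rule) return markdown; // no rule for this site: leave output unchanged

  const paras = markdown
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
  const cols = rule.header.length;
  const out: string[] = [];

  let i = 0;
  while (i < paras.length) {
    const atHeader = rule.header.every((h, k) => paras[i + k] === h);
    if (!atHeader) {
      out.push(paras[i]); // ordinary paragraph: keep as-is
      i++;
      continue;
    }
    const rows: string[][] = [rule.header];
    i += cols;
    // Consume complete rows; stop at a heading or when fewer than
    // `cols` paragraphs remain.
    while (
      i + cols <= paras.length &&
      !paras.slice(i, i + cols).some((p) => p.startsWith("#"))
    ) {
      rows.push(paras.slice(i, i + cols));
      i += cols;
    }
    const lines = rows.map((r) => `| ${r.map(toCell).join(" | ")} |`);
    lines.splice(1, 0, `|${" --- |".repeat(cols)}`); // GFM delimiter row
    out.push(lines.join("\n"));
  }
  return out.join("\n\n");
}
```

Because the rule keys on the exact header cells and the hostname, it fixes this page but does nothing for tables with other headers or on other sites, which is the trade-off raised in the comment.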