mendableai / firecrawl

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
https://firecrawl.dev
GNU Affero General Public License v3.0
19.24k stars 1.49k forks

[Question] Is there a way to extract text directly from html and convert to markdown? #787

Open kouyakamada opened 1 month ago

kouyakamada commented 1 month ago

We would like to convert HTML to Markdown, both for a large collection of HTML files (using Spark) and for web pages that require an internal network environment and authentication. Is there a way to accomplish this?

rohitt-gupta commented 1 month ago

@kouyakamada

var TurndownService = require('turndown')

var turndownService = new TurndownService()
var markdown = turndownService.turndown('<h1>Hello world!</h1>')

This is one of the easiest and most convenient ways to convert HTML into Markdown.

REF - https://github.com/mixmark-io/turndown
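
For a large collection of HTML that has already been saved to disk (for example, pages fetched from inside the internal network), the same conversion can be applied in a loop. The following is only a minimal sketch: `convertDirectory` is a hypothetical helper, not part of turndown or firecrawl, and the directory names in the usage comment are assumptions. It takes the converter as a parameter, so any function that maps an HTML string to a Markdown string can be plugged in. It does not cover Spark or authentication.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper (not part of turndown or firecrawl).
// Reads every .html file in inputDir, converts it with `convert`
// (any function mapping an HTML string to a Markdown string), and
// writes a .md file with the same base name into outputDir.
// Returns the list of paths written.
function convertDirectory(inputDir, outputDir, convert) {
  fs.mkdirSync(outputDir, { recursive: true });
  const written = [];
  for (const name of fs.readdirSync(inputDir).sort()) {
    // Skip anything that is not an .html file.
    if (path.extname(name).toLowerCase() !== '.html') continue;
    const html = fs.readFileSync(path.join(inputDir, name), 'utf8');
    const baseName = path.basename(name, path.extname(name));
    const outPath = path.join(outputDir, baseName + '.md');
    fs.writeFileSync(outPath, convert(html), 'utf8');
    written.push(outPath);
  }
  return written;
}

// Usage with turndown (assumes `npm install turndown` and that the
// directories ./html and ./markdown are where you keep your files):
//   const TurndownService = require('turndown');
//   const turndownService = new TurndownService();
//   convertDirectory('./html', './markdown',
//     (html) => turndownService.turndown(html));
```

Passing the converter in as a function keeps the file handling independent of the conversion library, so the same loop could be reused with a different converter.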