mendableai / firecrawl

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
https://firecrawl.dev
GNU Affero General Public License v3.0
18.53k stars 1.41k forks

[Self-Host] Is it possible to resolve local html files? #896

Open Taimin opened 1 day ago

Taimin commented 1 day ago

Describe the Issue

I have already downloaded multiple HTML files. When I use a self-hosted Firecrawl instance to convert these files to markdown, I get an empty dictionary in 'data'.

To Reproduce

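from firecrawl import FirecrawlApp

# Self-hosted Firecrawl instance running locally.
app = FirecrawlApp(api_key="1", api_url="http://localhost:3002/")

# local_html_file_path is the filesystem path of a downloaded HTML file.
crawl_status = app.crawl_url(
    local_html_file_path,
    params={
        'scrapeOptions': {'formats': ['rawHtml']}
    }
)
print(crawl_status)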

Expected Behavior

The whole HTML file should be converted to a markdown file.

Screenshots

[image]

The data is empty.

hemeda3 commented 1 day ago

Up! Same here, but I am using the cloud API. I have the files locally and don't want to upload them publicly; I want to send the files from my laptop directly to the Firecrawl server.

mogery commented 11 hours ago

You can do this by hosting the file locally, for example with Python: python3 -m http.server

After that, you can point your crawl at http://localhost:8000/<file path relative to the directory you started the server in>
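The suggestion above can also be scripted. The following is a minimal sketch, not part of Firecrawl: `serve_directory` and `url_for` are hypothetical helper names, and port 8000 simply mirrors the default of `python3 -m http.server`. It assumes the Firecrawl instance can reach `localhost` on the machine running the script; if Firecrawl runs inside Docker, that may not hold, and a different host name (for example `host.docker.internal` on Docker Desktop) may be needed.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path
from urllib.parse import quote


def serve_directory(directory, port=8000):
    """Start a background HTTP server rooted at `directory` and return it."""
    handler = functools.partial(SimpleHTTPRequestHandler, directory=str(directory))
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    # Daemon thread: the server stops when the main program exits.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def url_for(server, relative_path, host="localhost"):
    """Build the URL of a file given its path relative to the served directory."""
    port = server.server_address[1]
    return f"http://{host}:{port}/{quote(Path(relative_path).as_posix())}"


# Example (hypothetical folder and file names):
#   server = serve_directory("downloads")
#   url = url_for(server, "page.html")
#   crawl_status = app.crawl_url(url, params={'scrapeOptions': {'formats': ['markdown']}})
#   server.shutdown(); server.server_close()
```

The `host` parameter is there only so the same URL builder works when the crawler has to address the machine by a name other than `localhost`.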

mogery commented 11 hours ago

> Up! Same here, but I am using the cloud API. I have the files locally and don't want to upload them publicly; I want to send the files from my laptop directly to the Firecrawl server.

We do not support this.

hemeda3 commented 2 hours ago

@mogery Yes, I am aware of this. I just wanted to find a trick that lets me avoid uploading my private files publicly while still using your cloud API.

> You can do this by hosting the file locally, with python for example: python3 -m http.server
>
> After that, you can point your crawl at http://localhost:8000/<file path relative to the directory you started the server in>

Thanks, this is super helpful. With ngrok I could expose my file publicly, but only for a limited time until it gets parsed. Not a bad solution.
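One way to keep that exposure window short is to stop the local server as soon as the file has been fetched once, or after a timeout. The sketch below is an illustration only: `OneShotHandler`, `start_one_shot_server`, and `wait_then_close` are hypothetical names, and the tunnel itself (for example `ngrok http 8000`) is started separately. It assumes a single-page scrape; a crawler that requests the page more than once would need a looser stop condition.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class OneShotHandler(SimpleHTTPRequestHandler):
    """Serve files as usual and flag the server after a successful GET."""

    def do_GET(self):
        f = self.send_head()  # returns None for errors such as 404
        if f:
            try:
                self.copyfile(f, self.wfile)
            finally:
                f.close()
            # The body has been written; signal that the file was fetched.
            self.server.served.set()


def start_one_shot_server(directory, port=8000):
    """Start serving `directory` in the background and return the server."""
    handler = functools.partial(OneShotHandler, directory=str(directory))
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    server.served = threading.Event()
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def wait_then_close(server, timeout=300.0):
    """Block until one file is fetched or `timeout` seconds pass, then stop.

    Returns True if a file was fetched, False if the timeout expired first.
    """
    fetched = server.served.wait(timeout)
    server.shutdown()
    server.server_close()
    return fetched
```

A typical flow under these assumptions: call `start_one_shot_server`, open the tunnel, submit the public URL to the API, then call `wait_then_close` so the file stops being reachable once it has been downloaded.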