Open JasonWhall opened 1 year ago
typesense-docsearch-scraper has all the commits from algolia-docsearch-scraper up to Dec 22, 2020. I don't see any updates in the algolia scraper since then where this port limitation was addressed...
Also I still see that error message about ports not allowed in allowed_domains in the master branch of scrapy here. So this limitation still exists as of today.
So I'm surprised to see a config in the docsearch scraper configs repo with a port number!
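To illustrate the limitation discussed above: an allowed_domains entry is expected to be a bare hostname, while start_urls can carry a port. The following is a minimal sketch of deriving port-free domain entries from start URLs using only the Python standard library. The helper name hostname_without_port is hypothetical, and this is not the scraper's actual code; whether the scraper builds allowed_domains this way is an assumption.

```python
from urllib.parse import urlsplit


def hostname_without_port(url: str) -> str:
    """Return only the host part of a URL, dropping any port (hypothetical helper)."""
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"no hostname in URL: {url!r}")
    return host


# The start URL from this issue, including the non-standard port.
start_urls = ["http://localhost:3000/"]

# Assumed approach: keep the port in start_urls, but strip it from the domain list.
allowed_domains = sorted({hostname_without_port(u) for u in start_urls})
print(allowed_domains)  # ['localhost']
```

With this approach the crawl would still start at the URL with the port, while the domain filter only sees the hostname.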
Any update on that? I'm facing the same issue, and I don't understand whether I'm able to test Typesense locally.
Description
We currently have a site that we set up in the scraper config that is hosted on a non-standard HTTP/HTTPS port (3000). When setting the start_urls to a hostname with a port, e.g. http://my-host:3000/, the scraper fails with an error message suggesting it does not accept domains with ports. It looks like the old algolia scraper configs used to support ports, so I assume this is related to an update to the scrapy package used in this forked solution.
Steps to reproduce
http://localhost:3000
"start_urls":["http://localhost:3000/"]
Expected Behavior
Actual Behavior
Error returned from scraper:
Metadata
Typesense Version:
Docker images:
OS: Linux