Open oindrila-b opened 3 months ago
@oindrila-b did you find a work around or solution for this? I'm having the same issue.
I'm also experiencing this roadblock, which has forced me to revert to using node for this project.
UPDATE 1: LOL, using tsx gave a different error for another library. I'm going to try to also make an issue in Crawlee, to make sure both parties know it had an error.
@oindrila-b Can you please share how to implement this workaround?
this.urls = options.urls.map(url => {
  // Prepend https:// when the URL has no explicit http(s) scheme.
  if (!url.startsWith('http://') && !url.startsWith('https://')) {
    return `https://${url}`;
  }
  Logger.info(url);
  return url;
});
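The normalization above can also be written as a small standalone function, which makes it easier to check in isolation under both tsx and bun. This is a minimal sketch: the name `ensureProtocol` is hypothetical and not part of Crawlee, and it additionally trims whitespace and matches the scheme case-insensitively, which the original snippet does not do.

```typescript
// Hypothetical helper (illustrative name, not a Crawlee API).
// Returns the URL unchanged if it already starts with http:// or https://,
// otherwise prefixes it with https://.
function ensureProtocol(url: string): string {
  const trimmed = url.trim(); // assumption: surrounding whitespace is never intended
  if (/^https?:\/\//i.test(trimmed)) {
    return trimmed;
  }
  return `https://${trimmed}`;
}

// Example: normalize a list of start URLs before passing it to the crawler.
const normalizedUrls = ['crawlee.dev', 'http://example.com', 'https://bun.sh'].map((u) => ensureProtocol(u));
console.log(normalizedUrls);
```

This only guards against scheme-less input. Judging from the later reports in this thread, where HTTPS pages fail under Bun even with well-formed URLs, it is unlikely to resolve the error on its own.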
UPDATE 2: I'm currently using tsx and npm until this bug is fixed.
Oddly enough, with Bun v1.1.34 I get a different error now: NS_ERROR_UNKNOWN_HOST. But the behavior is the same: the Firefox browser instance started via crawlee is unable to access any HTTPS websites, although HTTP works.
The Chromium instance reports net::ERR_TUNNEL_CONNECTION_FAILED.
Same error with Bun 1.1.36. To reproduce:
npx crawlee create my-crawler # <== Choose the TypeScript Example
bun run src/main.ts
The error:
INFO PlaywrightCrawler: Starting the crawler.
WARN PlaywrightCrawler: Reclaiming failed request back to the list or queue. Detected a session error, rotating session...
goto: net::ERR_TUNNEL_CONNECTION_FAILED at https://crawlee.dev/
Call log:
- navigating to "https://crawlee.dev/", waiting until "load"
at processTicksAndRejections (//projects/my-crawler/native:7:39)
What version of Bun is running?
1.1.22
What platform is your computer?
Linux 6.5.0-45-generic x86_64 x86_64
What steps can reproduce the bug?
Hello Bun Community,
I'm using apify/crawlee in my project to scrape some websites, and I want to do it in a bun environment instead of a node environment. The crawler I chose for my project is PlaywrightCrawler from crawlee.
The script section in my package.json looks something like this:
Run:
bun start:dev
What is the expected behavior?
I expect bun to be able to process the URLs without throwing any errors. When I run the same project in a node environment, where my package.json script is this:
it works perfectly. Here's the result I get using npm and expect from bun as well:
This is what I want in bun as well.
What do you see instead?
When I execute the project using bun start:dev, the crawler gets initialised without any issues, but when it comes to running the crawler using the crawler.run() method, I encounter this error:
I tried to fix this by adding a piece of code that makes sure the URLs have the correct protocols before they are added to the crawler for scraping. This is that code:
where this.urls is an empty array of strings, which later gets added to the crawler for crawling.
From what I see, the default setup uses tsx src/main.ts to run the main file and it runs perfectly, whereas using bun run src/main.ts produces a protocol mismatch error.
Additional information
No response