Closed DimitriTimoz closed 2 months ago
Hi, thank you for the PR!
Enabling HTTP/2 and the two SSL features by default isn't ideal, as these are outside the scope of this crate. It appears you may be scraping content and running into memory issues on the server, which can lead to various problems. I recommend using "crawl" with subscriptions and performing the operations in smaller chunks instead.
Thank you for understanding!
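For reference, a minimal sketch of the crawl-with-subscription pattern being suggested here (assuming the spider crate's `subscribe`/`crawl` API and its default `sync` feature; exact signatures may differ by version):

```rust
use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://example.com");

    // Subscribe before crawling: pages are streamed to the receiver as they
    // are fetched, so they can be processed in small chunks instead of being
    // accumulated in memory the way scrape does.
    let mut rx = website.subscribe(16).unwrap();

    let handle = tokio::spawn(async move {
        while let Ok(page) = rx.recv().await {
            // Handle each page here (parse, write to disk, etc.) and drop it.
            let _html = page.get_html();
        }
    });

    website.crawl().await;
    website.unsubscribe();
    let _ = handle.await;
}
```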
Yes, I understand, but I lose the utility of the smart feature and it's more expensive. And the segfaults could happen again.
There are methods for smart crawling: `website.crawl_smart`.
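A hedged sketch of what that looks like (assuming the crate's `smart` feature flag is enabled; this is an illustration, not the crate's documented example):

```rust
use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    // crawl_smart only falls back to heavier (headless) fetching when a page
    // needs it, so the "smart" behavior is kept without calling scrape.
    let mut website = Website::new("https://example.com");
    website.crawl_smart().await;

    for link in website.get_links() {
        println!("{}", link.as_ref());
    }
}
```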
Yes, but I also need to retrieve the HTML content, which is sometimes fetched through an API and then inserted into the HTML.
Scraping doesn't change that. You still get the same html in the subscription.
okay, too bad
Maybe you should warn other developers in the docs about the segfault when OpenSSL is used.
This has nothing to do with OpenSSL. You are scraping and holding onto all the content in a memory-constrained environment. When memory runs out or gets low, the failure surfaces in OpenSSL because it sits near the lowest layer of the OS stack.
Use the crawl methods with subscriptions instead of the scrape calls. The scrape and crawl methods can do everything identically.
If someone really wants to hold onto all the content in memory, try adding some swap.
The subscription example uses crawl and you get the same page object back.
It's not a lack of memory: I have 50 GB and my program doesn't use more than 1.5 GB before the segfault. And I don't understand why, but since I disabled OpenSSL by switching to rustls, no segfault occurs.
Pinning a custom version of reqwest and switching away from OpenSSL can also be done from your own crate.
Yes, that's what I did, but try running `cargo tree -i openssl` and you will still see openssl being used by reqwest despite the `reqwest_rustls_tls` feature.
> Enabling HTTP/2 and the two SSL features by default isn't ideal [...]
Sorry, I forgot to mention it: `http2` is already enabled by default and you are using it. I disabled reqwest's default features because that is necessary to use rustls, since `default-tls` is enabled by default. Without that, your `reqwest_rustls_tls` feature doesn't work.
I'm not keeping up with the flags on reqwest; they have a handful, and they change across versions. When you use HTTP/2 in the configuration or set it by default with `website.with_http2_prior_knowledge(true)`, it will not work for every website. I'm not aware whether that exact flag enables it by default for everyone on the latest version.
When you pin the versions of the dependencies from your own crate, the way spider is set up, your build will override the crate's selections.
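A small sketch of that point: opt into HTTP/2 prior knowledge per target rather than by default, since not every site supports it (the `build_site` helper is hypothetical; only `with_http2_prior_knowledge` comes from the thread above):

```rust
use spider::website::Website;

// Hypothetical helper: only force HTTP/2 prior knowledge for hosts that are
// known to support it; forcing it by default breaks HTTP/1.1-only sites.
fn build_site(url: &str, host_supports_h2: bool) -> Website {
    let mut website = Website::new(url);
    website.with_http2_prior_knowledge(host_supports_h2);
    website
}
```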
Despite having the `reqwest_rustls_tls` feature enabled and `default-features` set to false, OpenSSL was still used by reqwest because hyper always uses it. To disable OpenSSL, you need to disable the default features and enable the `reqwest_rustls_tls` feature. In my case, OpenSSL was the cause of numerous segmentation faults, double frees, or corruption issues.
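To make that concrete, a minimal Cargo.toml sketch of the setup described above (feature names taken from this thread; the version number and the extra `sync`/`smart` features are assumptions that may differ for your spider version):

```toml
[dependencies]
# Turn off spider's default features so reqwest's default-tls (OpenSSL-backed)
# backend is not pulled in, then re-enable what is needed explicitly.
spider = { version = "2", default-features = false, features = [
    "reqwest_rustls_tls", # rustls TLS backend instead of OpenSSL
    "sync",               # subscription support used with crawl
    "smart",              # enables crawl_smart
] }
```

Running `cargo tree -i openssl` afterwards is the quickest way to confirm nothing in the dependency graph still links OpenSSL.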