skrapeit / skrape.it

A Kotlin-based testing/scraping/parsing library providing the ability to analyze and extract data from HTML (server & client-side rendered). It places particular emphasis on ease of use and a high level of readability by providing an intuitive DSL. It aims to be a testing lib, but can also be used to scrape websites in a convenient fashion.
https://docs.skrape.it
MIT License

[QUESTION] Socket timeout on self signed SSL certs #162

Closed nikoinist closed 2 years ago

nikoinist commented 2 years ago

Hello! I'm building a simple Android app that is going to scrape data from a specific website, and I get socket timeouts on request calls for HTTPS sites with self-signed certs. I tried a few different sites that have self-signed SSL certs and the same thing always happens.

I tried using the sslRelaxed option for the request function and playing around with different timeout values, but I can't make it work at all.

Could someone point me in the right direction as to what the problem could be, and/or give me some sample code showing how to do it with self-signed certs?

I haven't included sample code since it is super trivial and similar to the samples in the docs; I just found the skrape.it lib and am trying to evaluate it for an app. Thank you!

christian-draeger commented 2 years ago

Hey could you provide some link to a page with a self signed cert so I can investigate?

nikoinist commented 2 years ago

Yes, I tried a few sites. This is the one I'm interested in scraping: https://www.hep.hr/en. I tried a few others that can be reached over both HTTP and HTTPS, and I always get the timeout when using the SSL link. Thanks!

christian-draeger commented 2 years ago

I quickly tried it from a local JVM project (no Android involved) and it works fine for me: skrape-hep. Maybe it is related to Android somehow; I will try...

nikoinist commented 2 years ago

Thank you for taking the time to test it.

nikoinist commented 2 years ago

I'm really baffled by this. When I run a unit test it works fine, but when I use it in the app it times out on HTTPS calls.

mufasa08 commented 2 years ago

I'm having the same issue. Have you figured it out? @nikoinist

nikoinist commented 2 years ago

@mufasa08 No, sorry. Think I'll use a different lib.

christian-draeger commented 2 years ago

There seems to be a problem with the current fetcher implementation on Android. A user in the Kotlin Slack said he implemented a Fetcher based on ktor kohttp and everything worked fine. The current fetcher implementation is based on the Ktor Apache HTTP client.

As a workaround until I find the time to fix this, you could either implement a fetcher based on an HTTP client that works on Android or, if you first and foremost just need the HTML parsing feature, do the HTTP stuff with any HTTP client of your choice and then pass the corresponding response body to the HTML parser.
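The second workaround could look roughly like the sketch below. It is only an illustration under assumptions: `fetchHtml` and `fetchAndParse` are hypothetical helper names (not part of skrape.it), the page is assumed to be UTF-8 encoded, and the JDK's `HttpURLConnection` stands in for "any HTTP client of your choice".

```kotlin
import java.net.HttpURLConnection
import java.net.URL

// Hypothetical helper (not part of skrape.it): downloads a page with the
// JDK's HttpURLConnection, which is also available on Android.
fun fetchHtml(url: String, timeoutMillis: Int = 10_000): String {
    val connection = URL(url).openConnection() as HttpURLConnection
    try {
        connection.connectTimeout = timeoutMillis
        connection.readTimeout = timeoutMillis
        connection.requestMethod = "GET"
        val status = connection.responseCode
        check(status in 200..299) { "Unexpected HTTP status $status for $url" }
        // Assumption: the response body is UTF-8 encoded.
        return connection.inputStream.bufferedReader(Charsets.UTF_8).use { it.readText() }
    } finally {
        connection.disconnect()
    }
}

// Hypothetical glue: fetch first, then hand the body to whatever parser is passed in.
fun <T> fetchAndParse(url: String, parse: (String) -> T): T = parse(fetchHtml(url))
```

The `parse` lambda is where the HTML parser would be plugged in, for example by passing the body to skrape.it's `htmlDocument(html) { ... }` entry point, assuming that function accepts a raw HTML string as its docs describe. On Android the call has to run off the main thread, since network access on the main thread throws `NetworkOnMainThreadException`.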

Sorry for the inconvenience.

I opened a bug, #166, which I will try to fix as soon as possible. If someone is willing to help, that would be very welcome, since I am running short on time these days.

I will close this one since it should be fixed by #166, and all progress will be visible over there. Let's move related discussions to #166 as well if you have further input regarding the Android bug.