skrapeit / skrape.it

A Kotlin-based testing/scraping/parsing library providing the ability to analyze and extract data from HTML (server & client-side rendered). It places particular emphasis on ease of use and a high level of readability by providing an intuitive DSL. It aims to be a testing lib, but can also be used to scrape websites in a convenient fashion.
https://docs.skrape.it
MIT License
808 stars 59 forks source link

Crash on Android api level 30 #127

Closed cbedoy closed 3 years ago

cbedoy commented 3 years ago

Describe the bug Getting crash on Android Level 30 due OkHttp version please update

Code Sample

 suspend fun extract() {
        coroutineScope {
            val extracted = skrape(HttpFetcher) {
                request {
                    url = "SUPER_FANCY_URL"
                }

                extractIt<ScrapSource> {
                    status {
                        it.httpStatusCode = code
                        it.httpStatusMessage = message
                    }
                    htmlDocument {
                        it.allParagraphs = p { findAll { eachText }}
                        it.paragraph = p { findFirst { text }}
                        it.allLinks = a { findAll { eachHref }}
                    }
                }
            }
            _source.postValue(extracted)
        }
    }

Expected behavior Should be able to work in level 30

Additional context I can create a PR with similar change following https://stackoverflow.com/questions/63917431/expected-android-api-level-21-but-was-30

christian-draeger commented 3 years ago

Hey thx for pointing this out. We are using KOhttp (which is basically a kotlin dsl wrapper for okhttp) internally.

The currently used okhttp version that is used by KOhttp is 3.14.2. According to the stackoverflow question that mentioned upgrading okhttp to 4.9.0 will fix the issue we could add an extra dependency on okhttp 4.9.0 which should than win over the transitive dependency that comes with KOhttp. since this would be a major version bump it could be (not sure if the okhttp api had breaking changes, since) that KOhttp dependency will break, if not I'm fine with it as a quick fix.

On a long term this should be fixed upstream at kohttp. I just opened an issue over there.

But since I don't want to wait for them and its more or less just synthetic sugar why we use kohttp I feel like it is a good idea to use plain okhttp instead to rely directly on an actively maintained project, having less I direction in our code base and have more freedom regarding versions and compatabilities.

Long story short, pull request would be highly appreciated :) Either just try to bump kohttp version as quick fix or refractor the HttpFetcher class to use okhttp instead of the kohttp dsl.

christian-draeger commented 3 years ago

@cbedoy would like to contribute a PR on this topic?

cbedoy commented 3 years ago

Sure :)

Xaseron commented 3 years ago

I had some issues with the HttpFetcher too. Adding the dependency implementation("com.squareup.okhttp3:okhttp:4.9.0") resolved it :-)

ruffCode commented 3 years ago

It doesn't look like KOhttp is being supported, it has been quite a while since it was updated.

I was able to fix this issue by replacing KOhttp with Ktor (Apache). All the tests are passing so if you're okay with this change, I'll submit a PR.

Thanks.

-Alexi

christian-draeger commented 3 years ago

hey @ruffCode, we could migrate away from KOhttp. I do not mourn this just liked the DSL, but on the other hand since its for internal use of the library anyway we could either change it to use okhttp directly (should be the easiest fix) or migrate to ktor apache client.

i would totally be open to migrate to ktor even if i havent used it a lot personally until now. so if you could provide a PR would be great :) i would prefer if it would be added as additional module (":fetcher:ktor-fetcher") to give users the freedom of choice

ruffCode commented 3 years ago

@christian-draeger

I think ktor is a good call, it has a really nice DSL and leaves you open for multi-platform, should you choose to go that route.

I already had it as a separate implementation of Fetcher so that's no problem. I'm wiling to take on splitting everything in to modules if have an idea of what the dependency graph would look like.

Thanks.

-Alexi

christian-draeger commented 3 years ago

@ruffCode good point regarding the multi-platform approach. there are plans to make it a mpp in the future. that was one of the reasons for the multi module refactoring. thereby it should be possible to migrate iteratively module per module without too much dependencies.

yeah ok, so ktor sounds like a promising choice here to me. PR would be much appreciated 😎👍

christian-draeger commented 3 years ago

i just pushed @Xaseron 's fix by adding current latest version of okhttp to HttpFetcher module as quick fix to master

christian-draeger commented 3 years ago

since hot fix has been pushed to master and there is already a PR including an alternative http fetcher with ktor client as base i will close the issue