skrapeit / skrape.it

A Kotlin-based testing/scraping/parsing library providing the ability to analyze and extract data from HTML (server & client-side rendered). It places particular emphasis on ease of use and a high level of readability by providing an intuitive DSL. It aims to be a testing lib, but can also be used to scrape websites in a convenient fashion.
https://docs.skrape.it
MIT License
805 stars 59 forks source link

feat: #127 add ktor fetcher #137

Closed ruffCode closed 3 years ago

ruffCode commented 3 years ago

Some notes on this:

I added a suspend fetchAsync method to the implementation of KtorFetcher. Not sure if it would be better to have this in the Fetcher interface and have the fetch method call runBlocking on the suspend method.

There's quite a bit of code duplication between the HttpFetcher and KtorFetcher modules, mainly the extension functions and tests. Currently the only extension functions that are not duplicated are ones that deal with request,cookie, and response mapping. Something to think about.

I added a buildSrc module as a central place where dependencies are referenced. I only made use of it in the ktor module just in case it's not to your liking. I also updated the Kotlin plugin to 1.4.31, the dokka version is still at 1.4.30 as it hasn't caught up.

There's a typo in "basis-fetcher", I wasn't sure if it was supposed to be "basic" or "base" so I left it as is.

Let me know if you have any questions, thoughts, or critiques.

Thanks!

-Alexi

codecov[bot] commented 3 years ago

Codecov Report

Merging #137 (226a19b) into master (fa9709a) will decrease coverage by 2.25%. The diff coverage is 63.11%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master     #137      +/-   ##
============================================
- Coverage     84.74%   82.50%   -2.24%     
- Complexity      140      144       +4     
============================================
  Files            39       42       +3     
  Lines           950     1051     +101     
  Branches         74       95      +21     
============================================
+ Hits            805      867      +62     
- Misses          107      137      +30     
- Partials         38       47       +9     
Impacted Files Coverage Δ Complexity Δ
.../src/main/kotlin/it/skrape/fetcher/AsyncScraper.kt 0.00% <0.00%> (ø) 0.00 <0.00> (?)
...r/src/main/kotlin/it/skrape/fetcher/KtorFetcher.kt 61.54% <61.54%> (ø) 3.00 <3.00> (?)
...er/src/main/kotlin/it/skrape/fetcher/extensions.kt 73.14% <73.14%> (ø) 0.00 <0.00> (?)
...src/main/kotlin/it/skrape/selects/CssSelectable.kt 73.53% <0.00%> (-4.24%) 28.00% <0.00%> (ø%)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update fa9709a...226a19b. Read the comment docs.

christian-draeger commented 3 years ago

hey @ruffCode big thx for the PR. looks good to me. also really like the idea of having the dependency versions managed via buildSrc. i will adopt it for the other modules as well.

There's a typo in "basis-fetcher", I wasn't sure if it was supposed to be "basic" or "base" so I left it as is. 🙈 my fault, seems like i accidentally mixed up german and english there. good it hasn't been released until now. i will change it to base ^^

hope i can release it soon. i'm currently struggling a lot with maven central release or better say maven publish plugin to solve #123 and get all the stuff released since i think it is ready to be a first 1.0.0 final release now.

christian-draeger commented 3 years ago

I added a suspend fetchAsync method to the implementation of KtorFetcher. Not sure if it would be better to have this in the Fetcher interface and have the fetch method call runBlocking on the suspend method.

async support is a thing on my mind as well but since i am not really experienced with coroutines i felt like i should postpone it before i add something crappy (because of my knowledge gap on that topic 😂) to the lib. what do you think, would the only thing to make it support async operations to add fetchAsync to fetcher interface and implement it by calling the fetch method wrapped in a runBlocking block?

ruffCode commented 3 years ago

I added a suspend fetchAsync method to the implementation of KtorFetcher. Not sure if it would be better to have this in the Fetcher interface and have the fetch method call runBlocking on the suspend method.

async support is a thing on my mind as well but since i am not really experienced with coroutines i felt like i should postpone it before i add something crappy (because of my knowledge gap on that topic ) to the lib. what do you think, would the only thing to make it support async operations to add fetchAsync to fetcher interface and implement it by calling the fetch method wrapped in a runBlocking block?

As I was writing my response I realized that I didn't think about the way Scraper is implemented. The only was to expose the ktor call asynchronously - that I can think of off the top of my head - would be to create a separate AsyncScraper with suspend modifiers and an AsyncFetcher interface which KtorFetcher would implement. The scrape function would get overloaded and if KtorFetcher is passed in, the user would need to call extract / expect from a coroutine.

The only thing that would change on our end are the tests, we wouldn't be able to include KtorFetcher in any kind of enum and would have to test it separately.

Basically this :

public class ScraperAsync<R>(public val client: AsyncFetcher<R>, public val preparedRequest: R) {
    public constructor(client: AsyncFetcher<R>) : this(client, client.requestBuilder)

    @SkrapeItDsl
    public fun request(init: R.() -> Unit): ScraperAsync<R> {
        this.preparedRequest.run(init)
        return this
    }

    public suspend fun scrape(): Result =
        client.fetch(preparedRequest)
}

@SkrapeItDsl
public fun <R, T> skrape(client: AsyncFetcher<R>, init: ScraperAsync<R>.() -> T): T =
    ScraperAsync(client).init()

@SkrapeItDsl
public suspend fun ScraperAsync<*>.expect(init: Result.() -> Unit) {
    extract(init)
}

@SkrapeItDsl
public suspend fun <T> ScraperAsync<*>.extract(extractor: Result.() -> T): T =
    scrape().extractor()

@SkrapeItDsl
public suspend inline fun <reified T : Any> ScraperAsync<*>.extractIt(crossinline extractor: Result.(T) -> Unit): T {
    val instance = T::class.createInstance()
    return extract { instance.also { extractor(it) } }
}
christian-draeger commented 3 years ago

The only thing that would change on our end are the tests, we wouldn't be able to include KtorFetcher in any kind of enum and would have to test it separately.

I could totally live with that :) Would you be open to extend the PR with the AsyncScraper?

ruffCode commented 3 years ago

Absolutely, will do