Open McDjuady opened 1 year ago
Wow this is great news 🤩 Really looking forward to this.
Let me know if there are any questions that can maybe help to understand internals of the Lib if needed.
I personally have no experience so far with creating multiplattform Libs in kotlin. In my daily life I am very much into using kotlin on the jvm/server-side. But I think skrape{it} would be very benificial in the kotlin js world 🙂
I am really glad if skrape{it} could become a multiplatform Lib 🥳
My main question right now would be if multiple fetchers are really necessary. To keep things simple I'd like to use ktor as a client for any fetch operations. However this implies, that fetching can only be done via a suspend function on non JVM platforms, since JVM is the only target to support runBlocking. This would remove HttpFetcher since it would become redundant. JsExecution is a feature of http-parser, which falls back onto BrowserFetcher on JVM, in JS it's rendered directly using JSDOM. It would make sense to move the functionality for JVM to http-praser as well, which would deprecate BrowserFetcher, leaving only AsyncFetcher, which could then be migrated into Base Fetcher to leave only one fetcher implementation. My Plan would be to create on AsyncFetcher supporting jsExecution and with JVM specific extension functions providing blocking calls if needed.
I geuss the question is if that is okay with you and if that's a direction you feel comfortable with?
Hey, sorry for late response.
My main question right now would be if multiple fetchers are really necessary.
the different fetchers (beside BrowserFetcher which has a little more special purpose) only exists because of convenience for the users to either run blocking or non-blocking.
To keep things simple I'd like to use ktor as a client for any fetch operations. However this implies, that fetching can only be done via a suspend function on non JVM platforms, since JVM is the only target to support runBlocking. This would remove HttpFetcher since it would become redundant.
if this could be simplified and as you mentioned "fetching can only be done via a suspend function on non JVM platforms" anyway i would apprechiate simplification (reducing it to 1) of the fetchers a lot as long as we provide an intuitive way to still make blocking fetches on the JVM possible. either by at least documenting that wrapping calls in a runBlocking scope is necessary or preferred provide something like a skrapeBlocking
method to the DSL.
just a quick idea somewhat inspired by what kohttp dsl is doing here, but i am open for other ideas as well: Synchronous calls: https://kohttp.gitbook.io/docs/core/synchronous-calls/get Asynchronous calls: https://kohttp.gitbook.io/docs/core/asynchronous-calls/async-get
To keep things simple I'd like to use ktor as a client for any fetch operations
i am totally fine to switch to ktor client completely as a build-in/default as long if we still provide a way for users to implement custom fetchers to keep the lib flexible and not completely dependent on ktor. i am not deep into ktor client, if it is possible to implement custom ktor clients anyway that would be good enough.
JsExecution is a feature of http-parser, which falls back onto BrowserFetcher on JVM, in JS it's rendered directly using JSDOM. It would make sense to move the functionality for JVM to http-praser as well, which would deprecate BrowserFetcher, leaving only AsyncFetcher, which could then be migrated into Base Fetcher to leave only one fetcher implementation.
That sounds resonable to me and would again simplify the code base a lot =) i like!
Last but not least the resaon why Browserfetcher is an own fetcher but also part of the html-parser
Browserfetcher
uses html-unit to act as a JVM based headless browser in order to be able to render pages that include JS correctly. it makes the actual request using html-unit as well as rendering the response in this embedded headless browser. That in turn means that if you would fetch a page that includes conditional rendering on some really browser specific things as request headers like authorization, set-cookie or user-agent etc just works as you would expect using a "real" browser.
BrowserFetcher.render()
that is currently used in the html-parser on the other hand just renders a html string without request information available.
calling WebClient.withOptions(request: Request)
extension function on the com.gargoylesoftware.htmlunit.WebClient
inside of BrowserFetchers .render()
function could probably fix align the behaviour of BrowserFetcher.render
and BrowserFetcher.renderparse
.
But since skrape{it} supports parsing html from string you don't have a request when just parsing a html string.
If that can be reworked somehow to maybe add the request options to html-parsers jsExecution mode if available that should be fine.
on he other hand i have to admit that it is only there for a super specific use-case that is maybe not worth to maintain.
from that point of view i would be fine with a simplistic solution that drops BrowserFetcher and let html-parser render html with or without js execution :) If we could still support adding request data to that rendering this would be nice but not a must. Is the described point JVM related only or also a deal with JSDOM?
I geuss the question is if that is okay with you and if that's a direction you feel comfortable with?
overall absolutely =)
Thanks for clearing all that up! I think most of what you are describing should be possible using just the Ktor client. The hardest part will probably be to get both Ktor and HtmlUnit to use the same client for requesting any additional resources that might come up when running JS on a website (I'll probably write some wrapper classes so HtmlUnit uses the ktor client). As for JSDOM it's pretty similar. With a few wrappers it'll probably be possible to have a single client to make all the requests.
It might even be possible to replace Skrape.it's own Request class with the Ktor Request builder, since they both serve the same purpose.
Could you maybe provide some tests in which the BrowserFetcher specific features are used so I can make sure it still fulfills those requirements?
Merging #209 (53ccb08) into master (956e21d) will decrease coverage by
83.18%
. The diff coverage is0.00%
.:exclamation: Current head 53ccb08 differs from pull request most recent head 90ae18d. Consider uploading reports for the commit 90ae18d to get more accurate results
@@ Coverage Diff @@
## master #209 +/- ##
==========================================
- Coverage 83.18% 0.00% -83.17%
==========================================
Files 39 40 +1
Lines 1141 1173 +32
Branches 173 145 -28
==========================================
- Hits 949 0 -949
- Misses 131 1173 +1042
+ Partials 61 0 -61
Impacted Files | Coverage Δ | |
---|---|---|
...commonMain/kotlin/it/skrape/matchers/Assertions.kt | 0.00% <ø> (ø) |
|
...in/it/skrape/matchers/ContentTypeHeaderMatchers.kt | 0.00% <0.00%> (ø) |
|
...c/commonMain/kotlin/it/skrape/matchers/Matchers.kt | 0.00% <ø> (ø) |
|
...onMain/kotlin/it/skrape/matchers/StatusMatchers.kt | 0.00% <0.00%> (ø) |
|
...c/jvmMain/kotlin/it/skrape/fetcher/AsyncFetcher.kt | 0.00% <0.00%> (ø) |
|
...src/jvmMain/kotlin/it/skrape/fetcher/extensions.kt | 0.00% <0.00%> (ø) |
|
...monMain/kotlin/it/skrape/fetcher/Authentication.kt | 0.00% <0.00%> (ø) |
|
.../src/commonMain/kotlin/it/skrape/fetcher/Cookie.kt | 0.00% <0.00%> (ø) |
|
...src/commonMain/kotlin/it/skrape/fetcher/Request.kt | 0.00% <0.00%> (ø) |
|
.../src/commonMain/kotlin/it/skrape/fetcher/Result.kt | 0.00% <ø> (ø) |
|
... and 66 more |
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.
Hey, Again, sorry for the late reply. We are currently working towards a deadline at my job and this means that I unfortunately don't have as much time for open source as I would like to have 🥲
But anyway I took a week off in mid November and will try to provide some tests with the browserfetcher specific features :)
With a few wrappers it'll probably be possible to have a single client to make all the requests.
That sounds absolutely fantastic 🙂
It might even be possible to replace Skrape.it's own Request class with the Ktor Request builder, since they both serve the same purpose.
Tbh I like the look and feel of skrape{it}s request dsl more than ktors. But we could map skrape{it} request dsl to ktor request object internally
Hi!
So I've been working along on the multiplatform/JS implementation of skrape.it. It's fully functional on JS now and I'd say I'm quite close to completion. The major missing feature right now is Proxy support on JS.
Because of some of the (breaking) changes I've made this would probably be better suited for a major realease. Here is a (non inclusive) list of what I've changed since the last time:
If you have the time I'd be very happy if you could look the changes over and give me some feedback if you are happy with the direction this is going. Also there seems to be an issue with the skrape.it documentation site, where it's impossible to fetch it from JS. Cloudflare just returns 500 with an error page. You can try this yourself with postman or simply try to fetch the page in node. Maybe you could look into that since there might be some information in the server logs as to why it's happening.
Cheers Max
hey @McDjuady this is such awesome news. a really really big thank you for all the your effort so far. you rock!
first of all don't worry because of breaking changes. we will release a new major version :) if smooth migration is possible thats good, if not also ok as long as we write good docs.
i will have a look at the code at latest until monday. job is much more chilled again currently^^
PS: skrape.it docs are hosted via gitbook.com, can not see or do much over there since its SaaS. side-note: i thought about having a better documentation site. i played around with kotlin-playground) in the past. the idea was to have kind of an interactive docu like the official kotlin docs where you can try out code snippets directly. i wanted to work on that again soon, therefore i wouldn't like to invest to much time in the old docs setup / want only add documentation there.
again sorry for late response. i just reviewed what you did so far. looks good to me 👍
Any update on this?
Hey @McDjuady, I know it's been a while. But still I want to kindly ask if you would be open to finish the PR? 🙂
I've been working over the last few months to create a multiplatform version for skrape.it It's somewhat in line with #196 but I'm a bit further along in some areas, which is why i wanted to get this out. So far I've converted the buildscripts to multiplatform and implemented some modules in JS. This is still pretty much WIP and I intend to keep working on it. I'll update the pull request as I get further along and improve the code
What's done so far:
What still needs to be done:
Other notable changes: