[BUG] Parsing long HTML trees with JS execution fails

Describe the bug

Copied over from the skrape{it} Slack channel:

Hello! For the most part I’m loving doing idiomatic scraping with skrape{it}! However, I’m having trouble getting the JS-rendering functionality working. The example at https://docs.skrape.it/docs/dsl/extract-client-side-rendered-data looks like it’s for a previous version, since mode = Mode.DOM is no longer available. According to the docs on GitHub, it looks like all I should need to do is pass BrowserFetcher to the skrape function as an argument, but that doesn’t seem to do the trick. I tried setting jsExecution to true, e.g. something like this:

Code Sample

skrape(BrowserFetcher) {
    request { url = urlToScrape }
    // extract is nested inside the skrape block so it can read the response
    extract {
        htmlDocument(html = responseBody, baseUri = baseUri, jsExecution = true) {
            val t = title { findFirst { text } }
            i { "Got title:$t" }
        }
    }
}

Expected behavior

It should be possible to render large HTML trees with JS execution enabled.

Additional context

FWIW, after some investigation, the issue seems to lie in the Parser object. Specifically, the toUriScheme() method was producing a massive URL which, when later added as a Referer header, came to around 44 KB. That was too large for the server to accept, hence the 400. Truncating it to 200 bytes meant there were no more errors, but unfortunately the generated DOM content was incomplete, so at this point I don’t see a way of using skrape{it} for JS-based sites such as this one. If any fixes are made to make parsing this page viable with skrape{it}, I will definitely try again!
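
To illustrate the failure mode, here is a minimal Kotlin sketch. It assumes that toUriScheme() embeds the whole HTML document in a data: URI, which is a guess based on the symptoms and not confirmed from the skrape{it} source. The names toDataUri, smallPage and largePage are hypothetical and only serve the illustration.

```kotlin
import java.net.URLEncoder

// Hypothetical stand-in for Parser.toUriScheme(); the real implementation may differ.
// Assumption: the full HTML is URL-encoded into a data: URI.
fun toDataUri(html: String): String =
    "data:text/html;charset=UTF-8," + URLEncoder.encode(html, "UTF-8")

fun main() {
    val smallPage = "<html><head><title>t</title></head><body></body></html>"
    val largePage = "<html><body>" +
        "<div class=\"row\">item</div>".repeat(2_000) +
        "</body></html>"

    val smallUri = toDataUri(smallPage)
    val largeUri = toDataUri(largePage)

    // The URI grows with the document, so any header that carries it grows too.
    println("small page URI: ${smallUri.length} chars")
    println("large page URI: ${largeUri.length} chars")

    // Capping the value keeps the header small but cuts off most of the markup,
    // which would match the incomplete DOM seen after truncating to 200 bytes.
    val capped = largeUri.take(200)
    println("capped URI: ${capped.length} chars")
}
```

If the assumption holds, a fix would need to avoid sending the document-sized URI as a Referer at all, since shortening it discards the content that is supposed to be rendered.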

skrapeit/skrape.it#134