skrapeit / skrape.it

A Kotlin-based testing/scraping/parsing library providing the ability to analyze and extract data from HTML (server & client-side rendered). It places particular emphasis on ease of use and a high level of readability by providing an intuitive DSL. It aims to be a testing lib, but can also be used to scrape websites in a convenient fashion.
https://docs.skrape.it
MIT License

[BUG] mode = Mode.DOM seems to be not working anymore #220

Closed by dewijones92 1 year ago

dewijones92 commented 1 year ago

Describe the bug

The docs at https://docs.skrape.it/docs/dsl/extract-client-side-rendered-data describe how to scrape a page after its client-side JavaScript has run, but the example shown there does not compile for me.

Code Sample

    val scrapedData = skrape {
        url = "http://some.url"
        mode = Mode.DOM // <--- here's the magic
        extract {
            div {
                withClass = "dynamic"
                findFirst { text }
            }
        }
    }
    println(scrapedData)

Expected behavior

I get this error:

None of the following functions can be called with the arguments supplied.
skrape(BlockingFetcher<TypeVariable(R)>, suspend Scraper<TypeVariable(R)>.() → TypeVariable(T))   where R = TypeVariable(R), T = TypeVariable(T) for    fun <R, T> skrape(fetcher: BlockingFetcher<R>, init: suspend Scraper<R>.() → T): T defined in it.skrape.fetcher
skrape(NonBlockingFetcher<TypeVariable(R)>, suspend Scraper<TypeVariable(R)>.() → TypeVariable(T))   where R = TypeVariable(R), T = TypeVariable(T) for    suspend fun <R, T> skrape(fetcher: NonBlockingFetcher<R>, init: suspend Scraper<R>.() → T): T defined in it.skrape.fetcher

Additional context

Is there a working repo where I can see this JavaScript example? Thanks.

christian-draeger commented 1 year ago

Sorry for the inconvenience, the docs are outdated for this example.

This is how you can achieve it with the latest version:

https://github.com/skrapeit/skrape.it#scrape-a-client-side-rendered-page
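
For readers who only have this thread, here is a minimal sketch of the approach that README section describes, assuming skrape.it 1.x: the fetcher is passed to `skrape(...)` instead of setting `mode = Mode.DOM`, the URL goes into a `request` block, and the selectors go into `response { htmlDocument { ... } }`. The URL and the `dynamic` class are the placeholders from the report above. The import paths are written from memory of the 1.x package layout and may differ between versions, so treat this as an illustration rather than the official example.

    import it.skrape.core.htmlDocument
    import it.skrape.fetcher.BrowserFetcher
    import it.skrape.fetcher.response
    import it.skrape.fetcher.skrape
    import it.skrape.selects.html5.div

    fun main() {
        // BrowserFetcher renders the page (including JavaScript) before parsing.
        // Assumption: the browser-fetcher module is on the classpath.
        val scrapedData = skrape(BrowserFetcher) {
            request {
                url = "http://some.url" // placeholder URL from the report
            }
            response {
                htmlDocument {
                    div {
                        withClass = "dynamic" // placeholder class from the report
                        findFirst { text }
                    }
                }
            }
        }
        println(scrapedData)
    }

This also matches the compiler error quoted above: both listed `skrape` overloads take a fetcher as their first argument, so the argument-less `skrape { ... }` call has no matching overload.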

eric-labelle commented 10 months ago

Hi @christian-draeger. I found this issue while trying to use Mode.DOM as well. I had already tried BrowserFetcher, and it still seems the DOM isn't loaded when the fetch happens. The URL I'm testing is https://www.fantrax.com/player/00454/pr6b2ivjlmccxj5j, and when I print the htmlDocument I can see that the whole <app-root> element is empty (where all the content should be).

Any suggestions?