skrapeit / skrape.it

A Kotlin-based testing/scraping/parsing library providing the ability to analyze and extract data from HTML (server & client-side rendered). It places particular emphasis on ease of use and a high level of readability by providing an intuitive DSL. It aims to be a testing lib, but can also be used to scrape websites in a convenient fashion.
https://docs.skrape.it
MIT License

[QUESTION] Getting "Connection refused: no further information" when calling skrape.it extract methods from a submodule #219

Closed ZhestovskyYS closed 1 year ago

ZhestovskyYS commented 1 year ago

Hi! I want to use skrape.it to parse the Kotlin language documentation. I've created a submodule ":parse" in my project and call the extract method from its class:

class ExtractUseCase {
    suspend operator fun invoke() = skrape(AsyncFetcher) {
        var header = ""
        request { KOTLIN_DOCS }
        response {
            htmlDocument {
                header = parseHeader()
            }
        }
        return@skrape header
    }

    private fun Doc.parseHeader(): String {
        var header = ""

        h1 {
            findFirst { header = text }
        }

        return header
    }
}

To test it, I call it from main, after first making an equivalent call defined locally in the same file:

suspend fun main() {
    extracted()
    withContext(Dispatchers.IO) {
        println(ExtractUseCase().invoke())
    }
}

private suspend fun extracted() {
    withContext(Dispatchers.IO) {
        skrape(AsyncFetcher) {
            request { url = "https://kotlinlang.org/docs/unsigned-integer-types.html" }
            response {
                htmlDocument {
                    h1 {
                        findFirst {
                            println(text)
                        }
                    }
                }
            }
        }
    }
}

And the output is:

Unsigned integer types
Exception in thread "main" java.net.ConnectException: Connection refused: no further information
    at java.base/sun.nio.ch.Net.pollConnect(Native Method)
    at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:672)
    at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:973)
    at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvent(DefaultConnectingIOReactor.java:174)
    at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:148)
    at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:351)
    at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:221)
    at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64)
    at java.base/java.lang.Thread.run(Thread.java:1589)

ZhestovskyYS commented 1 year ago

Sorry, my mistake: I didn't set url and userAgent in the invoke method. With them set, it works:

operator fun invoke(): String = skrape(BrowserFetcher) {
    var header = ""
    request {
        url = KOTLIN_DOCS
        userAgent = "Mozilla/5.0 (Windows; U; WindowsNT 5.1; en-US; rv1.8.1.6) Gecko/20070725 Firefox/2.0.0.6"
    }
    response {
        htmlDocument {
            header = parseHeader()
        }
    }
    return@skrape header
}
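
For anyone who hits the same error: the difference between `request { KOTLIN_DOCS }` and `request { url = KOTLIN_DOCS }` can be illustrated with a small standalone sketch. It does not use skrape.it at all. `RequestSketch` and `requestSketch` are hypothetical stand-ins for a builder-style DSL, and the localhost default is an assumption about how such a request object might be initialised, which would be consistent with the "Connection refused" seen above. In a lambda with receiver that returns Unit, a bare expression on the last line is evaluated and discarded, so the URL is never assigned.

```kotlin
const val KOTLIN_DOCS = "https://kotlinlang.org/docs/unsigned-integer-types.html"

// Hypothetical stand-in for a request builder; NOT skrape.it's real class.
class RequestSketch {
    // Assumed default for illustration: a local address where nothing is listening.
    var url: String = "http://localhost:8080"
}

// Builder function taking a lambda with receiver that returns Unit.
fun requestSketch(init: RequestSketch.() -> Unit): RequestSketch =
    RequestSketch().apply(init)

fun main() {
    // The string is evaluated and thrown away; url keeps its default.
    val unset = requestSketch { KOTLIN_DOCS }

    // The property is assigned explicitly.
    val set = requestSketch { url = KOTLIN_DOCS }

    println(unset.url) // http://localhost:8080
    println(set.url)   // https://kotlinlang.org/docs/unsigned-integer-types.html
}
```

The compiler usually only warns about the unused expression in the first call, so this mistake is easy to miss.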