skrapeit / skrape.it

A Kotlin-based testing/scraping/parsing library providing the ability to analyze and extract data from HTML (server & client-side rendered). It places particular emphasis on ease of use and a high level of readability by providing an intuitive DSL. It aims to be a testing lib, but can also be used to scrape websites in a convenient fashion.
https://docs.skrape.it
MIT License

[BUG] Crash on Android when using R8 #205

Open Robotxm opened 2 years ago

Robotxm commented 2 years ago

Describe the bug

When R8 is enabled, the app crashes with java.lang.ExceptionInInitializerError. Here is the stack trace:

java.lang.ExceptionInInitializerError
    at v7.u.b(SourceFile:3)
    at x4.f.b(Unknown Source:2)
    at it.skrape.fetcher.ScraperKt.a(SourceFile:5)
    at com.moefactory.bettermiuiexpress.repository.ExpressRepository$queryExpressDetailsFromCaiNiaoActual$2.t(SourceFile:6)
    at com.moefactory.bettermiuiexpress.repository.ExpressRepository$queryExpressDetailsFromCaiNiaoActual$2.m(SourceFile:2)
    at it.skrape.fetcher.ScraperKt$skrape$1.t(SourceFile:4)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.k(SourceFile:3)
    at v7.y.run(SourceFile:18)
    at kotlinx.coroutines.c.A(SourceFile:21)
    at v7.u.Y(SourceFile:14)
    at it.skrape.fetcher.ScraperKt.b(Unknown Source:8)
    at com.moefactory.bettermiuiexpress.repository.ExpressRepository$queryExpressDetailsFromCaiNiao$1.t(SourceFile:5)
    at com.moefactory.bettermiuiexpress.repository.ExpressRepository$queryExpressDetailsFromCaiNiao$1.m(SourceFile:2)
    at androidx.lifecycle.BlockRunner$maybeRun$1.t(SourceFile:9)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.k(SourceFile:3)
    at v7.y.run(SourceFile:18)
    at y7.e.run(SourceFile:2)
    at z7.h.run(SourceFile:1)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$a.run(SourceFile:15)
    Suppressed: kotlinx.coroutines.DiagnosticCoroutineContextException: [w0{Cancelling}@6e8eabc, Dispatchers.IO]
Caused by: org.apache.commons.logging.LogConfigurationException: java.lang.ClassNotFoundException: Didn't find class "org.apache.commons.logging.impl.LogFactoryImpl" on path: DexPathList[[zip file "/data/app/~~s6TmJS25sPj8Sk_G1Isbhg==/com.moefactory.bettermiuiexpress-cGy7sMs8jCsggIb5mjNEJA==/base.apk"],nativeLibraryDirectories=[/data/app/~~s6TmJS25sPj8Sk_G1Isbhg==/com.moefactory.bettermiuiexpress-cGy7sMs8jCsggIb5mjNEJA==/lib/arm64, /system/lib64, /system_ext/lib64]] (Caused by java.lang.ClassNotFoundException: Didn't find class "org.apache.commons.logging.impl.LogFactoryImpl" on path: DexPathList[[zip file "/data/app/~~s6TmJS25sPj8Sk_G1Isbhg==/com.moefactory.bettermiuiexpress-cGy7sMs8jCsggIb5mjNEJA==/base.apk"],nativeLibraryDirectories=[/data/app/~~s6TmJS25sPj8Sk_G1Isbhg==/com.moefactory.bettermiuiexpress-cGy7sMs8jCsggIb5mjNEJA==/lib/arm64, /system/lib64, /system_ext/lib64]])
    at n9.b.run(SourceFile:48)
    at java.security.AccessController.doPrivileged(AccessController.java:43)
    at n9.d.l(SourceFile:1)
    at n9.d.c(SourceFile:74)
    at n9.d.f(Unknown Source:0)
    at com.gargoylesoftware.htmlunit.WebClient.<clinit>(SourceFile:1)
    ... 19 more
Caused by: java.lang.ClassNotFoundException: Didn't find class "org.apache.commons.logging.impl.LogFactoryImpl" on path: DexPathList[[zip file "/data/app/~~s6TmJS25sPj8Sk_G1Isbhg==/com.moefactory.bettermiuiexpress-cGy7sMs8jCsggIb5mjNEJA==/base.apk"],nativeLibraryDirectories=[/data/app/~~s6TmJS25sPj8Sk_G1Isbhg==/com.moefactory.bettermiuiexpress-cGy7sMs8jCsggIb5mjNEJA==/lib/arm64, /system/lib64, /system_ext/lib64]]
    at dalvik.system.BaseDexClassLoader.findClass(BaseDexClassLoader.java:218)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:379)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:312)
    at n9.b.run(SourceFile:2)
    ... 24 more

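It seems that R8 renames or removes some classes that are looked up by name at runtime (here org.apache.commons.logging.impl.LogFactoryImpl), so the static initializer of com.gargoylesoftware.htmlunit.WebClient fails.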
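
One way to check this hypothesis is to look the class up by its string name at runtime, which is how the ClassNotFoundException in the trace above arises. The following is only a minimal sketch: `isLoadable` is a hypothetical helper written for illustration and is not part of skrape.it, HtmlUnit, or commons-logging.

```kotlin
// Hypothetical helper: returns true if a class with the given fully
// qualified name can be loaded by the current class loader.
fun isLoadable(className: String): Boolean =
    try {
        Class.forName(className)
        true
    } catch (e: ClassNotFoundException) {
        // The class is missing under this name, e.g. because a shrinker
        // removed or renamed it, or because it is not on the classpath.
        false
    }

fun main() {
    // A JDK class that is always present.
    println(isLoadable("java.lang.String"))
    // The class named in the stack trace; the result depends on the
    // classpath and on whether the shrinker kept it under this name.
    println(isLoadable("org.apache.commons.logging.impl.LogFactoryImpl"))
}
```

Running the second check in a release build (with R8 enabled) and in a debug build should show whether the class survives shrinking under its original name.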

Code Sample

skrape(BrowserFetcher) {
    request {
        url {
            protocol = UrlBuilder.Protocol.HTTPS
            host = "a.example.com"
            port = -1
            path = "/path/to/query"
        }
        userAgent = "Mozilla/5.0 (Linux; Android 12; M2102K1C) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Mobile Safari/537.36 EdgA/105.0.1343.48"
        sslRelaxed = true
    }

    response {
        val jDoc = Jsoup.parse(responseBody)

        // Parse using Jsoup
    }
}

Expected behavior

skrape.it should run normally when R8 is enabled.

Additional context

Maybe adding some ProGuard rules would help?
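
Judging only from the stack trace, the class that cannot be found is org.apache.commons.logging.impl.LogFactoryImpl, so a narrow starting point might be to keep the commons-logging classes. This is an untested sketch based on that one class name; HtmlUnit may need further rules for other classes it loads by name.

```
# Assumption: only the commons-logging lookup from the stack trace is covered.
# Keeps the classes and their members under their original names.
-keep class org.apache.commons.logging.** { *; }
```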

newfla commented 1 year ago

Try using these proguard-rules:

-keep class org.apache.** { *; }
-keep class ch.qos.** { *; }
-keep class io.netty.**
-keep public class org.slf4j.** {
    *;
}
-keepclassmembers class org.apache.http.** { *; }
-keepclassmembers class io.ktor.** { *; }
-keepclassmembers class kotlinx.** {
    *;
    volatile <fields>;
}

I'm still facing some issues with Play Store updates: updating the app always results in a crash, and the problem only goes away after a fresh installation.

Robotxm commented 1 year ago

Thanks a lot. I have already been using the following, more aggressive rules. With them, upgrading from an older version of the app seemed to work.

-keep class com.gargoylesoftware.** { *; }
-keep class javax.xml.** { *; }
-keep class net.sourceforge.htmlunit.** { *; }
-keep class netscape.** { *; }
-keep class org.apache.** { *; }
-keep class org.htmlunit.org.apache.** { *; }
-keep class org.w3c.dom.** { *; }
-keep class org.xml.sax.** { *; }