pemistahl / lingua

The most accurate natural language detection library for Java and the JVM, suitable for long and short text alike
Apache License 2.0

java.lang.ExceptionInInitializerError - META-INF? #167

Closed (mnavasloro closed this issue 1 year ago)

mnavasloro commented 1 year ago

We are following the instructions in the "10.1 How to add new languages?" section of the README in order to add the Galician language, but we got stuck at step 5. We included Galician in all the required files and built the JAR file with dependencies. However, when we try to call LanguageModelFilesWriter.createAndWriteLanguageModelFiles with the following script:

@file:DependsOn("PATH/lingua-with-dependencies.jar")

import com.github.pemistahl.lingua.api.io.*
import com.github.pemistahl.lingua.api.*
import java.nio.file.Paths

LanguageModelFilesWriter.createAndWriteLanguageModelFiles(
    Paths.get("PATH/lingua/src/main/kotlin/com/github/pemistahl/lingua/api/io/gallego/Gallego.txt"),
    Charsets.UTF_8,
    Paths.get("PATH/lingua/src/main/kotlin/com/github/pemistahl/lingua/api/io/gallego"),
    Language.GALICIAN,
    "\\p{L}"
)

We get the following error:

kotlinc -script "PATH\lingua\galician.main.kts"
java.lang.ExceptionInInitializerError
    at shadow.kotlin.reflect.jvm.internal.impl.types.error.ErrorModuleDescriptor.<clinit>(ErrorModuleDescriptor.kt:23)
    at shadow.kotlin.reflect.jvm.internal.impl.types.error.ErrorUtils.<clinit>(ErrorUtils.kt:14)
    at shadow.kotlin.reflect.jvm.internal.impl.types.TypeUtils.<clinit>(TypeUtils.java:34)
    at shadow.kotlin.reflect.jvm.internal.impl.descriptors.impl.AbstractClassDescriptor$1.invoke(AbstractClassDescriptor.java:49)
    at shadow.kotlin.reflect.jvm.internal.impl.descriptors.impl.AbstractClassDescriptor$1.invoke(AbstractClassDescriptor.java:46)
    at shadow.kotlin.reflect.jvm.internal.impl.storage.LockBasedStorageManager$LockBasedLazyValue.invoke(LockBasedStorageManager.java:408)
    at shadow.kotlin.reflect.jvm.internal.impl.storage.LockBasedStorageManager$LockBasedNotNullLazyValue.invoke(LockBasedStorageManager.java:527)
    at shadow.kotlin.reflect.jvm.internal.impl.descriptors.impl.AbstractClassDescriptor.getDefaultType(AbstractClassDescriptor.java:175)
    at shadow.kotlin.reflect.jvm.internal.impl.builtins.jvm.JvmBuiltInsCustomizer.createMockJavaIoSerializableType(JvmBuiltInsCustomizer.kt:91)
    at shadow.kotlin.reflect.jvm.internal.impl.builtins.jvm.JvmBuiltInsCustomizer.<init>(JvmBuiltInsCustomizer.kt:59)
    at shadow.kotlin.reflect.jvm.internal.impl.builtins.jvm.JvmBuiltIns$customizer$2.invoke(JvmBuiltIns.kt:76)
    at shadow.kotlin.reflect.jvm.internal.impl.builtins.jvm.JvmBuiltIns$customizer$2.invoke(JvmBuiltIns.kt:75)
    at shadow.kotlin.reflect.jvm.internal.impl.storage.LockBasedStorageManager$LockBasedLazyValue.invoke(LockBasedStorageManager.java:408)
    at shadow.kotlin.reflect.jvm.internal.impl.storage.LockBasedStorageManager$LockBasedNotNullLazyValue.invoke(LockBasedStorageManager.java:527)
    at shadow.kotlin.reflect.jvm.internal.impl.storage.StorageKt.getValue(storage.kt:42)
    at shadow.kotlin.reflect.jvm.internal.impl.builtins.jvm.JvmBuiltIns.getCustomizer(JvmBuiltIns.kt:75)
    at shadow.kotlin.reflect.jvm.internal.impl.load.kotlin.DeserializationComponentsForJava.<init>(DeserializationComponentsForJava.kt:80)
    at shadow.kotlin.reflect.jvm.internal.impl.load.kotlin.DeserializationComponentsForJavaKt.makeDeserializationComponentsForJava(DeserializationComponentsForJava.kt:192)
    at shadow.kotlin.reflect.jvm.internal.impl.load.kotlin.DeserializationComponentsForJava$Companion.createModuleData(DeserializationComponentsForJava.kt:123)
    at shadow.kotlin.reflect.jvm.internal.impl.descriptors.runtime.components.RuntimeModuleData$Companion.create(RuntimeModuleData.kt:32)
    at shadow.kotlin.reflect.jvm.internal.ModuleByClassLoaderKt.getOrCreateModule(moduleByClassLoader.kt:58)
    at shadow.kotlin.reflect.jvm.internal.KDeclarationContainerImpl$Data$moduleData$2.invoke(KDeclarationContainerImpl.kt:36)
    at shadow.kotlin.reflect.jvm.internal.KDeclarationContainerImpl$Data$moduleData$2.invoke(KDeclarationContainerImpl.kt:35)
    at shadow.kotlin.reflect.jvm.internal.ReflectProperties$LazySoftVal.invoke(ReflectProperties.java:93)
    at shadow.kotlin.reflect.jvm.internal.ReflectProperties$Val.getValue(ReflectProperties.java:32)
    at shadow.kotlin.reflect.jvm.internal.KDeclarationContainerImpl$Data.getModuleData(KDeclarationContainerImpl.kt:35)
    at shadow.kotlin.reflect.jvm.internal.KClassImpl$Data$descriptor$2.invoke(KClassImpl.kt:50)
    at shadow.kotlin.reflect.jvm.internal.KClassImpl$Data$descriptor$2.invoke(KClassImpl.kt:48)
    at shadow.kotlin.reflect.jvm.internal.ReflectProperties$LazySoftVal.invoke(ReflectProperties.java:93)
    at shadow.kotlin.reflect.jvm.internal.ReflectProperties$Val.getValue(ReflectProperties.java:32)
    at shadow.kotlin.reflect.jvm.internal.KClassImpl$Data.getDescriptor(KClassImpl.kt:48)
    at shadow.kotlin.reflect.jvm.internal.KClassImpl.getDescriptor(KClassImpl.kt:182)
    at shadow.kotlin.reflect.jvm.internal.KClassImpl.isAbstract(KClassImpl.kt:271)
    at shadow.com.squareup.moshi.kotlin.reflect.KotlinJsonAdapterFactory.create(KotlinJsonAdapter.kt:215)
    at shadow.com.squareup.moshi.Moshi.adapter(Moshi.java:146)
    at shadow.com.squareup.moshi.Moshi.adapter(Moshi.java:106)
    at shadow.com.squareup.moshi.Moshi.adapter(Moshi.java:80)
    at com.github.pemistahl.lingua.internal.TrainingDataLanguageModelKt.<clinit>(TrainingDataLanguageModel.kt:163)
    at com.github.pemistahl.lingua.internal.TrainingDataLanguageModel.toJson(TrainingDataLanguageModel.kt:43)
    at com.github.pemistahl.lingua.api.io.LanguageModelFilesWriter.writeLanguageModel(LanguageModelFilesWriter.kt:132)
    at com.github.pemistahl.lingua.api.io.LanguageModelFilesWriter.createAndWriteLanguageModelFiles(LanguageModelFilesWriter.kt:91)
    at Galician_main.<init>(galician.main.kts:7)
Caused by: java.lang.IllegalStateException: No BuiltInsLoader implementation was found. Please ensure that the META-INF/services/ is not stripped from your application and that the Java virtual machine is not running under a security manager
    at shadow.kotlin.reflect.jvm.internal.impl.builtins.BuiltInsLoader$Companion$Instance$2.invoke(BuiltInsLoader.kt:40)
    at shadow.kotlin.reflect.jvm.internal.impl.builtins.BuiltInsLoader$Companion$Instance$2.invoke(BuiltInsLoader.kt:38)
    at shadow.kotlin.SafePublicationLazyImpl.getValue(LazyJVM.kt:107)
    at shadow.kotlin.reflect.jvm.internal.impl.builtins.BuiltInsLoader$Companion.getInstance(BuiltInsLoader.kt:38)
    at shadow.kotlin.reflect.jvm.internal.impl.builtins.KotlinBuiltIns.createBuiltInsModule(KotlinBuiltIns.java:105)
    at shadow.kotlin.reflect.jvm.internal.impl.builtins.DefaultBuiltIns.<init>(DefaultBuiltIns.kt:24)
    at shadow.kotlin.reflect.jvm.internal.impl.builtins.DefaultBuiltIns.<init>(DefaultBuiltIns.kt:21)
    at shadow.kotlin.reflect.jvm.internal.impl.builtins.DefaultBuiltIns.<clinit>(DefaultBuiltIns.kt:31)

Do you know the reason?

Thank you and best regards,

María

pemistahl commented 1 year ago

Hi, thanks for reaching out.

I think the problem here is how you try to include my library in your script. kotlinc seems to have some security restrictions for how you are trying to include my library. I don't even know whether your approach could work at all.

Have you already tried to create a simple Java application that is built by Maven or Gradle? You can then include my library as a dependency in the usual way and then run the LanguageModelFilesWriter in the main() method of the application. There is no need for you to use Kotlin in order to run LanguageModelFilesWriter.

Please try this approach. I'm pretty sure it will work.

mnavasloro commented 1 year ago

Hi Peter,

Thank you for your fast answer. I have created a Java Maven project, added the JAR to its dependencies, and then written a basic main method that calls LanguageModelFilesWriter as follows:

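import com.github.pemistahl.lingua.api.Language;
import com.github.pemistahl.lingua.api.io.LanguageModelFilesWriter;
import java.nio.file.Paths;
import kotlin.text.Charsets; // assumption: Kotlin's Charsets, on the classpath via Lingua's Kotlin stdlib

public class Main {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        String path = System.getProperty("user.dir");
        //System.out.println("Path: " + path +  "\\gallego");
        // Train the Galician models from Gallego.txt and write them into the gallego directory
        LanguageModelFilesWriter.createAndWriteLanguageModelFiles(
            Paths.get(path + "\\gallego\\Gallego.txt"), Charsets.UTF_8,
            Paths.get(path + "\\gallego"), Language.GALICIAN, "\\p{L}");
    }
}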

But I get the same error. I have tried using the JAR standalone, as well as calling other Lingua functions in the very same main method (such as the examples in the README), and all of these worked fine, even with Language.GALICIAN (the language I added to the .kt files). Is there any example of how to call LanguageModelFilesWriter? Maybe I am using it incorrectly. I left the test project in a public repo so the error can be reproduced. As you can see in the output folder, the file unigrams.json gets created but is empty.

Thank you again and best regards,

María

pemistahl commented 1 year ago

Hi María, I don't have time to run your project but why do you always try to add the file lingua-with-dependencies.jar manually to your project? This is absolutely not necessary and probably the source of your problems. As stated in my docs, simply add Lingua as a dependency like so:

<dependency>
    <groupId>com.github.pemistahl</groupId>
    <artifactId>lingua</artifactId>
    <version>1.2.2</version>
</dependency>

Maven will then download Lingua and its transitive dependencies from Maven Central automatically for you. There is no need to put the JAR file in a lib folder in your project. This standalone JAR file is only meant to be used in Java projects which do not make use of Maven or Gradle at all. There might be something wrong with the standalone JAR, I will check it as soon as I have time. But until then, simply don't use the standalone JAR and let Maven do the work for you.

Another important thing: the input text files that you want to use for training and the output directory for the generated language models should be located outside of your Java program. Put them somewhere else on the file system. In your example project, the text file Gallego.txt cannot be opened because your code cannot find it. If you do keep files like this inside the project, they must be placed in the directory src/main/resources and then opened with something like getClass().getClassLoader().getResourceAsStream("Gallego.txt").
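
To illustrate the two ways of locating a training file mentioned above, here is a minimal sketch that uses only the JDK. The class name TrainingFileLocator and both method names are hypothetical and not part of Lingua. The first method fails early with a readable message when a file-system path does not exist; the second opens a file bundled on the classpath (for example from src/main/resources).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TrainingFileLocator {

    // Builds a path from separate segments (so no hard-coded "\\" or "/")
    // and fails early if the file is missing or unreadable.
    public static Path requireReadableFile(String first, String... more) {
        Path path = Paths.get(first, more).toAbsolutePath().normalize();
        if (!Files.isRegularFile(path) || !Files.isReadable(path)) {
            throw new IllegalArgumentException("Training file not found: " + path);
        }
        return path;
    }

    // Opens a file that is bundled on the classpath, e.g. one placed in
    // src/main/resources of a Maven or Gradle project.
    public static InputStream openResource(String name) throws IOException {
        InputStream in = TrainingFileLocator.class.getClassLoader().getResourceAsStream(name);
        if (in == null) {
            throw new IOException("Resource not on classpath: " + name);
        }
        return in;
    }
}
```

A call such as TrainingFileLocator.requireReadableFile(System.getProperty("user.dir"), "gallego", "Gallego.txt") would then report a missing input file before any training code runs.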

mnavasloro commented 1 year ago

Thank you again for the reply.

The reason I add the .jar file manually is that I am trying to add a new language (Galician). I added all the required information for this language to the .kt files, as described in the "10.1 How to add new languages?" section of the README, so now I can use Language.GALICIAN. Then I compiled the .jar file, but hit the error when calling LanguageModelFilesWriter.createAndWriteLanguageModelFiles.

In any case, I also tested the original .jar from the last release and the Maven dependency, and the error persists when calling LanguageModelFilesWriter. So it looks like an internal problem in that part of the code only, since language detection works fine. I hope this is clearer now; I probably didn't explain myself well before.

pemistahl commented 1 year ago

Hah, I was totally on the wrong track. I looked closer at your stack trace and after asking Google, it looks like there is a compatibility problem between the Moshi library that performs the JSON (de)serialization and the Gradle Shadow plugin which creates the JAR file. I just don't know how to fix it yet.

In the meantime, you can try to call the LanguageModelFilesWriter directly from within the library itself. There is the file com.github.pemistahl.lingua.app.App.kt that contains a main() method. Remove the call to runApp() within it and instead call LanguageModelFilesWriter in there.
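
Since the exception message in the stack trace asks to ensure that META-INF/services/ is not stripped, one way to inspect a built JAR is to list the provider files it actually contains. The following is a minimal sketch using only the JDK; the class name ServiceEntryLister is hypothetical, and which entry names a correctly built JAR with relocated packages should contain is an assumption that this thread does not confirm.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

public class ServiceEntryLister {

    // Returns the names of all files under META-INF/services/ in the
    // given JAR, sorted alphabetically. Directory entries are skipped.
    public static List<String> listServiceEntries(Path jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath.toFile())) {
            return jar.stream()
                    .filter(entry -> !entry.isDirectory())
                    .map(entry -> entry.getName())
                    .filter(name -> name.startsWith("META-INF/services/"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Expects the path of the JAR to inspect as the only argument.
        for (String name : listServiceEntries(Paths.get(args[0]))) {
            System.out.println(name);
        }
    }
}
```

If the list is empty, or contains no entry whose name mentions BuiltInsLoader, that would be consistent with the "No BuiltInsLoader implementation was found" message above.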

mnavasloro commented 1 year ago

Thank you again for your patience!

I have tried to do so but the same error appears, probably because I am not familiar with Kotlin and I am using Java dependencies somewhere. I will try to check the Moshi/plugin compatibility problem.

pemistahl commented 1 year ago

I've just tried and it works fine for me if I execute the code within an IDE such as IntelliJ IDEA. Just stop building the JAR with all dependencies for now.

In the file App.kt, run this example code but with existing paths, of course:

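import com.github.pemistahl.lingua.api.Language
import com.github.pemistahl.lingua.api.io.LanguageModelFilesWriter
import kotlin.io.path.Path

fun main() {
    // Replace both paths with locations that exist on your machine
    LanguageModelFilesWriter.createAndWriteLanguageModelFiles(
        inputFilePath = Path("/Users/pemistahl/training.txt"),
        outputDirectoryPath = Path("/Users/pemistahl/language-models"),
        language = Language.ENGLISH
    )
}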

You can also run ./gradlew runLinguaOnConsole on your command line which will execute the code in the main() method as well.

Does that work for you in IntelliJ IDEA or VSCode? What IDE or editor are you using?

pemistahl commented 1 year ago

Alright, so updating the Gradle ShadowJar plugin has fixed your problem. I've just verified it locally. Just create the JAR with dependencies again from the updated main branch and you should be fine.

mnavasloro commented 1 year ago

Great, thank you so much! It works smoothly now, so I'll close the issue. Thank you again!