monarch-initiative / phenopacket2prompt

GA4GH Phenopacket to LLM prompt
https://monarch-initiative.github.io/phenopacket2prompt/
MIT License

Current version fails at non-English languages #50

Closed leokim-l closed 2 weeks ago

leokim-l commented 2 weeks ago

The current output of develop (after running download) has errors. Specifically, running batch -d ppkt_dir gives the output below. The phenopacket named in the error, which is reported as having no phenotypic abnormalities, does in fact have several HPO codes, and its English prompt is created successfully. Commenting out Spanish makes the next language, Dutch, fail, but sooner. More details will follow in the comments. @KyranWissink

Code output
SLF4J(W): No SLF4J providers were found.
SLF4J(W): Defaulting to no-operation (NOP) logger implementation
SLF4J(W): See https://www.slf4j.org/codes.html#noProviders for further details.
SLF4J(W): Class path contains SLF4J bindings targeting slf4j-api versions 1.7.x or earlier.
SLF4J(W): Ignoring binding found at [jar:file:/Users/leonardo/.m2/repository/org/apache/logging/log4j/log4j-slf4j-impl/2.23.1/log4j-slf4j-impl-2.23.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J(W): See https://www.slf4j.org/codes.html#ignoredBindings for an explanation.
[INFO] Added 9 simulated individuals.
[INFO] Added 18 simulated individuals.
[INFO] Added 27 simulated individuals.
[INFO] Added 36 simulated individuals.
[INFO] Added 45 simulated individuals.
[INFO] Added 54 simulated individuals.
[INFO] Added 63 simulated individuals.
Retrieved 5214 files.
en 2332.org.monarchinitiative.phenol.base.PhenolRuntimeException: Did not recognize onset: [HpoOnsetAge]: Embryonal onset (HP:0011460)
    at org.monarchinitiative.phenopacket2prompt.output.impl.english.PpktIndividualEnglish.atAgeForVignette(PpktIndividualEnglish.java:291)
    at org.monarchinitiative.phenopacket2prompt.output.impl.english.EnglishPromptGenerator.getVignetteAtAge(EnglishPromptGenerator.java:45)
    at org.monarchinitiative.phenopacket2prompt.output.impl.english.EnglishPromptGenerator.createPromptWithoutHeader(EnglishPromptGenerator.java:87)
    at org.monarchinitiative.phenopacket2prompt.output.PromptGenerator.createPrompt(PromptGenerator.java:80)
    at org.monarchinitiative.phenopacket2prompt.cmd.Utility.outputPromptsEnglish(Utility.java:211)
    at org.monarchinitiative.phenopacket2prompt.cmd.GbtTranslateBatchCommand.call(GbtTranslateBatchCommand.java:63)
    at org.monarchinitiative.phenopacket2prompt.cmd.GbtTranslateBatchCommand.call(GbtTranslateBatchCommand.java:19)
    at picocli.CommandLine.executeUserObject(CommandLine.java:2045)
    at picocli.CommandLine.access$1500(CommandLine.java:148)
    at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2465)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2457)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2419)
    at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2277)
    at picocli.CommandLine$RunLast.execute(CommandLine.java:2421)
    at picocli.CommandLine.execute(CommandLine.java:2174)
    at org.monarchinitiative.phenopacket2prompt.Main.main(Main.java:28)
en 5212.[ERROR] Could not process PMID_29491316_12_year_old_boy_es-prompt.txt: No phenotypic abnormalities
org.monarchinitiative.phenol.base.PhenolRuntimeException: [ERROR] Could not process PMID_29491316_12_year_old_boy_es-prompt.txt: No phenotypic abnormalities
    at org.monarchinitiative.phenopacket2prompt.cmd.Utility.outputPromptsInternationalFromIndividualList(Utility.java:164)
    at org.monarchinitiative.phenopacket2prompt.cmd.Utility.outputPromptsInternational(Utility.java:179)
    at org.monarchinitiative.phenopacket2prompt.cmd.GbtTranslateBatchCommand.call(GbtTranslateBatchCommand.java:67)
    at org.monarchinitiative.phenopacket2prompt.cmd.GbtTranslateBatchCommand.call(GbtTranslateBatchCommand.java:19)
    at picocli.CommandLine.executeUserObject(CommandLine.java:2045)
    at picocli.CommandLine.access$1500(CommandLine.java:148)
    at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2465)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2457)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2419)
    at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2277)
    at picocli.CommandLine$RunLast.execute(CommandLine.java:2421)
    at picocli.CommandLine.execute(CommandLine.java:2174)
    at org.monarchinitiative.phenopacket2prompt.Main.main(Main.java:28)
Process finished with exit code 1

KyranWissink commented 2 weeks ago

To add to this, the "no phenotypic abnormalities" runtime exception appears to be accurate in this case. For a reason that is not yet known, the Spanish version of PMID_29491316_12_year_old_boy_es-prompt.txt has only one HPO term. That term is Pancreatic atrophy, which has no Spanish translation, so the Spanish case ends up with no phenotypic abnormalities. The English version, however, has several other terms.
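
The failure mode described above (a case whose only term has no translation ends up with nothing to report) can be illustrated with a minimal sketch. This is not the phenopacket2prompt implementation: the class name, method names, term ids, and label map below are all hypothetical, and the sketch assumes translations are available as a simple id-to-label map. It only shows one way to list untranslated terms up front so that the missing translation is named directly.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names come from phenopacket2prompt.
final class TranslationCoverageSketch {

    /** Returns the term ids that have no label in the target-language map (assumed id -> label). */
    static List<String> missingTranslations(List<String> termIds, Map<String, String> labels) {
        List<String> missing = new ArrayList<>();
        for (String id : termIds) {
            if (!labels.containsKey(id)) {
                missing.add(id);
            }
        }
        return missing;
    }

    /** True if at least one term of the case can be rendered in the target language. */
    static boolean hasUsableTerms(List<String> termIds, Map<String, String> labels) {
        return missingTranslations(termIds, labels).size() < termIds.size();
    }

    public static void main(String[] args) {
        // Placeholder ids and labels, not real HPO identifiers or translations.
        Map<String, String> spanishLabels = new LinkedHashMap<>();
        spanishLabels.put("TERM:0000001", "etiqueta de ejemplo");

        // A case whose single term is absent from the label map.
        List<String> caseTerms = List.of("TERM:0000002");

        if (!hasUsableTerms(caseTerms, spanishLabels)) {
            System.out.println("Skipping case; untranslated terms: "
                    + missingTranslations(caseTerms, spanishLabels));
        }
    }
}
```

Reporting the untranslated ids, rather than only the resulting empty term list, would make it clearer that the underlying problem is a missing translation and not an empty phenopacket.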

leokim-l commented 2 weeks ago

Fixed by #54