ontodev / robot

ROBOT is an OBO Tool
http://robot.obolibrary.org
BSD 3-Clause "New" or "Revised" License

Converting go-lego.owl to JSON throwing UTF16 String size OutOfMemoryError #670

Closed. dustine32 closed this issue 4 years ago.

dustine32 commented 4 years ago

Hi!

I'm trying to use robot (release v1.6.0) to convert go-lego.owl (a 1.6 GB file) to go-lego.json, but it fails after about 5 minutes with this error:

$ ROBOT_JAVA_ARGS=-Xmx128G robot convert -i go-lego.owl -o go-lego.json
Exception in thread "main" java.lang.OutOfMemoryError: UTF16 String size is 1349785996, should be less than 1073741823
    at java.base/java.lang.StringUTF16.newBytesFor(StringUTF16.java:46)
    at java.base/java.lang.AbstractStringBuilder.inflate(AbstractStringBuilder.java:202)
    at java.base/java.lang.AbstractStringBuilder.appendChars(AbstractStringBuilder.java:1654)
    at java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:661)
    at java.base/java.lang.StringBuilder.append(StringBuilder.java:195)
    at com.fasterxml.jackson.core.util.TextBuffer.contentsAsString(TextBuffer.java:399)
    at com.fasterxml.jackson.core.io.SegmentedStringWriter.getAndClear(SegmentedStringWriter.java:83)
    at com.fasterxml.jackson.databind.ObjectWriter.writeValueAsString(ObjectWriter.java:1037)
    at org.geneontology.obographs.io.OgJsonGenerator.prettyJsonString(OgJsonGenerator.java:18)
    at org.geneontology.obographs.io.OgJsonGenerator.render(OgJsonGenerator.java:11)
    at org.obolibrary.robot.IOHelper.saveOntologyFile(IOHelper.java:1376)
    at org.obolibrary.robot.IOHelper.saveOntology(IOHelper.java:716)
    at org.obolibrary.robot.CommandLineHelper.maybeSaveOutput(CommandLineHelper.java:669)
    at org.obolibrary.robot.ConvertCommand.execute(ConvertCommand.java:141)
    at org.obolibrary.robot.CommandManager.executeCommand(CommandManager.java:248)
    at org.obolibrary.robot.CommandManager.execute(CommandManager.java:192)
    at org.obolibrary.robot.CommandManager.main(CommandManager.java:139)
    at org.obolibrary.robot.CommandLineInterface.main(CommandLineInterface.java:58)

Could this be due to a limit on the size of a single Java String? @kltm pointed to https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/StringUTF16.java#L50

Thanks!
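
Why doesn't -Xmx128G help? As far as I can tell from the JDK source linked above, the cap applies to the length of the array backing one String, not to the heap size. Here's a small, hypothetical Java check of the numbers in the error message (the class and method names are illustrative only, and the constant is assumed to match the JDK-internal StringUTF16.MAX_LENGTH):

```java
// Hypothetical helper class; the constant mirrors the limit quoted in the error.
class Utf16Limit {
    // Assumption: same value as the JDK-internal StringUTF16.MAX_LENGTH
    // (Integer.MAX_VALUE >> 1), i.e. half the largest possible byte[] length,
    // because UTF-16 storage uses two bytes per char.
    static final int MAX_UTF16_LENGTH = Integer.MAX_VALUE >> 1;

    // True if a UTF-16 string of this many chars could be allocated at all.
    static boolean fits(long chars) {
        return chars <= MAX_UTF16_LENGTH;
    }

    // Size of the byte[] that would back a UTF-16 string of this length.
    static long backingBytes(long chars) {
        return chars * 2L;
    }

    public static void main(String[] args) {
        long requested = 1_349_785_996L; // length reported in the stack trace
        System.out.println("limit     = " + MAX_UTF16_LENGTH);
        System.out.println("requested = " + requested);
        System.out.println("fits      = " + fits(requested));
        System.out.println("bytes     = " + backingBytes(requested));
    }
}
```

Integer.MAX_VALUE >> 1 is 1073741823, the exact figure in the message, and the requested 1349785996 chars would need about 2.7 GB in a single byte[], no matter how large the heap is. The AbstractStringBuilder.inflate frame suggests the builder was converting to two-byte UTF-16 storage when it hit the cap.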

matentzn commented 4 years ago

Uhhh, looks scary. Maybe also open a ticket here: https://github.com/geneontology/obographs

Since that is the framework that actually causes the problem, maybe @julesjacobsen has an idea.

julesjacobsen commented 4 years ago

The method OgJsonGenerator.prettyJsonString tries to write the whole document as a single String object. You might want to try streaming the output instead, e.g.

ObjectWriter writer = new ObjectMapper().writerWithDefaultPrettyPrinter();
// graphDocument is the GraphDocument built from the input ontology
writer.writeValue(Files.newOutputStream(Paths.get("lego.json")), graphDocument);
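
To show the difference without pulling in Jackson, here's a hypothetical stdlib-only sketch (not obographs or Jackson code) that serializes a list of ids as a JSON array in two ways: building one String, as prettyJsonString does in spirit, versus streaming fragments to a Writer, which is the idea behind the writeValue call above. The class name, method names, and the simplified escaping are all assumptions for illustration.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.List;

// Hypothetical illustration: serialize a list of ids as a JSON array.
class StreamingJsonSketch {

    // Minimal escaping of backslash and double quote only; assumes ids contain
    // no control characters. Real code should let Jackson handle escaping.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Single-String approach: the entire document is accumulated in one
    // StringBuilder, so its length is capped by the String size limit.
    static String renderAsString(List<String> ids) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < ids.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append('"').append(escape(ids.get(i))).append('"');
        }
        return sb.append(']').toString();
    }

    // Streaming approach: each fragment goes straight to the Writer, so no
    // single String ever holds the whole document.
    static void renderTo(Writer out, List<String> ids) throws IOException {
        out.write('[');
        for (int i = 0; i < ids.size(); i++) {
            if (i > 0) out.write(',');
            out.write('"');
            out.write(escape(ids.get(i)));
            out.write('"');
        }
        out.write(']');
        out.flush();
    }
}
```

Passing a writer from Files.newBufferedWriter(...) as out would write the array to disk piece by piece; only the writer's buffer is held in memory at any one time.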

dustine32 commented 4 years ago

@julesjacobsen Thanks for the quick response! Is this change something to be made in robot or should I make a ticket in obographs?

julesjacobsen commented 4 years ago

Ideally both: obographs should have a method that accepts a buffered writer, and robot should use it. However, robot could also implement this internally, since obographs simply provides a very thin convenience wrapper around Jackson.
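
One possible shape for such a writer-based method, sketched as a hypothetical Java interface (DocumentRenderer is an invented name, not the actual obographs API): the core method streams to any Writer, and a default overload opens a buffered UTF-8 file writer so a caller like ROBOT would only need to supply an output path.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical API shape (not the real obographs interface).
interface DocumentRenderer<T> {

    // Implementations write the serialized document fragment by fragment.
    void render(T document, Writer out) throws IOException;

    // Convenience overload: open a buffered UTF-8 writer on the target file,
    // stream into it, and close it (flushing the buffer) when done.
    default void render(T document, Path path) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(path, StandardCharsets.UTF_8)) {
            render(document, out);
        }
    }
}
```

Keeping the Writer overload as the primitive means the same code path serves files, sockets, or in-memory StringWriters in tests, and nothing along the way materializes the full JSON as one String.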

jamesaoverton commented 4 years ago

My preference is for obographs to implement a buffered solution and for ROBOT to use it. But if somebody wants to write a PR for ROBOT that solves the problem directly, that should be fine.

dustine32 commented 4 years ago

Since I only needed the go-lego.json file for development, I quickly hacked in @julesjacobsen's suggestion, generated the file, and opened a PR with my change. Feel free to merge or discard it as you wish; I have what I need for now. Thanks!

dustine32 commented 4 years ago

Hey @jamesaoverton, do you know when this change is scheduled to make it into a ROBOT release?

jamesaoverton commented 4 years ago

That fix has been merged into master, and you can download a JAR for the latest master here:

https://build.obolibrary.io/job/ontodev/job/robot/job/master/lastSuccessfulBuild/artifact/bin/robot.jar

It will be included in the next release, v1.7.0. I hope to make that release soon, but we don't have a fixed schedule. It mostly depends on feedback about the new export command in this email thread.

dustine32 commented 4 years ago

Great, thanks @jamesaoverton!