Required array length 2147483639 + 11 is too large

ebremer commented 1 year ago

Using the latest master, I tried to convert an N-Triples file containing 87,969,819 triples to an HDT file. It threw the following:

'''
Exception in thread "main" java.lang.OutOfMemoryError: Required array length 2147483639 + 11 is too large
    at java.base/jdk.internal.util.ArraysSupport.hugeLength(ArraysSupport.java:649)
    at java.base/jdk.internal.util.ArraysSupport.newLength(ArraysSupport.java:642)
    at java.base/java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:100)
    at java.base/java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:130)
    at org.rdfhdt.hdt.util.string.ByteStringUtil.append(ByteStringUtil.java:340)
    at org.rdfhdt.hdt.dictionary.impl.section.PFCDictionarySection.load(PFCDictionarySection.java:123)
    at org.rdfhdt.hdt.dictionary.impl.section.PFCDictionarySection.load(PFCDictionarySection.java:87)
    at org.rdfhdt.hdt.dictionary.impl.FourSectionDictionary.load(FourSectionDictionary.java:83)
    at org.rdfhdt.hdt.hdt.impl.HDTImpl.loadFromModifiableHDT(HDTImpl.java:492)
    at org.rdfhdt.hdt.hdt.HDTManagerImpl.doGenerateHDT(HDTManagerImpl.java:108)
    at org.rdfhdt.hdt.hdt.HDTManager.generateHDT(HDTManager.java:276)
    at com.ebremer.hdt.NewClass1.main(NewClass1.java:53)
'''

D063520 commented 1 year ago

Can you share the RDF file? Are you sure it is valid N-Triples? 87 million triples is a very small dataset...

ebremer commented 1 year ago

I'll have to upload it to Google Drive. The nt.gz file is 571 MB. The file loaded with RDFDataMgr.loadModel without any issues.

ate47 commented 1 year ago

It's probably because the dictionary is too big for the default implementation. Try using a disk-based loading method, for example with HDTOptionsKeys.DICTIONARY_TYPE_KEY = HDTOptionsKeys.DICTIONARY_TYPE_VALUE_FOUR_SECTION_BIG:

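import org.rdfhdt.hdt.hdt.HDT;
import org.rdfhdt.hdt.hdt.HDTManager;
import org.rdfhdt.hdt.options.HDTOptions;
import org.rdfhdt.hdt.options.HDTOptionsKeys;

// Use the "big" four-section dictionary instead of the default one
HDTOptions spec = HDTOptions.of(
    HDTOptionsKeys.DICTIONARY_TYPE_KEY, HDTOptionsKeys.DICTIONARY_TYPE_VALUE_FOUR_SECTION_BIG
);
try (HDT hdt = HDTManager.generateHDT(...., spec,...)) {
    //...
}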
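
For context on the numbers in the error message: the stack trace shows the dictionary section being appended to a ByteArrayOutputStream, which is backed by a single byte[] and therefore cannot grow past what an int can index. The first number, 2147483639, equals Integer.MAX_VALUE - 8, so the buffer appears to have already reached its cap when 11 more bytes were requested. Below is a minimal sketch of that arithmetic using only the standard library. The class and method names are hypothetical, and this is a simplification for illustration, not the JDK's actual code.

```java
// Simplified illustration of the length check behind the error message.
// ArrayLengthSketch and requiredLength are hypothetical names, not JDK API.
final class ArrayLengthSketch {
    // Cap inferred from the error message: Integer.MAX_VALUE - 8 = 2147483639.
    static final int SOFT_MAX_ARRAY_LENGTH = Integer.MAX_VALUE - 8;

    /** Returns oldLength + minGrowth, or throws if the sum does not fit in an int. */
    static int requiredLength(int oldLength, int minGrowth) {
        // Widen to long so the addition itself cannot overflow.
        long required = (long) oldLength + minGrowth;
        if (required > Integer.MAX_VALUE) {
            throw new OutOfMemoryError(
                "Required array length " + oldLength + " + " + minGrowth + " is too large");
        }
        return (int) required;
    }

    public static void main(String[] args) {
        System.out.println(SOFT_MAX_ARRAY_LENGTH);
        try {
            // Same values as in the reported exception.
            requiredLength(SOFT_MAX_ARRAY_LENGTH, 11);
        } catch (OutOfMemoryError e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Since 2147483639 + 11 = 2147483650 is larger than Integer.MAX_VALUE (2147483647), no amount of extra heap would help here. The section has to be stored in something other than one in-memory byte array, which is consistent with switching the dictionary type fixing the problem.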

ebremer commented 1 year ago

@ate47 wins the prize! I added the new options and it worked fine. Thank you both.

rdfhdt / hdt-java

Required array length 2147483639 + 11 is too large #192