Closed ebremer closed 1 year ago
Can you share the RDF file? Are you sure it is compliant N-Triples? 87 million triples is very small...
I'll have to upload it to Google Drive; the nt.gz file is 571 MB. The file loads with RDFDataMgr.loadModel without any issues.
It's probably because the dictionary is too big for the default implementation. Try a disk-based loading method, for example by setting HDTOptionsKeys.DICTIONARY_TYPE_KEY to HDTOptionsKeys.DICTIONARY_TYPE_VALUE_FOUR_SECTION_BIG:
```java
HDTOptions spec = HDTOptions.of(
    HDTOptionsKeys.DICTIONARY_TYPE_KEY, HDTOptionsKeys.DICTIONARY_TYPE_VALUE_FOUR_SECTION_BIG
);

try (HDT hdt = HDTManager.generateHDT(...., spec,...)) {
    //...
}
```
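For context, a fuller sketch of the conversion might look like the following. The input path, output path, and base URI are placeholders, and the five-argument `generateHDT(file, baseURI, notation, options, listener)` overload is assumed from rdfhdt-java; adapt to your setup.

```java
import org.rdfhdt.hdt.enums.RDFNotation;
import org.rdfhdt.hdt.hdt.HDT;
import org.rdfhdt.hdt.hdt.HDTManager;
import org.rdfhdt.hdt.options.HDTOptions;
import org.rdfhdt.hdt.options.HDTOptionsKeys;

public class ConvertBigNT {
    public static void main(String[] args) throws Exception {
        // The "big" four-section dictionary avoids building each dictionary
        // section inside a single in-memory byte array, which is capped at
        // roughly 2 GB (Integer.MAX_VALUE) in Java.
        HDTOptions spec = HDTOptions.of(
                HDTOptionsKeys.DICTIONARY_TYPE_KEY,
                HDTOptionsKeys.DICTIONARY_TYPE_VALUE_FOUR_SECTION_BIG
        );
        // Placeholder paths/URI; a gzipped N-Triples input is read directly.
        try (HDT hdt = HDTManager.generateHDT(
                "data.nt.gz", "http://example.org/", RDFNotation.NTRIPLES,
                spec, null /* no progress listener */)) {
            hdt.saveToHDT("data.hdt", null);
        }
    }
}
```

The key point is only the extra `spec` argument: everything else matches a plain conversion, so an existing pipeline just needs the dictionary-type option added.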
@ate47 wins the prize! I added the new options and it worked fine. Thank you both.
Using the latest master, I tried to convert an 87,969,819-triple N-Triples file to an HDT file. It threw the following:

```
Exception in thread "main" java.lang.OutOfMemoryError: Required array length 2147483639 + 11 is too large
	at java.base/jdk.internal.util.ArraysSupport.hugeLength(ArraysSupport.java:649)
	at java.base/jdk.internal.util.ArraysSupport.newLength(ArraysSupport.java:642)
	at java.base/java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:100)
	at java.base/java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:130)
	at org.rdfhdt.hdt.util.string.ByteStringUtil.append(ByteStringUtil.java:340)
	at org.rdfhdt.hdt.dictionary.impl.section.PFCDictionarySection.load(PFCDictionarySection.java:123)
	at org.rdfhdt.hdt.dictionary.impl.section.PFCDictionarySection.load(PFCDictionarySection.java:87)
	at org.rdfhdt.hdt.dictionary.impl.FourSectionDictionary.load(FourSectionDictionary.java:83)
	at org.rdfhdt.hdt.hdt.impl.HDTImpl.loadFromModifiableHDT(HDTImpl.java:492)
	at org.rdfhdt.hdt.hdt.HDTManagerImpl.doGenerateHDT(HDTManagerImpl.java:108)
	at org.rdfhdt.hdt.hdt.HDTManager.generateHDT(HDTManager.java:276)
	at com.ebremer.hdt.NewClass1.main(NewClass1.java:53)
```