Open · balhoff opened this issue 4 months ago
Most of the memory implementations are old and not really reliable for large datasets (at least 1B triples). I suggest using only the disk implementation for this kind of workload.
To enable disk indexing, you can use these configs:
# use disk implementation
bitmaptriples.indexmethod=disk
# directory to compute the index
bitmaptriples.sequence.disk.location=disk-work-dir
# use disk locations and indexes
bitmaptriples.sequence.disk=true
bitmaptriples.sequence.disk.subindex=true
It can be done with the -config or -options params.
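For example, a minimal sketch of the -config route, assuming the four options above are saved to a properties file and that qepSearch.sh takes the HDT file as its last argument (the filename disk-index.properties and the argument order are assumptions, not confirmed in this thread):
# disk-index.properties holds the four bitmaptriples.* options shown above
./qepSearch.sh -config disk-index.properties mytriples.hdt
The -options param should accept the same key/value pairs inline instead of from a file; the exact separator syntax may vary between versions, so only the file-based form is sketched here.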
@ate47 thank you! Your suggestion worked perfectly.
Part of the endpoint? (leave empty if you don't know)
Description of the issue
I'm trying to create an index for a huge HDT file (29,773,033,292 triples). I'm doing this by starting qepSearch.sh.
Expected behavior
I expect a file mytriples.hdt.index.v1-1 to be generated, and then to be able to search for triples.
Obtained behavior
After about 20 minutes, I get this output:
How to reproduce
Using JDK 17.0.2, export JAVA_OPTIONS="-Xmx500G -XX:+UseParallelGC". Then:
The file mytriples.hdt is 344 GB. I can provide it somehow if it is helpful.
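Roughly, the setup looks like the sketch below; the exact qepSearch.sh invocation was omitted above, so the assumption here is that the tool is simply pointed at the HDT file and builds the .index.v1-1 file on first load:
# assumed reproduction sketch: memory settings from above, then start the search tool on the HDT file
export JAVA_OPTIONS="-Xmx500G -XX:+UseParallelGC"
./qepSearch.sh mytriples.hdt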
Endpoint version
1.16.1
Do I want to contribute to fix it?
Maybe
Something else?
No response