Closed — Fukoros closed this issue 1 year ago
I'm not sure, but it might be because your main disk is too small. For now you can't specify indexing options with the CLI; if you really need them, add these options to your HDT creation script:
bitmaptriples.sequence.disk.location=indextmp
bitmaptriples.sequence.disk=true
You can pass them to your last HDTCat call with the argument -options "bitmaptriples.sequence.disk.location=indextmp;bitmaptriples.sequence.disk=true"
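For example, the final concatenation could look like this (a sketch only — the file names are placeholders, and the exact positional-argument order of hdtCat may differ in your version, so check hdtCat --help):

```shell
# merge two HDT parts, keeping the temporary index sequences in ./indextmp
# instead of the default temp dir on the (small) main disk
hdtCat part1.hdt part2.hdt merged.hdt \
  -options "bitmaptriples.sequence.disk.location=indextmp;bitmaptriples.sequence.disk=true"
```

Putting indextmp on the 1 TB SSD (e.g. an absolute path under the data directory) avoids filling the 16 GB main disk during indexing.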
Or via code: import the CORE+API and run this:
String location = "yourHdtFile.hdt";
HDTOptions spec = new HDTOptionsBase();
// indextmp is the directory where the on-disk sequences will be written
spec.set("bitmaptriples.sequence.disk.location", "indextmp");
spec.set("bitmaptriples.sequence.disk", "true");
// map the HDT with these options, build the index, then release it
HDTManager.mapIndexedHDT(location, spec, null).close();
Edit for the devs: it is because of this line:
The object array is forced to be on disk, but the default temporary file is created in the /tmp dir, i.e. on the main disk.
Thanks for your help, it solved the issue.
Hi, first of all, many thanks for this great tool! We have a question about a Java error when using the HDT library to convert a large dataset to HDT, and we were wondering if you could help us with this issue. While converting YAGO to HDT format using multiple split files to reduce RAM usage (the pipeline is fully described in the file Create Data.txt), in the final concatenation of the HDT files, where we also ask to create the index, the HDT merge succeeded but the index creation yielded this error:
The previous pipeline used the 3.0.5 release, so we tried the latest release to generate the index of this final file, but it yielded the same type of issue:
We are using a server running Ubuntu 22.04.1 LTS with 128 GB of RAM, a 16 GB main disk, and a 1 TB SSD where the data is fully stored.