rdfhdt / hdt-java

HDT Java library and tools.
Other
94 stars 69 forks source link

Question about Bitmap Triples Iterator ZFOQ implementation #191

Closed KonradHoeffner closed 1 year ago

KonradHoeffner commented 1 year ago

I was trying to understand why the Java version is so much faster than our Rust implementation in tests with a ?PO query and landed on the class BitmapTriplesZFOQ, which as far as I understand it uses less operations than in the paper "Exchange and Consumption of Huge RDF Data", however a Google search for "hdt zfoq" didn't return relevant results, so I tried to go through the code but I got stuck on the next() function, so I would appreciate if someone could help me understand it or if there is some written resource point me to that. The thing I don't understand is the line long posY = adjIndex.get(posIndex);. In our own code, we use the OP-S index to get posZ (the position in the object layer), and then use a rank query to get the position in the y-layer, which takes more time, so I would be really interested on how we can skip this step and get posY directly.

@Override
public TripleID next() {
    long posY = adjIndex.get(posIndex);

    z = patZ!=0 ? patZ : adjIndex.findListIndex(posIndex)+1;
    y = patY!=0 ? patY : adjY.get(posY);
    x = adjY.findListIndex(posY)+1;

    updateOutput();

    posIndex++;

    return returnTriple;
}
ate47 commented 1 year ago

Hello,

In this iterator, the first thing made is to compute the boundaries in the calculateRange() method, it will fill the minIndex and maxIndex value, these ones:

image

So to get the posY, you simply need to sequentially read the adjIndex

ate47 commented 1 year ago

I found this schema in the slide 25 for ISWC 2017 (pdf)

KonradHoeffner commented 1 year ago

Wow, thank you for the quick and detailed reply, I will read through the slides! Maybe I built my index wrong because it points to the O-layer and not the P-layer.

ate47 commented 1 year ago

I was working on this index for #187 and I read the FOQ iterators yesterday, so it was fresh in my mind, good luck with your library ;)

KonradHoeffner commented 1 year ago

Yes, our index indeed pointed to the wrong layer. Thanks to your advice we got a 27% time reduction with PO queries excluding extraction and a 10% time reduction including extraction!