rdfhdt / hdt-java

HDT Java library and tools.

Async CatTree and dir parser #186

Closed: ate47 closed this 1 year ago

ate47 commented 1 year ago

This PR adds two async versions, one for the DirParser and one for the CatTree.

The async CatTree can be enabled with the loader.cattree.async=[boolean] option. Since a cat most likely finishes before the next one is ready to start, you can't select more than 2 threads.
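To illustrate why two threads are enough, here is a minimal Java sketch of a two-thread pipeline. It assumes a simplified model where a "cat" is a merge of two sorted lists and each chunk is built on the caller thread. AsyncCatSketch, produceChunk and cat are hypothetical names, not hdt-java APIs, and the sketch merges chunks in a linear chain rather than a tree.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch, not the hdt-java CatTree code: the caller thread produces
// chunks while one background thread cats them, so at most 2 threads are busy.
class AsyncCatSketch {
    // Stand-in for an HDT cat: union of two sorted, duplicate-free lists.
    static List<Integer> cat(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>(a.size() + b.size());
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int x = a.get(i), y = b.get(j);
            if (x < y) { out.add(x); i++; }
            else if (y < x) { out.add(y); j++; }
            else { out.add(x); i++; j++; } // shared value: keep one copy
        }
        while (i < a.size()) out.add(a.get(i++));
        while (j < b.size()) out.add(b.get(j++));
        return out;
    }

    // Stand-in for building one chunk; consecutive chunks overlap by one value.
    static List<Integer> produceChunk(int id) {
        List<Integer> chunk = new ArrayList<>();
        for (int k = 0; k < 3; k++) {
            chunk.add(id * 2 + k);
        }
        return chunk;
    }

    static List<Integer> loadAsync(int chunks) {
        ExecutorService catWorker = Executors.newSingleThreadExecutor(); // thread 2
        try {
            CompletableFuture<List<Integer>> acc =
                    CompletableFuture.completedFuture(new ArrayList<Integer>());
            for (int id = 0; id < chunks; id++) {
                // Runs on the caller thread (thread 1) while the previous cat may still be running.
                List<Integer> next = produceChunk(id);
                // Each cat waits for the previous result, so cats never run concurrently.
                acc = acc.thenApplyAsync(prev -> cat(prev, next), catWorker);
            }
            return acc.join();
        } finally {
            catWorker.shutdown();
        }
    }
}
```

Because every cat depends on the result of the previous one, the cats are serialized on the single worker anyway; extra cat threads would only pay off if a cat took longer than producing the next chunk, which the PR describes as unlikely.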

The async DirParser can be enabled with the parser.dir.async=[number] option: 1 = sync, 0 = processor count, n = n threads.
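A minimal sketch of how a parser.dir.async-style value could map to a thread pool, assuming each file can be parsed independently. DirParserSketch, resolveThreads, parse and parseAll are hypothetical names, and parse only counts non-blank lines as a crude stand-in for N-Triples parsing. Rejecting negative values is also an assumption; the PR does not say how they are handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch, not the hdt-java DirParser code.
class DirParserSketch {
    // 0 -> processor count, 1 -> synchronous, n -> n threads (negatives rejected: assumption).
    static int resolveThreads(int option) {
        if (option < 0) {
            throw new IllegalArgumentException("thread count must be >= 0: " + option);
        }
        return option == 0 ? Runtime.getRuntime().availableProcessors() : option;
    }

    // Crude stand-in for parsing one NT file: count its non-blank lines.
    static int parse(String content) {
        int count = 0;
        for (String line : content.split("\n")) {
            if (!line.trim().isEmpty()) count++;
        }
        return count;
    }

    static int parseAll(List<String> files, int option) {
        int threads = resolveThreads(option);
        if (threads == 1) {
            // Sync path: parse the files one after another on the caller thread.
            int total = 0;
            for (String f : files) total += parse(f);
            return total;
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Async path: submit one task per file, then wait for all of them.
            List<CompletableFuture<Integer>> parts = new ArrayList<>();
            for (String f : files) {
                parts.add(CompletableFuture.supplyAsync(() -> parse(f), pool));
            }
            int total = 0;
            for (CompletableFuture<Integer> p : parts) total += p.join();
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

A dir parser that emits triples from several files at once would also need a thread-safe destination for them; this sketch sidesteps that by only returning per-file counts.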

I've also added a commit that uses large arrays in Bitmap375Disk to support larger indexes.
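The PR does not show the Bitmap375Disk change itself. As a general illustration: Java arrays are indexed by int, so a single array holds at most about 2^31 elements. One common workaround is to split a long index into a segment number and an offset. SegmentedLongArray below is a hypothetical sketch of that idea with bitmap helpers on top, not the large-array type the commit uses.

```java
// Hypothetical sketch, not the Bitmap375Disk code: a long-indexed array split into
// int-indexed segments, so indexes can exceed Integer.MAX_VALUE.
class SegmentedLongArray {
    private static final int SEGMENT_BITS = 20;            // 2^20 longs (8 MiB) per segment
    private static final int SEGMENT_SIZE = 1 << SEGMENT_BITS;
    private static final long SEGMENT_MASK = SEGMENT_SIZE - 1;

    private final long[][] segments;
    private final long length;

    SegmentedLongArray(long length) {
        if (length < 0) {
            throw new IllegalArgumentException("negative length: " + length);
        }
        this.length = length;
        int count = (int) ((length + SEGMENT_SIZE - 1) >>> SEGMENT_BITS);
        segments = new long[count][];
        for (int s = 0; s < count; s++) {
            // The last segment only gets the remaining elements.
            long remaining = length - ((long) s << SEGMENT_BITS);
            segments[s] = new long[(int) Math.min(SEGMENT_SIZE, remaining)];
        }
    }

    long length() {
        return length;
    }

    long get(long index) {
        checkIndex(index);
        return segments[(int) (index >>> SEGMENT_BITS)][(int) (index & SEGMENT_MASK)];
    }

    void set(long index, long value) {
        checkIndex(index);
        segments[(int) (index >>> SEGMENT_BITS)][(int) (index & SEGMENT_MASK)] = value;
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length);
        }
    }

    // Bitmap helpers: bit i is stored in word i >>> 6, at bit position i & 63.
    void setBit(long bit) {
        long word = bit >>> 6;
        set(word, get(word) | (1L << (bit & 63)));
    }

    boolean getBit(long bit) {
        return (get(bit >>> 6) & (1L << (bit & 63))) != 0;
    }
}
```

With 2^20 words per segment, a bitmap of 2^35 bits needs 2^29 words, that is 512 segments, while each individual segment stays well within int indexing.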

EDIT: For example, when loading 200 different NT files of the same size, the async dir parser loads them about 4 times faster than the sync one:

async parsing: 295 ms 526 us
sync parsing: 1 sec 178 ms 398 us
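Converting both timings to milliseconds gives the speedup behind the "4 times" figure:

```latex
\frac{1178.398\ \text{ms}}{295.526\ \text{ms}} \approx 3.99
```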