Closed cavemanloverboy closed 1 year ago
In this case, instead of using the kiddo::KdTree export, try using float::kdtree::KdTree<A, usize, K, 32, u64>?
This will switch the internal indexing type inside the tree to a 64-bit index rather than 32. The vanilla kiddo::KdTree export is equivalent to float::kdtree::KdTree<A, usize, K, 32, u32> which only uses a 32 bit index internally.
I had tried this yesterday but I got the following error:
the function or associated item `new` exists for struct `KdTree<f32, usize, 3, 32, u64>`, but its trait bounds were not satisfied
the following trait bounds were not satisfied:
`<u64 as kiddo::types::Index>::T = u64`
`u64: kiddo::types::Index`
i.e., it appears that Index
isn't implemented for u64
, only for u32
and u16
Also, 512^3 is only 2^27
Aah, I see, I thought I'd implemented Index for u64 but it seems not. Even at 2^27, it's about an order of magnitude more items than I've tried to store in it before - I'll create some test cases that use that kind of quantity and fire up the debugger to see what's going on.
I tried again with a random dataset (3D f32 with u64 content). 72.5M items worked successfully but 73M causes a LayoutError consistently.
73M continued to cause a segfault even if I doubled the bucket size, and / or set IDX
to u64
after implementing Index for u64
, or increased the BUFFER_LEN
and SCRATCH_LEN
by 10x to 3Gb. This seems to indicate to me that the issue is with the number of points themselves rather than the number of stems or leaves, since the quantity of those
gets reduced below whatever threshold is breaking this by doubling the bucket size (to 1639091 stems / 1639092 leaves on one attempt).
The KdTree itself still seemed to work at 73M, as did the serialization itself (or at least, it didn't crash).
I tried looking at the size of the serialized tree on disk. The 72.5M-item-tree was 2,139,449,440 bytes and the 73M-item-tree was 2,147,479,552 bytes.
2Gb is 2^31 bytes, which is 2,147,483,648. So, going from 72.5M to 73M almost crossed 2^31 bytes. Suspicious!
I think this suggests Rkyv is the culprit. Perhaps the relative pointers that it uses are 32 bit signed, and at the point we reach a 2Gb file, it overfows?
No matter how much more items I add to the tree, over a certain point the serialized tree is always exactly 2,147,479,552 bytes. 100M items and 73M items both produce files with 2,147,479,552 bytes. Perhaps there is an internal buffer that is indexed by an i32
somewhere and the counter just wraps around when it goes past the end or something.
Interesting, maybe check in with the rkyv folk?
Aha, this looks relevant: https://github.com/rkyv/rkyv/issues/209#issuecomment-930229372
Looks like we need to enable the size_64
feature. (https://docs.rs/rkyv/latest/rkyv/#features)
Edit: I thought that was on already? https://github.com/sdd/kiddo/blob/master/Cargo.toml#L63
yea I had it on too in my example... maybe a bug?
I think I got it... we are using file.write
not file.write_all
, lol. The buffer returned by rkyv is bigger than ≈2^32.
pub fn write(&self, buf: &[u8]) -> io::Result<usize> {
let ret = cvt(unsafe {
libc::write(
self.as_raw_fd(),
buf.as_ptr() as *const libc::c_void,
cmp::min(buf.len(), READ_LIMIT), // <--- right here
)
})?;
Ok(ret as usize)
}
With this... one run gives
it takes 46,720 millis to build tree
it takes 1,892 millis to serialize and save tree
it takes 519 millis to load file into mem
it takes 750 nanos to archived_root
46s to build tree from scratch vs ≈519 millis to load and zc deser... ≈100x speedup for my use case lol
Awesome, great spot :-) I'll update the example.
Have you tried using mmap too?
This is all you need to do once you've installed memmap
:
let mmap = unsafe { MmapOptions::new().map(&File::open("./examples/large-random-tree.rkyv")?)? };
let tree = unsafe { rkyv::archived_root::<Tree>(&mmap) };
Thats awesome. I haven't used mmaping that much, only once or twice for a npy file. How does mmaping affect query performance?
In any case, going back to the point of this issue: I think a load
and save
method with reasonable defaults would make the library very ergonomic for newcomers.
The memmap doesn't affect the query performance adversely at all. I just tried an experiment where I created a random KdTree with 250,000,000 * ([f32; 3]
+ u64
) and serialized it to an rkyv file, which was 8Gb in size. Then a separate binary that memmaps it in and performs a query. I was getting times of ~80 to 100us to map in the file and perform the query. I should test the performance after a reboot, to see what happens when the file is definitely not in the OS's file cache.
You're completely correct re the load and save methods. ArchiveKdTree::load()
and KdTree::save()
would also help point people towards using rkyv correctly (unlike my original usage 😅 ). I'll get those added.
I'l close this one too, since we got to the bottom of the problem. Thanks for the input with this one as well.
Issue
This may be a me problem or an rkyv problem, but I wanted to bring it up and discuss it here. The code below does not work when
DATA
is512 * 512 * 512
, but works with the other two sizes. May be unrelated but number of bytes in a 512^3f32
tree are suspiciously close to (and over) theu32
limit.I actually get non-deterministic errors. Sometimes its a segfault, sometimes it's a
LayoutError
. I think the most likely explanation is that I am not doing the byte alignment correctly.Suggestion
Provide a method for
rkyv
andserde
ser/de of trees that simply takes in animpl AsRef<Path>
with some reasonable serializer/deserializer parameters that takes care of all alignment issues.