meggart / DiskArrays.jl

Other
73 stars 13 forks source link

fix zip #105

Closed rafaqz closed 1 year ago

rafaqz commented 1 year ago

This PR fixes zip by adding a RechunkedDiskArray that we can use to force a specific access pattern on a AbstracDiskArray or a regular AbstractArray.

In zip we then force all the arrays to have the same pattern as the first disk array we find that is Chunked.

This will be a bit slower when the chunks don't match, but at least it is correct. We could later calculate some optimal chunk pattern for all arrays passed to zip given the available memory.

This fixes some outstanding @test_broken tests, because == uses zip underneath.

Closes #103

rafaqz commented 1 year ago

I updated this to collect in the right order using a DiskZip wrapper.

Currently it will only work with other AbstractArray iterators the same size as the disk array.

rafaqz commented 1 year ago

Thanks for mentioning performance, I didnt look into it much I was focussed on correctness after some of the julia correctness discussions that are happening.

Maybe we can manually zip all iterators at the chunk level to make it faster?

Then we could even buffer non array iterators to the chunk size so they also work. (No actually thats probably too memory intensive)

meggart commented 1 year ago

Let me know when you are finished with your changes, I can look regarding readblock! and writeblock!, I will add some performance checks then and test the getindex_counts etc.

rafaqz commented 1 year ago

This may be fine to merge now as chunking was actually working already, but there may be some other optimisations that could be done if you want to make any changes

meggart commented 1 year ago

Yes, I had not realized this was already working. Thanks a lot for adding the test. Is this merge conflict critical?