Open piermorel opened 8 years ago
Confirmed. I had already discovered this issue some time ago but forgot about it. Reversed indices do not work either, e.g.
idx_rev <- rev(1:50)
[1] 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26
[26] 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
dset[idx_rev]
[1] 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74
[26] 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
The problem could be solved if a dataspace object carried additional information on how to map data items for duplicated or reversed indices. This information is needed whenever the index vector is not strictly sorted, as in
is.unsorted(idx_rev, strictly = TRUE)
[1] TRUE
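The mapping idea can be sketched in plain R. This is only an illustration under assumptions: `read_sorted` and `read_with_remap` are hypothetical helpers, not part of the h5 package, and a plain vector stands in for the dataset. The sketch reads each distinct element once in strictly increasing order, then restores the requested order (including duplicates) with `match()`.

```r
# Stand-in for a backend read that only supports strictly increasing indices.
# (Hypothetical helper; it is not part of the h5 package.)
read_sorted <- function(data, idx) {
  stopifnot(!is.unsorted(idx, strictly = TRUE))
  data[idx]
}

# Read every distinct element once, in sorted order, then map the result
# back to the order (and multiplicity) the caller asked for.
read_with_remap <- function(data, idx) {
  if (length(idx) == 0L) return(data[0L])
  uniq <- sort(unique(idx))
  read_sorted(data, uniq)[match(idx, uniq)]
}

data <- 1:100
idx_rev <- rev(1:50)
idx_dup <- c(3, 3, 1, 10, 1)

identical(read_with_remap(data, idx_rev), data[idx_rev])
identical(read_with_remap(data, idx_dup), data[idx_dup])
```

Both comparisons should agree with base R subsetting. The same remapping would have to be applied per dimension for multi-dimensional datasets.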
Thanks for the good catch!
In the examples above, it looks as if the indices end up incorrect while the overall dimensions are fine. That does not seem to hold in general, though. I have a dataset here with four dimensions:
> f['velocity']
DataSet 'velocity' (38399 x 36 x 61 x 3)
type: numeric
chunksize: 1 x 36 x 61 x 3
maxdim: UNLIMITED x 36 x 61 x 3
compression: H5Z_FILTER_DEFLATE
> f['velocity'] -> proxy
(Please note that the issue also occurs without compression.)
So the dimensions are 38399 x 36 x 61 x 3. Now let me reduce that to 1 x 36 x 61 x 3:
> dim(proxy[1,,,])
[1] 1 36 61 3
That works and it looks correct. But now let me try to reduce it to 1 x 36 x 61 x 1:
> dim(proxy[1,,,1])
[1] 1 36 61 3
I did not get an error but the result is clearly incorrect. Indeed:
> all.equal(proxy[1,,,1],proxy[1,,,])
[1] TRUE
> all.equal(proxy[1,,,c(1,3)],proxy[1,,,])
[1] TRUE
> all.equal(proxy[1,,,c()],proxy[1,,,])
[1] TRUE
So the final subset is simply ignored. Having R do the subsetting works as expected:
> dim(proxy[1,,,][,,,c(1,3)])
[1] 36 61 2
> dim(proxy[,,,][1,,,c(1,3)]) # very slow, naturally
[1] 36 61 2
Finally, for a few subsets I get an error that I did not expect:
> dim(proxy[1,1,,])
Error in subsetDataSet(x, i, j, ..., drop = drop) :
argument is missing, with no default
With plain R, this works as well:
> dim(proxy[,,,][1,1,,]) # again, very slow, naturally
[1] 61 3
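For comparison, here is a minimal sketch of what base R returns for the same subsets on an in-memory array. The array `a` is a stand-in I made up for illustration: it has the same trailing dimensions as 'velocity', but the first dimension is shortened to 5 so it stays small. `drop = FALSE` is used where the singleton dimensions should be kept, to match the dimensions discussed above.

```r
# In-memory stand-in with the same trailing dimensions as 'velocity';
# the first dimension is shortened to 5 to keep the example small.
a <- array(seq_len(5 * 36 * 61 * 3), dim = c(5, 36, 61, 3))

dim(a[1, , , , drop = FALSE])            # 1 36 61 3
dim(a[1, , , 1, drop = FALSE])           # 1 36 61 1
dim(a[1, , , c(1, 3), drop = FALSE])     # 1 36 61 2
dim(a[1, , , integer(0), drop = FALSE])  # 1 36 61 0
dim(a[1, 1, , ])                         # 61 3 (singleton dimensions dropped)
```

These are the shapes one would expect from the proxy object if its subsetting followed base R semantics.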
The documentation indicates that subsetting in your class works the same as base R subsetting, but in some cases that is not true. Some example code:
Subsetting works similarly in simple cases:
But with repeated indices the subsetting breaks down, while base R subsetting works fine:
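For reference, a minimal base R sketch of the behaviour meant here, using a plain vector as a stand-in for the dataset (the variable `x` is illustrative and does not come from the package or its documentation):

```r
# Plain numeric vector standing in for a one-dimensional dataset.
x <- seq(10, 100, by = 10)

x[2:4]         # simple case, contiguous range: 20 30 40
x[c(2, 2, 5)]  # repeated index, element 2 is returned twice: 20 20 50
x[c(5, 2)]     # reordered indices are honoured as given: 50 20
```

Base R returns one element per requested index, in the requested order, which is the behaviour the documentation leads one to expect from the dataset class as well.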
Having used low-level HDF5 before, I can imagine where that problem comes from, but this use case should be checked for and either corrected or flagged with a warning, and the documentation should be corrected accordingly.
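As a rough sketch of the kind of check meant here (an assumption on my side, not the package's actual code): a hypothetical helper `check_index` could warn whenever an index vector is not strictly increasing, which covers both duplicated and reordered indices.

```r
# Hypothetical helper, not part of the h5 package: returns TRUE if the index
# vector is strictly increasing, otherwise issues a warning and returns FALSE.
check_index <- function(idx) {
  stopifnot(!anyNA(idx))
  if (is.unsorted(idx, strictly = TRUE)) {
    warning("index vector is not strictly increasing; ",
            "duplicated or reordered indices may not be honoured")
    return(FALSE)
  }
  TRUE
}

check_index(1:5)                          # strictly increasing
suppressWarnings(check_index(c(1, 1, 2))) # duplicated index
suppressWarnings(check_index(rev(1:5)))   # reversed index
```

Such a check would at least make the failure visible until the subsetting itself handles these cases.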