aaronspring closed 2 years ago
That test is indeed confusing. As the array A is not sorted, every entry is independent of the next, hence all those tests just check that the information is zero.
julia> using BitInformation
julia> A = rand(Float32,30,40,50);
julia> bi1 = bitinformation(A,dim=1);
julia> bi2 = bitinformation(A,dim=2);
julia> bi3 = bitinformation(A,dim=3);
julia> hcat(bi1,bi2,bi3)
32×3 Matrix{Float64}:
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
⋮
However, if you sort the array along a given dimension, you artificially introduce some information, which is highest in that dimension:
julia> sort!(A,dims=1);
julia> bi1 = bitinformation(A,dim=1);
julia> bi2 = bitinformation(A,dim=2);
julia> bi3 = bitinformation(A,dim=3);
julia> hcat(bi1,bi2,bi3)
32×3 Matrix{Float64}:
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0067747 0.00508132 0.00538892
0.292094 0.182393 0.187531
0.550684 0.265361 0.271625
0.371526 0.114251 0.118072
0.237596 0.0441321 0.0441709
⋮
0.0 0.0 9.3149e-5
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.000280003
0.000749177 0.000946589 0.000850585
0.00515332 0.00430802 0.00508684
0.0233246 0.0177343 0.0185884
0.061388 0.0458432 0.0484664
bi1 will have the highest information in the exponent/mantissa bits, but sorting along one dimension also influences the others (with less information, though). The information in the last mantissa bits is due to the poor sampling of rand (see the randfloat function in JuliaRandom/RandomNumbers.jl as an alternative).
I have data along dimensions longitude, latitude and time and somehow intuitively would run the analysis along time.
You can run the analysis along any dimension you like. You can also add the information. The first dimension is usually just the default because that's also how the data is laid out in memory/on disk. Things can change along different dimensions, depending on the resolution. Check the supplement of our paper for some examples.
Is it also possible to run bitinformation on all dimensions, and does that make sense?
Yes, that's the same as running it in all dimensions separately and averaging the information. As it's an arithmetic mean, if the information is high in one dimension but low in another, you may end up cutting off too many bits for the high-information dimension. So what I often just went for is using longitude alone. The rule of thumb that I found in our data is that information is highest in longitude/time, then latitude, then vertical, then ensemble. But that obviously depends on the spatio-temporal resolution...
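To make the averaging caveat concrete, here is a minimal sketch that reuses five rows of the sorted-array output shown earlier in this thread (columns are dim=1, dim=2, dim=3). The names bi, bi_mean and ratio are only illustrative and not part of BitInformation.jl; the numbers are copied from the example above rather than recomputed.

```julia
using Statistics: mean

# Five rows of hcat(bi1,bi2,bi3) from the sorted example above,
# one row per bit position, one column per dimension (dim=1,2,3).
bi = [0.0067747 0.00508132 0.00538892
      0.292094  0.182393   0.187531
      0.550684  0.265361   0.271625
      0.371526  0.114251   0.118072
      0.237596  0.0441321  0.0441709]

# Arithmetic mean of the information across the three dimensions
bi_mean = vec(mean(bi, dims=2))

# Fraction of the dim=1 information that survives the averaging;
# values below 1 mean the average understates the sorted dimension.
ratio = bi_mean ./ bi[:, 1]

# Side by side: dim=1 alone, the mean over dimensions, and their ratio
display(hcat(bi[:, 1], bi_mean, ratio))
```

For every bit position shown, the mean is lower than the dim=1 value, so a bit-keeping decision based on the mean would be less conservative for the sorted dimension than one based on dim=1 alone.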
thank you
I don't quite understand the dim argument in bitinformation and its implications. Can I just ignore it and use the default dim=1? From https://github.com/milankl/BitInformation.jl/blob/05bd9ef447fa926a85b514162b51bc0c06afa083/test/information.jl#L37-L39 it seems like dim only matters for sorted dimensions, i.e. dim doesn't matter on raw data.
Your example plots in https://doi.org/10.24433/CO.8682392.v1 are using dim=1, meaning longitude. I have data along dimensions longitude, latitude and time and somehow intuitively would run the analysis along time.