milankl / BitInformation.jl

Information between bits and bytes.
MIT License

How to implement boundary conditions with `masked_value` #40

Open observingClouds opened 2 years ago

observingClouds commented 2 years ago

Hi, great package!

I just realised that the `set_zero_insignificant` argument is always true when `masked_value` is supplied. I would expect

using BitInformation
bitinformation(Array{Float32}([1,2,3,4]), set_zero_insignificant=false)
32-element Vector{Float64}:
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.25162916738782287
 0.25162916738782287
 0.0
 0.0
....

be the same as

bitinformation(Array{Float32}([1,2,3,4]), set_zero_insignificant=false, masked_value=convert(Float32,NaN))
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
....

This is, however, not the case for me. I tested this behaviour with versions 0.5.0 and 0.5.1. Probably the `set_zero_insignificant` keyword is just missing in one of the function definitions, but my eyes have not yet adapted to the Julia language enough to pinpoint this in the code 🤣.

milankl commented 2 years ago

No, the argument is passed on correctly. This happens through `; kwargs...`, which collects all keyword arguments so that they can be passed on to functions called inside other functions.
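As a generic illustration of the `kwargs...` forwarding mentioned above, here is a minimal sketch. The functions `inner` and `outer` are hypothetical and are not the package's internals; only the keyword-forwarding mechanism is the point.

```julia
# Hypothetical functions, not part of BitInformation.jl.
# `inner` is the only function that actually knows the keyword.
inner(x; set_zero_insignificant::Bool=true) =
    set_zero_insignificant ? zero(x) : x

# `outer` never names the keyword; `kwargs...` collects whatever keywords
# it receives and splats them into the call to `inner`.
outer(x; kwargs...) = inner(x; kwargs...)

default_result = outer(1.5)                                 # inner's default applies
forwarded      = outer(1.5, set_zero_insignificant=false)   # keyword reaches inner
```

Because `outer` forwards every keyword unchanged, a keyword set by the caller cannot silently fall back to its default along the way.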

julia> using BitInformation
julia> a = rand(Float32,100)
julia> bitinformation(a,set_zero_insignificant=true)
32-element Vector{Float64}:
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 ⋮
 0.0

julia> bitinformation(a,set_zero_insignificant=false)
32-element Vector{Float64}:
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0040017578752447125
 0.00762811148439329
 0.004276987851521523
 0.00023748155804284116
 ⋮
 0.007157098012580706
 0.00791399692709715
 0.00480941092599706
 0.002926451394794344
 0.008214283901808138
 0.005549176061803784
 0.00359966561437504
 0.017224936442887365
 0.0059782906996011615

julia> bitinformation(a,set_zero_insignificant=false,masked_value=NaN32)
32-element Vector{Float64}:
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0038971829878380847
 0.005576694247738862
 0.005576694247738862
 0.0005831180658698092
 ⋮
 0.0059096798906598975
 0.006578377952308912
 0.003943640725332606
 0.004330712596832224
 0.006655303074143492
 0.004283055423926335
 0.002705618353765177
 0.018927358053287838
 0.006400338428602179

But as you can see, the results are not exactly the same. The reason is that, to avoid counting information across array bounds, the last element in every dimension is currently always masked. https://github.com/milankl/BitInformation.jl/blob/5f3ebbd135e427c68048988fdc24f6f2d5cb71e9/src/bit_count.jl#L125 With large datasets this is fine, but you are right, maybe this should be done differently. As a quick check, if you add one more element to your example you actually get the same result:

julia> bitinformation(Float32[1,2,3,4,5], set_zero_insignificant=false, masked_value=NaN32)
32-element Vector{Float64}:
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.25162916738782287
 0.25162916738782287
 ⋮

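Why the five-element masked call matches the four-element unmasked one can be sketched by listing the adjacent pairs that enter the statistics. The helper `counted_pairs` and its `mask_last` keyword are hypothetical and only mimic the behaviour described above; this is not the package's code.

```julia
# Hypothetical helper, not part of BitInformation.jl: collect the adjacent
# pairs (A[i], A[i+1]) that would be counted, skipping masked elements.
function counted_pairs(A::AbstractVector; mask_last::Bool=false)
    mask = falses(length(A))
    if mask_last && !isempty(A)
        mask[end] = true              # mimic "last element is always masked"
    end
    pairs = Tuple{eltype(A),eltype(A)}[]
    for i in 1:length(A)-1
        # a pair is dropped as soon as one of its two elements is masked
        if !(mask[i] || mask[i+1])
            push!(pairs, (A[i], A[i+1]))
        end
    end
    return pairs
end

unmasked4 = counted_pairs(Float32[1,2,3,4])                     # (1,2), (2,3), (3,4)
masked4   = counted_pairs(Float32[1,2,3,4], mask_last=true)     # (1,2), (2,3)
masked5   = counted_pairs(Float32[1,2,3,4,5], mask_last=true)   # (1,2), (2,3), (3,4)
```

Under these assumptions the masked four-element array loses one of its three pairs, while the masked five-element array contributes exactly the pairs of the unmasked four-element one.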
milankl commented 2 years ago

Just changed the title of this issue because this is really what it is about: counting information across array bounds is like periodic boundary conditions; if you mask the last element, then it's like closed boundaries...
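The boundary-condition analogy can be made concrete with a small matrix walked in linear (column-major) order. This is only a sketch: `linear_pairs` is a hypothetical function, and the assumption that neighbours are taken in linear order is made for illustration, not taken from the package.

```julia
# Hypothetical illustration, not the package's code: walk a matrix in linear
# (column-major) order and collect neighbouring pairs unless one is masked.
function linear_pairs(A::AbstractMatrix, mask::AbstractMatrix{Bool})
    pairs = Tuple{eltype(A),eltype(A)}[]
    for i in 1:length(A)-1
        if !(mask[i] || mask[i+1])
            push!(pairs, (A[i], A[i+1]))
        end
    end
    return pairs
end

A = collect(reshape(1:6, 3, 2))       # columns are [1,2,3] and [4,5,6]

# "Periodic": nothing is masked, so (3, 4) pairs the end of column 1
# with the start of column 2, i.e. across the array bound.
periodic = linear_pairs(A, falses(size(A)))

# "Closed": mask the last element along the first dimension.
closed_mask = falses(size(A))
closed_mask[end, :] .= true
closed = linear_pairs(A, closed_mask)
```

In this sketch the closed variant removes the cross-column pair `(3, 4)`, but it also drops the genuine neighbours `(2, 3)` and `(5, 6)`, which is why the effect is noticeable for small arrays and negligible for large ones.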

observingClouds commented 2 years ago

Alright, thanks for the clarification!