nalimilan / FreqTables.jl

Frequency tables in Julia
Other
90 stars 19 forks source link

bug in freqtable on Vector{Any} #22

Open bkamins opened 6 years ago

bkamins commented 6 years ago

The following code fails:

julia> freqtable(Any[1,3])
ERROR: BoundsError: attempt to access 2-element Array{Int64,1} at index [3]
nalimilan commented 6 years ago

Yes, this is due to the ambiguity between integer names and integer indices. NamedArrays treats integers as indices. We would need a wrapper like Name to specify that we're passing a name.

@davidavdav Have you considered this issue before?

bkamins commented 6 years ago

Thanks! I agree with the reason - but I did not have time to dig into NamedArrays to propose the right solution.

davidavdav commented 6 years ago

I would expect that if you give integer names [1, 3], then the second element should be indexed by 3, and 2 should give a missing key error. But if you give it [1, 3] of type Any[], you're making it very hard for NamedArrays. You can still probably index using 3 by casting that to type Any somehow---i don't know if this is possible, since 3 simply "has" type Int, I think.

nalimilan commented 6 years ago

Ah, indeed it only happens when the element type is not Int. Anyway the current behavior is somewhat problematic: indexing a NamedArray with an integer means passing an index, unless the type of the names is Int. That doesn't really allow writing generic code. It would be more consistent to either always treat integer values as indices, or to treat all values as names. Then a Key or Index (depending on the chosen default) would allow specifying how to interpret the passed value.

bkamins commented 6 years ago

I would recommend to do something about it because it is not related to Any type only, e.g.

This is a bad bug (it passes silently and gives error):

julia> freqtable([1, missing, 3])
3-element Named Array{Int64,1}
Dim1    │
────────┼──
1       │ 1
3       │ 0
missing │ 1

but I can also force error:

julia> freqtable([missing, 3])
ERROR: BoundsError: attempt to access 2-element Array{Int64,1} at index [3]
bkamins commented 6 years ago

@nalimilan Maybe a temporary solution would be to force such class of cases to CategoricalArray before processing?

davidavdav commented 6 years ago

FYI I have a solution, I think, for this bug

indices(dict::Associative{Any,V}, i::Real) where V <: Integer = dict[i]
indices(dict::Associative{Any,V}, i)  where V <: Integer = dict[i]

but this still doesn't pass the test related to multi-dimensional sliced cartesian indexing. That requires a bit of digging.

bkamins commented 6 years ago

Thank you for looking into it!

davidavdav commented 6 years ago

The errors I am getting are related to cartesian indexing, somehow the code paths for

n[:, :, 1, 1]

and

n[:, :, CartesianIndex(1,1)]

Are completely different, the latter giving me a NamedArray with key type Any, which seems to be the cause of the trouble I am getting.