Open scls19fr opened 6 years ago
Do these have a table structure, though? I'm just not sure...
If the dimension is no more than 2, I think so. If the dimension is greater, then an error should be raised. Your comment is very interesting, as it shows that there is room for a project (comparable to yours) that could deal with converting n-dimensional data structures such as NamedArray or IndexedTables currently
Interesting, interesting... I think the general pattern here might be some way to get rows out of a matrix, both a named and a normal matrix. For Query.jl integration it would be nice if these rows were named tuples and tuples respectively. But I'm not sure that is a good idea in general: I'm not sure whether very large tuples (if there are very many rows) work well...
It would have to be a special function, in any case, something like `rows`. I'll have to think a bit more about that scenario.
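A minimal sketch of what such a function could look like, using only Base Julia. The name `rows` and both signatures are hypothetical (borrowed from the suggestion above, not from any package): the first method yields one plain tuple per matrix row, the second yields one named tuple per row when column names are supplied.

```julia
# Hypothetical helper, not part of IterableTables or NamedArrays.
# Lazily yields each row of a matrix as a plain tuple.
rows(m::AbstractMatrix) = (Tuple(view(m, i, :)) for i in axes(m, 1))

# Same idea, but each row becomes a named tuple with the given column names.
function rows(m::AbstractMatrix, colnames::NTuple{N,Symbol}) where {N}
    size(m, 2) == N ||
        throw(DimensionMismatch("expected $(size(m, 2)) column names, got $N"))
    return (NamedTuple{colnames}(Tuple(view(m, i, :))) for i in axes(m, 1))
end

m = [1 2; 3 4]
plain = collect(rows(m))            # rows as tuples
named = collect(rows(m, (:a, :b)))  # rows as named tuples
```

Each row tuple has as many fields as the matrix has columns, so a wide matrix produces very large tuples, which is the concern raised above.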
See https://github.com/davidavdav/NamedArrays.jl/issues/55 about implementing an Iterator of NamedTuples for NamedArrays
Pinging @davidavdav
You rang?
I am a bit out of context. I've tried to read up on the references, but it is not clear to me what an iterator of named tuples would do with a NamedArray. Do you want to be able to iterate over a dimension of a NamedArray, thereby getting a tuple of the name and the array slice of one dimension lower, and do you want this to be of type Iterator for NamedTuples?
I think we should first tackle NamedArrays of dimension 2 and see how such a NamedArray can be converted to a DataFrame (or any table-like data structure that IterableTables deals with as a sink). IterableTables can easily use a source that consists of any iterator that produces elements of type NamedTuple.
The problem of dimensions greater than 2 can't be handled in this project (I think... or at least not for now).
In R, tables of arbitrary dimensionality can be converted to data frames. Each dimension gets transformed into a column, and an additional column holds the values of the array entries.
@nalimilan so all the columns except the last one would hold the indices of that respective dimension? For example, this matrix:
1 2 3
4 5 6
7 8 9
Would be transformed into this table:
Dim1 | Dim2 | Value |
---|---|---|
1 | 1 | 1 |
1 | 2 | 2 |
1 | 3 | 3 |
2 | 1 | 4 |
2 | 2 | 5 |
2 | 3 | 6 |
3 | 1 | 7 |
3 | 2 | 8 |
3 | 3 | 9 |
? I think that might be a really good general solution. It would still be nice if there was some easy way to handle a matrix differently, i.e. keep the table structure of the matrix, but that could be done by, say, a `row` function.
I guess this also somehow interacts with the idea of how associatives are handled. This would essentially treat an array as an associative, with the dimension indices as the key. I didn't follow that debate in detail, though...
Yes, that's it, though it's even clearer when there are dimension names rather than indices.
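A rough sketch of this R-style conversion for a plain array, using only Base Julia. The function name `long_rows` and the column names `Dim1`, `Dim2`, ..., `Value` are assumptions chosen to match the table above; a NamedArray version would presumably use the dimension names and labels instead of integer indices.

```julia
# Hypothetical helper: turn an N-dimensional array into "long" rows,
# one named tuple per entry, with one index column per dimension.
function long_rows(a::AbstractArray{T,N}) where {T,N}
    colnames = (ntuple(d -> Symbol("Dim", d), N)..., :Value)
    rows = [NamedTuple{colnames}((Tuple(I)..., a[I])) for I in CartesianIndices(a)]
    # CartesianIndices iterates in column-major order; sort so that the
    # first dimension varies slowest, as in the table above.
    return sort!(vec(rows), by = Tuple)
end

table = long_rows([1 2 3; 4 5 6; 7 8 9])
```

The result is a vector of named tuples, which is the kind of source described earlier in the thread as easy for IterableTables to consume.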
It looks like an R-style `NamedArray -> DataFrame` export would generally be beneficial.
If we can have `NamedArray` export to `IndexedTables` (and vice versa) it will be great.
Adding support of NamedArray to IterableTables will help to achieve this goal.
So you would want to treat `0`s in the NamedArray as special for `IndexedTables`, i.e., leave these entries out? That makes sense for `FreqTable` output, but for a `NamedArray`, `0` is type-specific and otherwise not very special.
I agree zeros should not be dropped.
How would this do for you?
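using NamedArrays
using IndexedTables
import IndexedTables.IndexedTable

# Convert a NamedArray of any dimension into an IndexedTable
# with one index column per dimension.
function IndexedTable(n::NamedArray)
    L = length(n) # elements in array
    cols = Dict{Symbol, Array}()
    factor = 1 # number of elements spanned by all lower dimensions
    for d in 1:ndims(n)
        nlevels = size(n, d)
        nrep = L ÷ (nlevels * factor)
        # repeat each name `factor` times, then tile the result `nrep` times
        data = repmat(vcat([fill(x, factor) for x in names(n, d)]...), nrep)
        cols[Symbol(dimnames(n, d))] = data
        factor *= nlevels
    end
    return IndexedTable(Columns(;cols...), array(n)[:])
end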
Two behaviours could be considered when converting a NamedArray to IndexedTables: 1) treat "0" (or any other value) as a special value that doesn't need to be reported to the IndexedTable (because this kind of data structure was specially designed to deal with sparse data); 2) treat all values in the same way (this should also be possible, and maybe the default).
The simple implementation above would not be very efficient, memory-wise, for very large and very sparse tables if we filter out the `0`s afterwards. Anyway, the `repmat(vcat([fill(...)]))` is probably not the most efficient.
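For reference, a sketch of how the index columns could be built without the intermediate `vcat`/`fill` arrays, assuming Julia 1.0 or later, where `repeat` with the `inner` and `outer` keywords covers the same repeat-then-tile pattern. The function name `dim_columns` and its arguments are hypothetical, and this only builds the columns; it does not address the sparsity concern.

```julia
# Hypothetical helper: build one label column per dimension, laid out to
# match the column-major order of `vec(array)`.
# `dims` is the array size; `labels[d]` holds the labels of dimension d.
function dim_columns(dims::Tuple, labels)
    L = prod(dims)          # total number of entries
    factor = 1              # entries spanned by all lower dimensions
    cols = Vector{Vector}()
    for d in 1:length(dims)
        nlevels = dims[d]
        nrep = L ÷ (nlevels * factor)
        # repeat each label `factor` times, then tile the result `nrep` times
        push!(cols, repeat(labels[d]; inner = factor, outer = nrep))
        factor *= nlevels
    end
    return cols
end

cols = dim_columns((2, 3), (["a", "b"], ["x", "y", "z"]))
```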
To filter out some values, we should probably accept an anonymous function such as `x -> x == 0` instead of a given value such as "0".
There are two issues here, right? How to convert something to an IndexedTable, and how to convert something to just any table. Only the latter interacts with iterable tables at this point.
Not sure if there are really two issues here, in fact... Your comment https://github.com/davidanthoff/IterableTables.jl/issues/65#issuecomment-345538256 shows that, if we extend your example to a 3-dimensional named array (or more), you can even transform it to a two-dimensional array and so to a table (and also to IndexedTables, as it's a sink that is currently supported by IterableTables).
On the other hand... there is an issue with IndexedTables output from IterableTables not being able to filter out values to keep the sparse feature of IndexedTables:
julia> using IterableTables
julia> using IndexedTables
julia> a=[0 0 1 0;2 0 3 0;0 0 5 0;2 0 0 1]
4×4 Array{Int64,2}:
0 0 1 0
2 0 3 0
0 0 5 0
2 0 0 1
julia> IndexedTable(a)
─────┬──
1 1 │ 0
1 2 │ 0
1 3 │ 1
1 4 │ 0
2 1 │ 2
2 2 │ 0
2 3 │ 3
2 4 │ 0
3 1 │ 0
3 2 │ 0
3 3 │ 5
3 4 │ 0
4 1 │ 2
4 2 │ 0
4 3 │ 0
4 4 │ 1
We could expect an API like
julia> IndexedTable(a, x -> x == 0)
─────┬──
1 3 │ 1
2 1 │ 2
2 3 │ 3
3 3 │ 5
4 1 │ 2
4 4 │ 1
if we want the anonymous function to define which values to filter out, or
julia> IndexedTable(a, x -> x != 0)
if we want the anonymous function to define which values to keep.
So in this case... this is clearly another issue.
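To make the proposed behaviour concrete, here is a sketch in Base Julia of the "filter out" variant. It is not the IndexedTables API: `sparse_entries` is a hypothetical name, and it returns a plain vector of `index => value` pairs rather than an IndexedTable.

```julia
# Hypothetical helper: keep only the entries for which `drop` is false,
# returning (index tuple => value) pairs sorted by index.
function sparse_entries(a::AbstractArray, drop)
    entries = [Tuple(I) => a[I] for I in CartesianIndices(a) if !drop(a[I])]
    return sort!(entries, by = first)
end

a = [0 0 1 0; 2 0 3 0; 0 0 5 0; 2 0 0 1]
kept = sparse_entries(a, x -> x == 0)
```

For the matrix above, `kept` holds the same six index/value combinations as the filtered table in the proposal. The "keep" variant would only flip the predicate, i.e. test `keep(a[I])` instead of `!drop(a[I])`.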
Issue opened at https://github.com/JuliaComputing/IndexedTables.jl/issues/91
Thanks to @davidavdav's commit https://github.com/davidavdav/NamedArrays.jl/commit/5b8205f35198974c3597d46e69acc3538c549477 a NamedArray of any dimension can now be flattened (returning a flattened NamedArray), as shown in https://github.com/davidanthoff/IterableTables.jl/issues/65#issuecomment-345538256
julia> using NamedArrays
julia> srand(1234);
julia> n=NamedArray(rand(2,4,3))
2×4×3 Named Array{Float64,3}
[:, :, C=1] =
A ╲ B │ 1 2 3 4
──────┼───────────────────────────────────────
1 │ 0.590845 0.566237 0.794026 0.200586
2 │ 0.766797 0.460085 0.854147 0.298614
[:, :, C=2] =
A ╲ B │ 1 2 3 4
──────┼───────────────────────────────────────────
1 │ 0.246837 0.648882 0.066423 0.646691
2 │ 0.579672 0.0109059 0.956753 0.112486
[:, :, C=3] =
A ╲ B │ 1 2 3 4
──────┼───────────────────────────────────────────
1 │ 0.276021 0.0566425 0.950498 0.945775
2 │ 0.651664 0.842714 0.96467 0.789904
julia> n[:]
24-element Named Array{Float64,1}
(:A, :B, :C) │
────────────────┼──────────
("1", "1", "1") │ 0.590845
("2", "1", "1") │ 0.766797
("1", "2", "1") │ 0.566237
("2", "2", "1") │ 0.460085
("1", "3", "1") │ 0.794026
("2", "3", "1") │ 0.854147
("1", "4", "1") │ 0.200586
("2", "4", "1") │ 0.298614
("1", "1", "2") │ 0.246837
("2", "1", "2") │ 0.579672
("1", "2", "2") │ 0.648882
("2", "2", "2") │ 0.0109059
("1", "3", "2") │ 0.066423
("2", "3", "2") │ 0.956753
("1", "4", "2") │ 0.646691
("2", "4", "2") │ 0.112486
("1", "1", "3") │ 0.276021
("2", "1", "3") │ 0.651664
("1", "2", "3") │ 0.0566425
("2", "2", "3") │ 0.842714
("1", "3", "3") │ 0.950498
("2", "3", "3") │ 0.96467
("1", "4", "3") │ 0.945775
("2", "4", "3") │ 0.789904
Hello,
I'm using the `freqtable` function from FreqTables.jl. This function outputs NamedArray objects. Maybe it would be a good idea to add support for NamedArrays to IterableTables.
Kind regards