Closed bagohart closed 8 months ago
Hi @bagohart !
Thanks! It should be an issue in Tables.jl. Tables.jl says this type complies with the interface, but it does not implement the required methods:
julia> using Tables
julia> M = [(a=1, b=2) (a=3, b=4); (a=5, b=6) (a=7, b=8)]
2×2 Matrix{NamedTuple{(:a, :b), Tuple{Int64, Int64}}}:
(a = 1, b = 2) (a = 3, b = 4)
(a = 5, b = 6) (a = 7, b = 8)
julia> Tables.istable(M)
true
julia> Tables.rowaccess(M)
false
julia> Tables.columnaccess(M)
false
We can fix it temporarily by doing:
julia> import Tables
julia> Tables.columnaccess(x::Matrix{T}) where T<:NamedTuple = true
julia> pretty_table([(a=1,b=2) (a=3,b=4); (a=5,b=6) (a=7,b=8)])
┌───┬───┐
│ a │ b │
├───┼───┤
│ 1 │ 2 │
│ 5 │ 6 │
│ 3 │ 4 │
│ 7 │ 8 │
└───┴───┘
@bkamins, @quinnj can you help here?
Two things I see:
Tables.rowaccess(M)
should return true
(this can be a PR to Tables.jl)Tables.rowaccess
or Tables.columnaccess
return true
. They can both return false
for a valid table. (in which case I think internally in PrettyTables.jl you just need to pick if you go for column or row access.) Something like
if Tables.isTable(M)
if Tables.rowaccess(M)
# do something when you have fast row access
else
# otherwise just assume column access
end
end
Hi @bkamins !
Maybe there is just a misunderstanding about Tables.jl from my part. Reading here https://tables.juliadata.org/stable/#Tables.jl-Documentation I thought that something compliant with Tables.jl API must implement:
Tables.istable
Tables.rowacces
and Tables.rows
) or (Tables.columnaccess
and Tables.column
)Am I wrong?
You are wrong.
Tables.rowacces
guarantees fast row access.
Tables.columnacces
guarantees fast column access.
There can be tables that neither have fast row nor fast column assess (e.g. they have a completely different memory layout and both row and column access requires copying of the data).
At the same there are tables that return true
in both cases (because they have layout that allows both approaches without copying):
julia> m = Tables.table([1 2; 3 4])
Tables.MatrixTable{Matrix{Int64}} with 2 rows, 2 columns, and schema:
:Column1 Int64
:Column2 Int64
julia> Tables.rowaccess(m)
true
julia> Tables.columnaccess(m)
true
Also Tables.istable
is only a weak contract:
true
then it promises that what you work with is a tablefalse
still what you work with might be a table, but does not have to be a tableHere is an example of a table that returns false
for Tables.istable
(but correctly gets materialized as a DataFrame
):
julia> struct T
a::Int
b::Int
end
julia> x = [T(1,2), T(3,4), T(5,6)]
3-element Vector{T}:
T(1, 2)
T(3, 4)
T(5, 6)
julia> DataFrame(x)
3×2 DataFrame
Row │ a b
│ Int64 Int64
─────┼──────────────
1 │ 1 2
2 │ 3 4
3 │ 5 6
julia> Tables.istable(x)
false
Hi @bkamins,
I confess I am really confused right now. Reading here:
It is crystal clear to me that if an object m
follows Tables.jl API, which is indicated by returning Tables.istable(m) == true
, it must implement Tables.columns(m)
or Tables.rows(m)
. In the former, it also must return Tables.rowaccess(m) == true
and in the later Tables.columnaccess(m) == true
. By reading that documentation section, this assumption seems to be true even if I need to copy data.
If this assumption is not correct, I think Tables.jl documentation must be changed and I need help. For example, if m
complies with Tables.jl, indicated by Tables.istable(m) == true
, but both Tables.rowaccess
and Tables.columnaccess
is false
, how am I supposed to access the data?
Notice that I am not talking about tables in general (as x
in your example), but about objects that follows Tables.jl API.
These are "brief descriptions". And indeed this is confusing, as the docstrings say something else:
help?> Tables.rowaccess
Tables.rowaccess(x) => Bool
Check whether an object has specifically defined that it implements the Tables.rows function that does not copy table data. That is to say, Tables.rows(x) must be done with O(1) time and space complexity
when Tables.rowaccess(x) == true. Note that Tables.rows will work on any object that iterates AbstractRow-compatible objects, even if they don't define rowaccess, e.g. a Generator of NamedTuples. However,
this generic fallback may copy the data from input table x. Also note that just because an object defines rowaccess doesn't mean a user should call Tables.rows on it; Tables.columns will also work,
providing a valid AbstractColumns object from the rows. Hence, users should call Tables.rows or Tables.columns depending on what is most natural for them to consume instead of worrying about what and how
the input is oriented.
It is recommended that for users implementing MyType, they define only rowaccess(::Type{MyType}). rowaccess(::MyType) will then automatically delegate to this method.
help?> Tables.columnaccess
Tables.columnaccess(x) => Bool
Check whether an object has specifically defined that it implements the Tables.columns function that does not copy table data. That is to say, Tables.columns(x) must be done with O(1) time and space
complexity when Tables.columnaccess(x) == true. Note that Tables.columns has generic fallbacks allowing it to produces AbstractColumns objects, even if the input doesn't define columnaccess. However, this
generic fallback may copy the data from input table x. Also note that just because an object defines columnaccess doesn't mean a user should call Tables.columns on it; Tables.rows will also work, providing
a valid AbstractRow iterator. Hence, users should call Tables.rows or Tables.columns depending on what is most natural for them to consume instead of worrying about what and how the input is oriented.
It is recommended that for users implementing MyType, they define only columnaccess(::Type{MyType}). columnaccess(::MyType) will then automatically delegate to this method.
Which for me clearly state that the condition is "not copying data" in both cases (so if data would be copied false
should be returned).
Let us wait for what @quinnj says.
Oh! I haven't seen those docstrings. From this point of view, you are right. So, I will wait @quinnj to verify how I should modify the current PrettyTables.jl interface with Tables.jl.
@ronisbr, I agree that anyone implementing the interface should define rowaccess
or columnaccess
along with their definition of rows
/columns
. Note that when using the interface, rowaccess
is not recommended to be checked/used externally, but is only used internally by other tables interface functions. The only thing users should be using from the interface is Tables.rows
or Tables.columns
.
There are potential advanced use-cases where using rowaccess
/columnaccess
externally may make sense (for example, the TableOperations.jl package does some tricks around using this, but mainly for passing it through to downstream functions: https://github.com/JuliaData/TableOperations.jl/blob/fab3e72e53d1faf976120de360a5e492fe1e746b/src/TableOperations.jl#L92). But I would say those use-cases should be rare and carefully constructed.
Hopefully that helps clarify?
Hi @quinnj !
Thank you very much for the detailed information. I think that is precisely the case with PrettyTables.jl. If we check for row or column access, we can print those tables way faster than just using one type of iterator by default, which can lead to copying a lot of data without necessity.
So, I will use the following algorithm to get the [i, j]
element:
1. If we have columns access:
1. Get the column name related to the j-th column using `Tables.columnnames`.
2. Get the data using `Tables.getcolumn(table, name)[i]`.
2. If we have row access:
1. Get the column name related to the j-th column using `Tables.columnnames`.
2. Try using `subset` to obtain the i-th row. If it fails, fall back to iterating `Tables.rows` for i times.
3. Get the element using `row[column_name]`.
Does it make sense?
Yes, that sounds roughly correct. One detail I'd mention is that I think it's fine to check Tables.columnaccess
but if false, I wouldn't then check Tables.rowaccess
, but just assume it's row-oriented and call Tables.subset
, then fallback to Tables.rows
if that fails. This would allow, for example, a generator of NamedTuples to still succeed (even though it returns false
for Tables.rowaccess
).
Perfect! Thanks @quinnj !
Using the approach proposed by @quinnj , now we can print this type:
julia> pretty_table([(a=1,b=2) (a=3,b=4); (a=5,b=6) (a=7,b=8)])
┌───┬───┐
│ a │ b │
├───┼───┤
│ 1 │ 2 │
│ 5 │ 6 │
│ 3 │ 4 │
│ 7 │ 8 │
└───┴───┘
Side note: you are not seeing what you would expect because this type follows Tables.jl API. If you want to see the matrix of named tuples, you can do:
julia> pretty_table(Any[(a=1,b=2) (a=3,b=4); (a=5,b=6) (a=7,b=8)])
┌────────────────┬────────────────┐
│ Col. 1 │ Col. 2 │
├────────────────┼────────────────┤
│ (a = 1, b = 2) │ (a = 3, b = 4) │
│ (a = 5, b = 6) │ (a = 7, b = 8) │
└────────────────┴────────────────┘
What would a lazy version produce:
(x for x in [(a=1,b=2) (a=3,b=4); (a=5,b=6) (a=7,b=8)])
Hi @bkamins !
Sorry for the delay, I missed the notification.
This lazy version will not work since PrettyTables.jl does not know how to handler generators:
julia> a = (x for x in [(a=1,b=2) (a=3,b=4); (a=5,b=6) (a=7,b=8)])
Base.Generator{Matrix{NamedTuple{(:a, :b), Tuple{Int64, Int64}}}, typeof(identity)}(identity, NamedTuple{(:a, :b), Tuple{Int64, Int64}}[(a = 1, b = 2) (a = 3, b = 4); (a = 5, b = 6) (a = 7, b = 8)])
julia> pretty_table(a)
ERROR: The type Base.Generator{Matrix{NamedTuple{(:a, :b), Tuple{Int64, Int64}}}, typeof(identity)} is not supported.
Hey, cool project! I mostly got the table working that I need, but since I found this bug, I thought I could as well report it.
This works:
This does not:
Instead, I get:
Expected behaviour: the argument to
pretty_table
should be recognized as a Matrix and printed.