queryverse / Query.jl

Query almost anything in julia
Other
394 stars 49 forks source link

Add ability to use @select on SQL data sources #309

Open tlnagy opened 4 years ago

tlnagy commented 4 years ago

I have a very large table in a SQL database that I'm working with and it would be great to ignore some columns before loading, but currently I can't drop columns using @select:

julia> r = DBInterface.execute(db, "SELECT * FROM fxm");

julia> r |> @select(-:c)
MethodError: no method matching remove(::SQLite.Row, ::Val{:c})
Closest candidates are:
  remove(!Matched::NamedTuple{an,T} where T<:Tuple, ::Val{bn}) where {an, bn} at /home/tlnagy/.julia/packages/QueryOperators/g4G21/src/NamedTupleUtilities.jl:27

Stacktrace:
 [1] (::var"#52#54")(::SQLite.Row) at /home/tlnagy/.julia/packages/Query/AwBtd/src/query_translation.jl:58
 [2] iterate at /home/tlnagy/.julia/packages/QueryOperators/g4G21/src/enumerable/enumerable_map.jl:25 [inlined]
 [3] iterate at ./iterators.jl:641 [inlined]
 [4] iterate at ./iterators.jl:639 [inlined]
 [5] _collect at ./array.jl:614 [inlined]
 [6] collect at ./array.jl:603 [inlined]
 [7] |> at ./operators.jl:823 [inlined]
 [8] printtable(::IOContext{Base.GenericIOBuffer{Array{UInt8,1}}}, ::QueryOperators.EnumerableMap{Union{},QueryOperators.EnumerableIterable{SQLite.Row,SQLite.Query},var"#52#54"}, ::String; force_unknown_rows::Bool) at /home/tlnagy/.julia/packages/TableShowUtils/ImkA9/src/TableShowUtils.jl:16
 [9] printtable(::IOContext{Base.GenericIOBuffer{Array{UInt8,1}}}, ::QueryOperators.EnumerableMap{Union{},QueryOperators.EnumerableIterable{SQLite.Row,SQLite.Query},var"#52#54"}, ::String) at /home/tlnagy/.julia/packages/TableShowUtils/ImkA9/src/TableShowUtils.jl:7
 [10] show(::IOContext{Base.GenericIOBuffer{Array{UInt8,1}}}, ::QueryOperators.EnumerableMap{Union{},QueryOperators.EnumerableIterable{SQLite.Row,SQLite.Query},var"#52#54"}) at /home/tlnagy/.julia/packages/QueryOperators/g4G21/src/enumerable/show.jl:35
 [11] show(::IOContext{Base.GenericIOBuffer{Array{UInt8,1}}}, ::MIME{Symbol("text/plain")}, ::QueryOperators.EnumerableMap{Union{},QueryOperators.EnumerableIterable{SQLite.Row,SQLite.Query},var"#52#54"}) at ./multimedia.jl:47
 [12] limitstringmime(::MIME{Symbol("text/plain")}, ::QueryOperators.EnumerableMap{Union{},QueryOperators.EnumerableIterable{SQLite.Row,SQLite.Query},var"#52#54"}) at /home/tlnagy/.julia/packages/IJulia/DrVMH/src/inline.jl:43
 [13] display_mimestring(::MIME{Symbol("text/plain")}, ::QueryOperators.EnumerableMap{Union{},QueryOperators.EnumerableIterable{SQLite.Row,SQLite.Query},var"#52#54"}) at /home/tlnagy/.julia/packages/IJulia/DrVMH/src/display.jl:67
 [14] display_dict(::QueryOperators.EnumerableMap{Union{},QueryOperators.EnumerableIterable{SQLite.Row,SQLite.Query},var"#52#54"}) at /home/tlnagy/.julia/packages/IJulia/DrVMH/src/display.jl:96
 [15] #invokelatest#1 at ./essentials.jl:712 [inlined]
 [16] invokelatest at ./essentials.jl:711 [inlined]
 [17] execute_request(::ZMQ.Socket, ::IJulia.Msg) at /home/tlnagy/.julia/packages/IJulia/DrVMH/src/execute_request.jl:112
 [18] #invokelatest#1 at ./essentials.jl:712 [inlined]
 [19] invokelatest at ./essentials.jl:711 [inlined]
 [20] eventloop(::ZMQ.Socket) at /home/tlnagy/.julia/packages/IJulia/DrVMH/src/eventloop.jl:8
 [21] (::IJulia.var"#15#18")() at ./task.jl:358

It works fine if I first convert to a DataFrame

julia> r |> DataFrame |> @select(-:c)
3x2 query result
a │ b
──┼──
1 │ 3
2 │ 4
3 │ 5
davidanthoff commented 4 years ago

The tabular query commands like @select expect a stream of named tuples, but here you are getting a stream of SQLite.Row elements instead. If you convert to a DataFrame first then it will iterate named tuples, so that is why it works in that case.

Two ideas: you could either try to convert these Row instances into named tuples somehow. I don't know if they have that ability built in, but maybe something like @map(NamedTuple(_)) or @map({_...}) might work? I haven't tried that.

You could also give https://github.com/queryverse/QuerySQLite.jl a shot. That is an alternative Query.jl backend that translates your Query.jl query into SQL. You would write something like this:

import QuerySQLite

db = QuerySQLite.Database(db)

db.fxm |> @select(-:c)

And in theory that should then run a SQL statement that never even loads the c column from the database at all.

I'm not sure, though, whether this works with the -:c construction, @bramtayl would know.

bramtayl commented 4 years ago

I haven't done any performance testing on QuerySQL at all, but it's definitely worth a shot! If you don't see performance improvements (highly possible), open an issue and I can try to see what I can do (it might just be a matter of optimizing the generated SQL code a bit).