Open ansaardollie opened 3 weeks ago
Currently the transformer list command only knows about transformers it's been explicitly told about, see what's currently done in DataToolkitCommon:
(NB: DataToolkitBase has been renamed to DataToolkitCore in the development version)
I don't currently see a nicer way of fetching the documentation, but I think I could probably check for undocumented transformers and mention them at the end of ?:, how does that sound?
I'm also planning on improving the docs a bit to make this a bit easier/soften the learning curve :slightly_smiling_face:
Hi
Completely understand regarding the documentation for the repl. No worries, I've realized I've mis-explained the real issue.
Out of interest, have there been any major changes between v0.9.x and v0.10? I ask because my initial thought, in trying to get a handle on how everything works, was just to get dummy transformers working and see if the toolkit could recognize them. I've since at least managed to get the system to pick up the driver names in Data.toml; however, I keep getting errors along the lines of:
UnsatisfyableTransformer: There are no storages for "cars" that can provide a .
The defined storages are as follows:
DataStorage{web}(IO)
I am trying to implement a Parquet driver, but I run into the issue above. My basic approach thus far has been to create a Julia package and define all the loader logic inside it (a loader is the only transformer I've actually needed, since I can get the Parquet files over HTTPS).
I've tried following the approach of the example on this page
My package file src/dtk_data.jl
module dtk_data
using DataToolkit, DataToolkitBase, DataToolkitCommon, DataFrames
export load, supportedtypes, create
function __init__()
    @addpkg Parquet2 "98572fba-bba0-415d-956f-fa77e587d26d"
    @addpkg DataFrames "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
end
function load(loader::DataLoader{:parquet}, io::IO, ::Type{DataFrame})
    @import Parquet2
    @import DataFrames
    return Parquet2.Dataset(io) |> DataFrames.DataFrame
end
supportedtypes(::Type{DataLoader{:parquet}}) =
    [QualifiedType(:DataFrames, :DataFrame)]
create(::Type{DataLoader{:parquet}}, source::String) =
    !isnothing(match(r"\.parquet$"i, source))
end # module dtk_data
Then I open a Julia session in the root directory of this package and run the following code:
include("src/dtk_data.jl")
using .dtk_data
using DataToolkit
loadcollection!("Data.toml")
d"cars"
And my Data.toml has the following setup:
data_config_version = 0
uuid = "74641622-11fb-438b-b7be-4626639b8eac"
name = "dtk_data"
plugins = ["store", "defaults", "memorise"]
[[cars]]
uuid = "a6cee431-bfa1-4690-b8f3-51de93d970f5"
[[cars.storage]]
url = "https://github.com/ansaardollie/dtk_data/raw/main/MT%20cars.parquet"
type = "Base.IO"
driver = "web"
[[cars.loader]]
driver = "parquet"
type = "DataFrames.DataFrame"
Then I get the following error
ERROR: UnsatisfyableTransformer: There are no storages for "cars" that can provide a .
The defined storages are as follows:
DataStorage{web}(IO)
Stacktrace:
[1] _read(dataset::DataToolkitBase.DataSet, as::Type)
@ DataToolkitBase ~\.julia\packages\DataToolkitBase\LJn9B\src\interaction\externals.jl:253
[2] invokelatest(::Any, ::Any, ::Vararg{Any}; kwargs::@Kwargs{})
@ Base .\essentials.jl:887
[3] invokelatest(::Any, ::Any, ::Vararg{Any})
@ Base .\essentials.jl:884
[4] invokepkglatest(::Any, ::Any, ::Vararg{Any}; kwargs::@Kwargs{})
@ DataToolkitBase ~\.julia\packages\DataToolkitBase\LJn9B\src\model\usepkg.jl:101
[5] invokepkglatest(::Any, ::Any, ::Vararg{Any})
@ DataToolkitBase ~\.julia\packages\DataToolkitBase\LJn9B\src\model\usepkg.jl:100
[6] (::DataToolkitBase.AdviceAmalgamation)(::Function, ::Any, ::Vararg{Any}; kwargs...)
@ DataToolkitBase ~\.julia\packages\DataToolkitBase\LJn9B\src\model\advice.jl:102
[7] (::DataToolkitBase.AdviceAmalgamation)(::Function, ::Any, ::Vararg{Any})
@ DataToolkitBase ~\.julia\packages\DataToolkitBase\LJn9B\src\model\advice.jl:98
[8] macro expansion
@ ~\.julia\packages\DataToolkitBase\LJn9B\src\model\advice.jl:131 [inlined]
[9] _dataadvisecall(::typeof(DataToolkitBase._read), ::DataToolkitBase.DataSet, ::Type{…}; kwargs::@Kwargs{})
@ DataToolkitBase ~\.julia\packages\DataToolkitBase\LJn9B\src\model\advice.jl:131
[10] read(dataset::DataToolkitBase.DataSet)
@ DataToolkitBase ~\.julia\packages\DataToolkitBase\LJn9B\src\interaction\externals.jl:160
[11] macro expansion
@ ~\.julia\packages\DataToolkit\VObGv\src\DataToolkit.jl:48 [inlined]
[12] top-level scope
@ REPL[5]:1
Some type information was truncated. Use `show(err)` to see complete types.
Any help would be appreciated. I would love to be able to get a Parquet driver working so I can hopefully contribute, if you'd like.
Out of interest, have there been any major changes between v0.9.x and v0.10?
Yup! I'm making a few major changes (a changelog probably wouldn't hurt :sweat_smile:), such as:
- Replacing @import with @require: d9e92265a95bb16452371070e5486e565c7cbdd5
- SmallDict type (the original issue is a lot better with Memory in 1.11): 98a6723e68de81689ffb972bb3c1cfab8da04ecb
- Renaming DataToolkitBase to DataToolkitCore
- Splitting DataToolkit into a more user-facing DataToolkit and package-facing (new) DataToolkitBase
- FilePathsBase.AbstractPath
- DataToolkitCommon to DataToolkitCore. It works a bit differently (IMO, better) and is now configured by Preferences
Regarding the problem you've run into, it looks like you've given enough info for it to be a MWE. I'll see if I can give it a look in the next day or two, otherwise I'll probably get to it on the weekend :slightly_smiling_face:.
however I keep getting errors along the lines of:
Good news, this error message is improved in 0.10-dev :slightly_smiling_face:
UnsatisfyableTransformer: There are no loaders for "cars" that can provide a DataFrames.DataFrame.
More good news, I think you'll find this works if you actually import the functions you want to overload
- export load, supportedtypes, create
+ import DataToolkitBase: load, supportedtypes, create
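For anyone else hitting this, the difference between the two lines is general Julia behaviour rather than anything specific to DataToolkit. Below is a minimal, self-contained sketch; the modules Lib, WithoutImport and WithImport and the function read_dataset are hypothetical stand-ins, not part of any DataToolkit package. Defining a function without importing it creates a new, unrelated function that the library never calls, while importing it first adds a method to the library's own function.

```julia
# Hypothetical stand-in for a library that owns a generic function.
module Lib
    load(x) = "fallback"
    # The library only ever calls its own `load`.
    read_dataset(x) = load(x)
end

# No import: this defines a brand new `WithoutImport.load`,
# which `Lib` knows nothing about. Exporting it does not change that.
module WithoutImport
    export load
    load(x::Int) = "custom"
end

# Import first: this adds a method to the existing `Lib.load`.
module WithImport
    import ..Lib: load
    load(x::String) = "custom"
end

println(WithoutImport.load === Lib.load)  # false
println(WithImport.load === Lib.load)     # true
println(Lib.read_dataset(1))              # fallback
println(Lib.read_dataset("a"))            # custom
```

The Int method is invisible to Lib, so the library falls back to its generic method, which is the same situation as a loader method that the toolkit cannot find.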
It would be great to see a Parquet driver. I should have some docs on adding a loader to DataToolkitCommon in the next week or so.
Awesome, thank you so much for the update.
Out of interest, how would one add the v0.10-dev versions of the packages to my Julia environment using the monorepo link?
Hi there,
I am trying to implement my own custom transformer named customtransformer. However, when I try to run the ?: command in the REPL it doesn't pick these transformers up.
I have a file called custom_transformers.jl which has the following content
I execute this file in the current Julia session and then run
Then when trying to list the transformers (using the ?: command in the DataRepl) my custom transformer never shows up.
What is the process to let DataToolkit.jl know about these custom transformers?