tecosaur / DataToolkit.jl

Reproducible, flexible, and convenient data management
https://tecosaur.github.io/DataToolkit.jl

REPL information about dataset #27

Closed (nicolamos closed this issue 3 weeks ago)

nicolamos commented 2 months ago

When defining multiple datasets with the same name, it seems very hard to retrieve a particular one by specifying, e.g., the date. Using UUIDs for that seems inconvenient, as I think it is easier to specify parameters that are actually present in the data.

tecosaur commented 2 months ago

UUIDs are great for completely unambiguous references, but I completely get that from a usability standpoint, they're not great. That's why dataset supports the syntax dataset("name", "key" => value...).

For example, with this setup

data_config_version = 0
uuid = "663bbc0d-575d-4caf-ab51-b2ec6737a15b"
name = "demo"

[[atest]]
uuid = "92b63dcf-949f-4439-af84-f20996166481"
date = 2024-04-01

    [[atest.storage]]
    driver = "raw"
    value = 7

    [[atest.loader]]
    driver = "passthrough"

[[atest]]
uuid = "17db2f80-8f48-4344-8f09-894c6676331f"
date = 2024-04-07

    [[atest.storage]]
    driver = "raw"
    value = 26

    [[atest.loader]]
    driver = "passthrough"

You can use dataset like so:

julia> using Dates  # provides Date

julia> dataset("atest", "date" => Date("2024-04-01")) |> read
7

julia> dataset("atest", "date" => Date("2024-04-07")) |> read
26
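
To make the idea of selecting by parameters concrete, here is a rough standalone sketch of name-plus-parameter matching. It is not DataToolkit's implementation: the `Entry` struct, the `entries` list, and the `pick` function are hypothetical names made up for illustration, and the sketch assumes a lookup should fail unless exactly one entry matches.

```julia
using Dates

# Hypothetical stand-in for a dataset entry; this is not a DataToolkit type.
struct Entry
    name::String
    parameters::Dict{String, Any}
    value::Any
end

# Two entries sharing a name, distinguished only by their "date" parameter,
# mirroring the two [[atest]] tables in the configuration above.
entries = [
    Entry("atest", Dict{String, Any}("date" => Date("2024-04-01")), 7),
    Entry("atest", Dict{String, Any}("date" => Date("2024-04-07")), 26),
]

# Return the single entry whose name matches and whose parameters contain
# every given key => value pair. Zero or multiple matches raise an error.
function pick(entries, name, params::Pair...)
    matches = filter(entries) do e
        e.name == name &&
            all(haskey(e.parameters, k) && e.parameters[k] == v for (k, v) in params)
    end
    length(matches) == 1 ||
        error("expected exactly one match for $name, found $(length(matches))")
    only(matches)
end

pick(entries, "atest", "date" => Date("2024-04-07")).value
```

In this sketch, calling `pick(entries, "atest")` with no parameters raises an error because both entries match, which is the ambiguity that the extra `"key" => value` pairs resolve.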

Let me know how that seems to you :slightly_smiling_face:

tecosaur commented 3 weeks ago

In the upcoming v0.10, I'll also be showing more information about the dataset in the REPL:

[image: screenshot of the dataset information shown in the REPL]

Between this and my earlier comment, it should be pretty easy to identify and get a dataset based on its parameters.

nicolamos commented 3 weeks ago

Thank you for the updates. Having the parameters shown in the REPL is very useful.