tecosaur / DataToolkit.jl

Reproducible, flexible, and convenient data management
https://tecosaur.github.io/DataToolkit.jl

Question: could DataToolkit.jl handle data versioning? #4

Closed: camilogarciabotero closed this issue 1 year ago

camilogarciabotero commented 1 year ago

Hi,

Thanks for working on this package. I was wondering whether Data.toml already has a feature for versioning data. I was thinking of something similar to Dolt or DataSets.jl.

Best

tecosaur commented 1 year ago

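Versioning is supported through the versions plugin combined with duplication: the same data set is declared several times, and each entry is given a distinct version. For example: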

Data.toml

data_config_version = 0
plugins = ["versions"]

[[thing]]
version = 1

    [[thing.storage]]
    driver = "raw"
    value = "a"

    [[thing.loader]]
    driver = "passthrough"

[[thing]]
version = 2

    [[thing.storage]]
    driver = "raw"
    value = "b"

    [[thing.loader]]
    driver = "passthrough"

Julia REPL

julia> using DataToolkit
[ Info: Data set 'thing' had no UUID, one has been generated.
[ Info: Data set 'thing' had no UUID, one has been generated.

(demo) data> plugin add versions
 + Added plugins: versions

julia> d"thing"
ERROR: AmbiguousIdentifier: "thing" matches multiple data sets
    □:thing@1 [7eba51e8-1a88-4e64-8791-fa7b20a39fc5]
    □:thing@2 [ef154d2c-26ee-43bc-a5d2-faa32e1da9cf]
  [...]

julia> d"thing@1"
"a"

julia> d"thing@2"
"b"
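
Repeated [[thing]] headers are ordinary TOML arrays of tables, so the structure behind this can be inspected with Julia's TOML standard library. The sketch below is illustrative only. It parses a trimmed copy of the Data.toml above (loaders omitted), and the pick helper is a hypothetical stand-in, not a DataToolkit function, for what the @version suffix appears to select.

```julia
using TOML

# Trimmed copy of the Data.toml above; loader sections omitted.
config = TOML.parse("""
data_config_version = 0
plugins = ["versions"]

[[thing]]
version = 1

    [[thing.storage]]
    driver = "raw"
    value = "a"

[[thing]]
version = 2

    [[thing.storage]]
    driver = "raw"
    value = "b"
""")

# Each repeated [[thing]] header becomes one Dict in an array,
# carrying its own `version` and its own `storage` list.
entries = config["thing"]
println(length(entries))  # 2 entries share the name "thing"

# Hypothetical helper (NOT part of DataToolkit): return the single entry
# whose `version` equals v; `only` throws if zero or several entries match.
pick(entries, v) = only(filter(e -> e["version"] == v, entries))

println(pick(entries, 2)["storage"][1]["value"])  # b
```

Two entries sharing one name is why a bare d"thing" is ambiguous, while adding a version narrows the match to a single entry. DataToolkit's real resolution logic may differ from this simple filter.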

camilogarciabotero commented 1 year ago

Thanks for the example!