tecosaur / DataToolkit.jl

Reproducible, flexible, and convenient data management
https://tecosaur.github.io/DataToolkit.jl

Calling `loadcollection!` from a package "turns on" the data REPL #42

Closed 70Gage70 closed 3 months ago

70Gage70 commented 3 months ago

Sorry in advance for the noob question. I have a simple function in a package that looks like this

function build_interpolants(;download_data::Bool = false)
    path2datatoml = joinpath(@__DIR__, "..", "..", "Data.toml") |> abspath
    loadcollection!(path2datatoml, @__MODULE__)

    if download_data
        data`store fetch`
    end

    # do something with data
end

The idea being that there's a good amount of raw data so I want the user to explicitly opt-in to download it, but that only needs to be done once because DataToolkit will manage it. Anyway, using this package and calling build_interpolants() loads the dataset from path2datatoml into the data REPL. Meaning, without ever actually using DataToolkit, I can press } and see (name of my dataset) data >. Naively, I found it a bit surprising that the data REPL was "turned on" like this.

Basically my question is just: what's the right way to do this? Namely, download the data but not actually expose any of the details. I'm also a bit suspicious that

data`store fetch`

might be the wrong idea since the docs say it downloads "all datasets" but I just want the one from path2datatoml.

tecosaur commented 3 months ago

> Sorry in advance for the noob question.

Not at all, these packages are IMO a bit under-documented ATM (which is something I plan on putting a bit of work into), and questions are generally welcome 🙂.

> The idea being that there's a good amount of raw data so I want the user to explicitly opt-in to download it, but that only needs to be done once because DataToolkit will manage it.

Makes sense.

> without ever actually using DataToolkit, I can press } and see (name of my dataset) data >. Naively, I found it a bit surprising that the data REPL was "turned on" like this.

Yep, this has occurred to me too, and it's part of the reason why I'm doing a package restructuring in the upcoming 0.10 release.

The end result is that large chunks of functionality are better organised, and the "just use it" recommendation becomes:

Of course, we're currently on 0.9, so this doesn't help right now. Know that there's active work going on to improve the package setup, though 🙂.

> Basically my question is just: what's the right way to do this? Namely, download the data but not actually expose any of the details. I'm also a bit suspicious that data`store fetch` might be the wrong idea since the docs say it downloads "all datasets" but I just want the one from path2datatoml.

So, there's a useful API for handling this (currently private, but it should probably be made public) in the Store submodule of DataToolkitCommon (soon to be DataToolkitStore in v0.10). Namely, the fetch! function, which can be used like so:

mycollection = loadcollection!(...)
DataToolkitCommon.Store.fetch!(mycollection) # v0.9
DataToolkitStore.fetch!(mycollection) # v0.10
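
Applied to the function from the question, that might look something like the sketch below. It is untested and makes a few assumptions: that the package is on v0.9, that `loadcollection!` can be imported from DataToolkitBase, and that both DataToolkitBase and DataToolkitCommon are listed as dependencies of the package. It only narrows the fetch to the one collection; it does not by itself change what shows up in the data REPL.

```julia
# Sketch only, assuming DataToolkit v0.9: `loadcollection!` is taken from
# DataToolkitBase, and the (currently private) `fetch!` from DataToolkitCommon.Store.
using DataToolkitBase: loadcollection!
import DataToolkitCommon

function build_interpolants(; download_data::Bool = false)
    path2datatoml = abspath(joinpath(@__DIR__, "..", "..", "Data.toml"))
    # Keep the returned collection so that only this one gets fetched.
    collection = loadcollection!(path2datatoml, @__MODULE__)

    if download_data
        # Fetch just this collection instead of running data`store fetch`.
        # On v0.10 this would be DataToolkitStore.fetch!(collection).
        DataToolkitCommon.Store.fetch!(collection)
    end

    # do something with data
end
```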

Hope that helps :)

70Gage70 commented 3 months ago

Awesome, thanks! Greatly appreciate the detailed comment. I am following this package with great interest 🙂

tecosaur commented 3 months ago

A small update: fetch! will indeed be Public API in v0.10, https://tecosaur.github.io/DataToolkit.jl/store/#DataToolkitStore.fetch!