oxinabox / DataDeps.jl

reproducible data setup for reproducible science

Pluto.jl support #176

Open fonsp opened 6 days ago

fonsp commented 6 days ago

Hi! Thanks for this cool package!

Using MLDatasets.jl in Pluto does not always work 😵‍💫 which is a shame because it's really useful for our Julia-beginner target audience! Running iris = Iris() will leave the cell stuck running forever. This is because DataDeps.jl tries to ask the user for permission:

Do you want to download the dataset from ["https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"] to "/Users/fons/.julia/scratchspaces/124859b0-ceae-595e-8997-d05f6a7a8dfe/datadeps/Iris"?
[y/n]

But Pluto does not have a stdin terminal interface, so users cannot enter y/n. In Pluto, Base.isinteractive() returns false to tell packages about this (previous discussion with good foresight in https://github.com/oxinabox/DataDeps.jl/issues/12#issuecomment-354108496 🌟), but it looks like DataDeps instead uses the "CI" env variable to make this distinction.

Ideally, running MLDatasets.Iris() in a non-interactive session should throw an error saying that you should set DATADEPS_ALWAYS_ACCEPT, perhaps with an example code snippet.
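For illustration, the kind of snippet such an error message might point to could look like the sketch below. It assumes that DataDeps.jl reads DATADEPS_ALWAYS_ACCEPT from ENV at the time the download is requested and that the string "true" is an accepted value; the commented-out lines are the call from this issue and need MLDatasets.jl and network access.

```julia
# Accept DataDeps.jl downloads without the y/n prompt.
# Assumption: the variable is read when the dataset is first requested,
# so it must be set before that call.
ENV["DATADEPS_ALWAYS_ACCEPT"] = "true"

# The call from this issue (needs MLDatasets.jl and network access):
# using MLDatasets
# iris = Iris()
```

Setting this inside the notebook keeps it re-runnable on another computer, but it also means whoever runs the notebook accepts the download without being asked.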

Ideas

Perhaps Base.isinteractive() can be used instead of env_bool("CI")?

Or, a bit more low-level: could the function better_readline throw when stream === stdin and isinteractive() is false? In that case reading is guaranteed to block forever.
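A minimal sketch of this lower-level idea, not DataDeps.jl's actual better_readline: the name guarded_readline, its interactive keyword, and the error wording are all hypothetical and chosen only for illustration. It throws up front when asked to read from stdin in a non-interactive session, and reads normally from any other stream.

```julia
# Hypothetical sketch; `guarded_readline` is NOT part of DataDeps.jl.
# The `interactive` keyword exists only so the behaviour can be exercised
# without a real terminal; by default it asks Base.isinteractive().
function guarded_readline(stream::IO = stdin; interactive::Bool = Base.isinteractive())
    if stream === stdin && !interactive
        # Reading here could block forever (as in a Pluto cell), so fail loudly instead.
        error("Cannot prompt for input in a non-interactive session. " *
              "Set ENV[\"DATADEPS_ALWAYS_ACCEPT\"] = \"true\" to accept downloads without a prompt.")
    end
    return readline(stream)
end

# Any stream other than stdin is read as usual.
answer = guarded_readline(IOBuffer("y\n"); interactive = false)
```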

Let me know what you think, thank you!

PS I knowwww that Pluto should "just" support terminal input, but it's really really complicated! And by not supporting it, we have a nice side effect that people will author notebooks that never require user input to re-run on another computer.

oxinabox commented 5 days ago

We can't use Base.isinteractive (as discussed in #12) because it also returns false when you are just running a normal script from the command line via julia myscript.jl, where you do have stdin.

In general, better_readline has gone through many iterations of how to detect that no input is possible because we are on CI. https://github.com/oxinabox/DataDeps.jl/blob/8fdc7ce424377f246f1b5900277def7256bfaada/src/util.jl#L54-L61 This is because different Julia versions have behaved differently here: the stream being closed; the stream being closed and opening it being a no-op; the stream being open but reading it always returning an immediate empty string. It kept changing, which is why we ended up giving up and just checking ENV["CI"]. Possibly we should just check whether stdin is closed and, if so, throw an error (which could then result in a useful instruction like you say). Can Pluto make sure that stdin is closed?
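A rough sketch of the "check if stdin is closed" idea, under stated assumptions: readline_or_throw and its error text are hypothetical and not part of DataDeps.jl, and it relies on isopen reporting a closed stream correctly, which is exactly the behaviour described above as having varied between Julia versions. In-memory buffers stand in for stdin so the example runs anywhere.

```julia
# Hypothetical sketch; `readline_or_throw` is NOT part of DataDeps.jl.
function readline_or_throw(stream::IO = stdin)
    if !isopen(stream)
        # No input can ever arrive on a closed stream, so give an instruction instead of waiting.
        error("Input stream is closed, so no y/n answer can be read. " *
              "Set ENV[\"DATADEPS_ALWAYS_ACCEPT\"] = \"true\" to accept downloads without a prompt.")
    end
    return readline(stream)
end

# In-memory streams standing in for stdin.
open_stream = IOBuffer("n\n")
closed_stream = IOBuffer("y\n")
close(closed_stream)

reply = readline_or_throw(open_stream)   # reads the line from the open stream
# readline_or_throw(closed_stream)       # would throw an ErrorException
```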

> Pluto should "just" support terminal input, but it's really really complicated!

This has long been my position. I feel like we have talked about this before. Jupyter does this via popping open a little text input. Why can't Pluto?