Open bollwyvl opened 9 years ago
Thanks for the suggestion, this sounds interesting. API-wise, I would think of an additional (optional) flag that would maybe write the produced output into the meta-tag.
Just wondering, what application and use-case would you have in mind? Right now, for example, I'd use this plugin to conveniently show the time-stamp of the last update to users. Or to show Python versions and packages that were used to create those results. I am just wondering how the "meta" tag could be additionally used to improve reproducibility.
Thanks for the response. Yeah, -m is already taken, but something to that effect.
I think the big win is that metadata in standard formats (ISO, etc.) is more unambiguously parseable by downstream consumers and UIs than inline text. Instead of writing some regular expressions, one can do json.load(f)["metadata"]["watermark"]. For example, on nbviewer, we show the kernel that was used to create the notebook.
So if one has a big stack of documentation notebooks in a repo, one can check for when they were actually executed, not when they were checked out, etc.
When we get better search, either in JupyterHub or in custom deployments, metadata fields will just be ready to go as facets. An organization that has watermark as part of their "standard distribution" could gain a lot of insight, about a snapshot or over time.
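To make the parsing argument concrete, here is a minimal sketch of what a downstream consumer could do. The `watermark` key and its `date` field are assumptions taken from the proposal in this thread, not something watermark writes today, and the helper names are made up for illustration.

```python
import json
from datetime import datetime


def read_watermark(notebook_text):
    """Return the (hypothetical) `watermark` metadata dict, or None if absent."""
    nb = json.loads(notebook_text)
    return nb.get("metadata", {}).get("watermark")


def executed_at(notebook_text):
    """Parse the assumed ISO 8601 `date` field into a datetime, if present."""
    wm = read_watermark(notebook_text)
    if not wm or "date" not in wm:
        return None
    return datetime.fromisoformat(wm["date"])


# A tiny stand-in for the contents of an .ipynb file.
example = json.dumps({
    "cells": [],
    "metadata": {"watermark": {"date": "2015-06-17T15:04:35", "CPython": "3.4.3"}},
    "nbformat": 4,
    "nbformat_minor": 0,
})

print(executed_at(example))
```

No regular expressions needed, and a notebook without the key simply yields `None`.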
Sent by email in reply to Sebastian Raschka's comment of Tue, Sep 1, 2015: https://github.com/rasbt/watermark/issues/4#issuecomment-136926970
> metadata in standard formats (iso, etc) is more unambiguously parseable by downstream consumers and UI than inline text.
Good point, I agree. In this context, I could also imagine an optional little add-on to write all current package specifications of the Python env into the metadata, as in pip freeze > requirements.txt.
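A rough sketch of that add-on idea, using only the standard library instead of shelling out to pip. The `packages` key under `watermark` is a hypothetical name, and this is only an approximation of what `pip freeze` reports (it does not handle editable or VCS installs specially).

```python
from importlib.metadata import distributions


def frozen_requirements():
    """Return sorted 'name==version' strings for the installed distributions."""
    pins = set()
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip broken installs that lack a name
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


# Hypothetical layout: the pins would live next to the other watermark fields.
metadata_fragment = {"watermark": {"packages": frozen_requirements()}}
print(len(metadata_fragment["watermark"]["packages"]), "packages found")
```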
Btw. something like -s --save_meta or -g --generate_meta seems to be okay! However, I would suggest not using the 1-letter short form here and going with --generate_meta to make it clear to a "user" of this notebook that the current watermark would change the notebook's meta-data in some way upon re-execution.
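For illustration only, a long-form-only flag could look like the sketch below. It uses plain `argparse` with a made-up program name; watermark's real option handling (it is an IPython magic) may look different.

```python
import argparse


def build_parser():
    """Build a stand-in parser with a long-form-only metadata flag."""
    parser = argparse.ArgumentParser(prog="watermark-sketch")
    # Deliberately no 1-letter alias, so the side effect is spelled out.
    parser.add_argument(
        "--generate_meta",
        action="store_true",
        help="also write the watermark into the notebook metadata",
    )
    return parser


args = build_parser().parse_args(["--generate_meta"])
print(args.generate_meta)
```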
Would you be interested in implementing such a feature?
Sorry I didn't get back to you sooner: traveling!
I'd love to take a whack at this. Hopefully I can get a PoC up quickly.
Addons are great, but likely outside the scope of this particular request!
But, since we're off topic... I highly recommend building them with entry_points vs namespace tomfoolery or magic module/function names.
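A minimal sketch of what entry-point-based discovery could look like. The group name `watermark.addons` is purely hypothetical; nothing in watermark registers or reads it.

```python
from importlib.metadata import entry_points

GROUP = "watermark.addons"  # hypothetical group name, for illustration only


def discover_addons(group=GROUP):
    """Map entry point names to entry point objects for the given group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Newer importlib.metadata API (Python 3.10+).
        selected = eps.select(group=group)
    else:
        # Older API: entry_points() returns a dict keyed by group.
        selected = eps.get(group, [])
    return {ep.name: ep for ep in selected}


print(discover_addons())
```

An add-on package would then declare an entry point in that group in its own packaging metadata, and calling `.load()` on a discovered entry point imports it on demand.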
In addition to pip, I'd consider being able to serialize the state of:
conda
apt
dnf / yum
brew
hg
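One possible shape for this, sketched under assumptions: each "manager" is just a command whose output lines get stored, and tools that are not installed are skipped. The command table and function name are illustrative; only three managers are shown, and the others from the list would be added the same way.

```python
import shutil
import subprocess
import sys

# Assumed commands for a few of the managers listed above.
COMMANDS = {
    "pip": [sys.executable, "-m", "pip", "freeze"],
    "conda": ["conda", "list", "--export"],
    "brew": ["brew", "list", "--versions"],
}


def collect_states(commands=COMMANDS):
    """Run each available command and keep its output lines, keyed by name."""
    states = {}
    for name, argv in commands.items():
        if shutil.which(argv[0]) is None:
            continue  # tool not installed on this machine
        try:
            result = subprocess.run(argv, capture_output=True, text=True)
        except OSError:
            continue  # found on PATH but could not be launched
        if result.returncode == 0:
            states[name] = result.stdout.splitlines()
    return states


# Demo with the running interpreter standing in for a package manager.
demo = collect_states({"python": [sys.executable, "-c", "print('a'); print('b')"]})
print(demo)
```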
No need to apologize, and I am sorry, too. It was a pretty hectic week. I am currently in the final stage of finishing up my new book, which is coming out in 1-2 weeks, and there is a lot of stuff to be done :).
So, I think writing to the meta-tags as an option would be great. And I will open separate issues for the other suggestions. I like the idea of considering other "managers"/"environments"
Cheers, Sebastian
Worth reheating this discussion? I think it would be cool to have the information inside the metadata of the notebook. Then follow up with a PR for https://github.com/conda-tools/conda-execute/issues/3 which might make the notebook a "shareable unit". Right now, for sharing notebooks you need to make a repository with a requirements.txt or some such.
I don't really know much about the formatting recommendations/guidelines for Jupyter notebooks, or whether there's a difference between Jupyter Notebook and Jupyter Lab in terms of what gets written to .ipynb files. However, I noticed that in the Jupyter Lab UI, there's a metadata field, which would probably be equivalent to what @bollwyvl mentioned with
{
  "metadata": {
    "watermark": {
      "date": "2015-06-17T15:04:35",
      "CPython": "3.4.3",
      "IPython": "3.1.0",
      "compiler": "GCC 4.2.1 (Apple Inc. build 5577)",
      "system": "Darwin",
      "release": "14.3.0",
      "machine": "x86_64",
      "processor": "i386",
      "CPU cores": "4",
      "interpreter": "64bit"
    }
  }
}
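As a sketch of how such a block could be produced, the snippet below adds a `watermark` entry to an already-parsed notebook dict using only the standard library. The function name and the subset of fields are assumptions for illustration, not watermark's implementation; the `CPython` key follows the example above and assumes the interpreter is CPython.

```python
import json
import platform
from datetime import datetime


def add_watermark(notebook, now=None):
    """Return a copy of the notebook dict with a `watermark` metadata entry."""
    now = now or datetime.now()
    updated = dict(notebook)
    metadata = dict(updated.get("metadata", {}))  # keep existing keys intact
    metadata["watermark"] = {
        "date": now.replace(microsecond=0).isoformat(),
        "CPython": platform.python_version(),
        "system": platform.system(),
        "release": platform.release(),
        "machine": platform.machine(),
    }
    updated["metadata"] = metadata
    return updated


nb = {
    "cells": [],
    "metadata": {"kernelspec": {"name": "python3"}},
    "nbformat": 4,
    "nbformat_minor": 0,
}
stamped = add_watermark(nb, now=datetime(2015, 6, 17, 15, 4, 35))
print(json.dumps(stamped["metadata"]["watermark"], indent=2))
```

Writing the result back with `json.dump` would complete the round trip; existing metadata such as `kernelspec` is left untouched.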
In any case, if you or @bollwyvl or someone else would like to implement this (a way to optionally write metadata), I'd be very open to this and be happy to merge it (there was good work in progress over at #7). This could be done via a --metadata flag.
Watermark looks great for reproducibility.
It would be nice to have an option to (also) store this data in the notebook metadata. Maybe some more hierarchy in there as well...
Since the kernel doesn't have any idea what's going on w/r/t notebooks, it would probably have to be done with a display.Javascript. Happy to help with a PR, if you think there is a place for this!
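A hedged sketch of the Javascript route: the kernel builds a small script, and the front end applies it to the notebook's metadata. It assumes the classic Notebook front end, which exposes a global `Jupyter.notebook` object; other front ends such as JupyterLab do not, so the script guards against that. The function name is made up.

```python
import json


def metadata_js(watermark):
    """Build a JS snippet that stores `watermark` in the notebook metadata."""
    payload = json.dumps(watermark)  # JSON output is also a valid JS literal here
    return (
        "if (typeof Jupyter !== 'undefined' && Jupyter.notebook) {\n"
        f"  Jupyter.notebook.metadata.watermark = {payload};\n"
        "}"
    )


js = metadata_js({"CPython": "3.4.3"})
print(js)
```

Inside a running notebook, the string would be handed to the front end with `IPython.display.display(IPython.display.Javascript(js))`; the metadata only reaches the .ipynb file once the notebook is saved.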