Closed ankitrokdeonsns closed 3 years ago
Thank you for the pull request. There's a few points to discuss:
Your PR only addresses the code path of calling nbstripout
with a file argument. There is also the (more common) way of reading from stdin, which is used in the git
filter.
To handle the case of reading from stdin we need to either
My preference is being explicit using a flag e.g. --format
which defaults to jupyter
We'd need to decide how to handle the various flags nbstripout
supports. Do they have equivalents for Zeppelin? Does Zeppelin have execution counts? Or notebook and cell level metadata? Tags?
If it doesn't make sense to support these flags, what do we want to do when running in "Zeppelin mode" and an incompatible flag is passed? Ignore silently? Ignore with a warning? Fail?
Please also add tests and update the documentation as part of this pr.
Thank you for the pull request. There's a few points to discuss:
- Your PR only addresses the code path of calling
nbstripout
with a file argument. There is also the (more common) way of reading from stdin, which is used in thegit
filter.
Correct. Zeppelin notebooks (v0.9+) have *.zpln
extension which can be used for setting up the git filter.
Will be doing so in upcoming commits.
To handle the case of reading from stdin we need to either
- detect the file type or
- specify the file type using a flag
My preference is being explicit using a flag e.g.
--format
which defaults tojupyter
.zpln
extension should take care of this. For older Zeppelin notebooks it would be a little more tricky since they are stored as note.json
files
- We'd need to decide how to handle the various flags
nbstripout
supports. Do they have equivalents for Zeppelin? Does Zeppelin have execution counts? Or notebook and cell level metadata? Tags? If it doesn't make sense to support these flags, what do we want to do when running in "Zeppelin mode" and an incompatible flag is passed? Ignore silently? Ignore with a warning? Fail?
Yes some similar metadata is present like last execution time. I will update more on this.
Please also add tests and update the documentation as part of this pr.
Will do.
As far as I know Zeppelin does not have support for adding metadata or tags to cells. So keep-output
is currently not possible for Zeppelin notebooks.
Zeppelin notebooks also do not have execution count. So keep-count
also cant be supported.
We can ignore such things silently for zeppelin notebooks. At least for now.
Structure of Zeppelin Notebook cell:
{
"text": "%md\n# Hello",
"user": "anonymous",
"dateUpdated": "2020-07-08 16:46:54.732",
"config": {
"colWidth": 12.0,
"fontSize": 9.0,
"enabled": true,
"results": {},
"editorSetting": {
"language": "text",
"editOnDblClick": false,
"completionKey": "TAB",
"completionSupport": true
},
"editorMode": "ace/mode/text"
},
"settings": {
"params": {},
"forms": {}
},
"results": {},
"apps": [],
"progressUpdateIntervalMs": 500,
"jobName": "paragraph_1594203861277_1152517137",
"id": "paragraph_1594203861277_1152517137",
"dateCreated": "2020-07-08 15:54:21.277",
"dateStarted": "2020-07-08 16:46:54.740",
"dateFinished": "2020-07-08 16:46:54.854",
"status": "FINISHED"
}
We could consider to clear keys such as dateCreated
, dateStarted
, dateFinished
, status
while we clear the output. What do you think?
That sounds reasonable! I'm still in favor of having a flag to --format
which defaults to jupyter
and also supports zeppelin
, since that will work both with file input and reading from stdin.
Ok. Currently In My solution --format
is not necessarily required but we could use that for supporting older versions of zeppelin notebooks. I will add that flag as you suggested. 👍
Sure. I was busy last week. I will add the test tomorrow and update the pull request.
I have incorporated changes as you suggested. Added tests and updated documentation. I believe now it is ready for merge. Do let me know if I missed anything.
Hi @kynan I believe this PR is ready for merge.
incorporated latest requested changes
Good catch @kynan. I think this remained due to bad merging during rebase. Thanks. Deleted the extra block.
Zeppelin Notebooks are popular especially while working with Apache Spark Framework. Since the basic structure of the IPython / Jupyter and Zeppelin Notebooks is similar it would be cool to provide nbstripout support for Zeppelin Notebooks as well.