Closed yasirroni closed 1 year ago
Fails due to pylint:
poetry run pylint src
************* Module nb_clean
src\nb_clean\__init__.py:147:0: R0913: Too many arguments (6/5) (too-many-arguments)
-----------------------------------
Your code has been rated at 9.95/10
Any update, @srstevenson
The fails is not mandatory and can be fixed by making all args as keyword-only-argument
I've held off from merging this because I'm not yet convinced there's a real user requirement for it. Most of the notebook metadata doesn't change frequently, and so doesn't contribute to the noise in VCS diffs which nb-clean was written to address.
Different editor such as Google Colab vs VSCode and different Python version across user produce different metadata. That is the use case.
Not convinced enough?
@srstevenson:
Most of the notebook metadata doesn't change frequently, and so doesn't contribute to the noise in VCS diffs which nb-clean was written to address.
I would like to support the idea of this PR (can't yet assess its quality wrt. nb-clean coding standards). As I mentioned in https://github.com/srstevenson/nb-clean/issues/157#issuecomment-1401977349, the notebook-level metadata actually do make noise in git commits.
For example, this metadata gets changed right after opening in VS Code without even running the notebook, and populated with things like conda environment name or some weird hash.
For me, the ability to filter out those is entirely in the spirit of nb-clean.
Note: nb-clean was run after each step
"metadata": {
"language_info": {
"name": "python"
},
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3"
}
},
"metadata": {
"kernelspec": {
"display_name": "scratch",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3"
},
"vscode": {
"interpreter": {
"hash": "8951b5bc4efd704e8ce7b1700f3d36aa2c5348526bfde6695033584eb49dd3bb"
}
}
},
This is on the same machine using the same conda environment (!) Needless to say that if my colleague happens to open the notebook on his computer, edit some stuff, save and do a git commit with the nb-clean hook, the commit will include his edits PLUS several metadata changes.
@haplav
"metadata": { "language_info": { "name": "python" },
Since notebook metadata language_info is indeed needed, current PR remove this metadata. How about add other parameter such as --add-language-metadata NAME, with default name to Python?
Another approach is to change this PR into --preserve-metadata language_info metadata2 metadata3
. But, support of hierarchy of metadata sometimes are needed.
But, based on your example, the language_info
is destroyed and replaced with codemirror_mode
. That is another problem. And that is not what happen on my side. Here is my case:
Original:
"metadata": {
"language_info": {
"name": "python"
}
After VSCode open:
"metadata": {
"kernelspec": {
"display_name": "Python 3.8.10 64-bit",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.8.10"
},
"vscode": {
"interpreter": {
"hash": "..."
}
}
"metadata": { "language_info": { "name": "python" },
Since notebook metadata language_info is indeed needed, current PR remove this metadata. How about add other parameter such as --add-language-metadata NAME, with default name to Python?
I don't understand @yasirroni, I don't mind having this in the cleaned notebook with hard-wired "name": "python"
. It seems rather unlikely this would need to be changed.
Another approach is to change this PR into
--preserve-metadata language_info metadata2 metadata3
. But, support of hierarchy of metadata sometimes are needed.
I don't think we need to complicate things right now.
But, based on your example, the
language_info
is destroyed and replaced withcodemirror_mode
. That is another problem. And that is not what happen on my side. Here is my case: Original:
Ah OK, I think you read it wrong; language_info
is NOT destroyed; codemirror_mode
gets just ADDED to it.
Please let GitHub apply suggestions and resolve discussions automatically as described in https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/incorporating-feedback-in-your-pull-request
Please let GitHub apply suggestions and resolve discussions automatically as described in https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/incorporating-feedback-in-your-pull-request
Did not know that feature exist. Sorry, Sir.
Ah OK, I think you read it wrong;
language_info
is NOT destroyed;codemirror_mode
gets just ADDED to it.
Yeah, but the structure changes, since name
no longer the child of language_info
.
"metadata": { "language_info": { "name": "python" },
Since notebook metadata language_info is indeed needed, current PR remove this metadata. How about add other parameter such as --add-language-metadata NAME, with default name to Python?
I don't understand @yasirroni, I don't mind having this in the cleaned notebook with hard-wired
"name": "python"
. It seems rather unlikely this would need to be changed.
But, I also works in Octave and octave-based notebook did not use that metadata. Here is a new notebook in Octave created using jupyterlab:
"metadata": {
"kernelspec": {
"display_name": "Octave",
"language": "octave",
"name": "octave"
}
},
I think, two options might be good.
Hard clean: like this PR. Soft clean: like this PR, but skip for
"metadata": {
"kernelspec": {
"language": "..."
},
"language_info": {
"name": "...",
},
},
Yeah, but the structure changes, since
name
no longer the child oflanguage_info
.
No. Please, take a good look - name
is contained in kernelspec
and codemirror_mode
but also still directly in language_info
:
"metadata": {
"kernelspec": {
...
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "python",
...
},
...
"name": "python",
...
},
...
},
I think, two options might be good.
Once again, I'm fine with keeping just metadata/language_info/name
(if already present). It seems to be the same in all cases you and I mentioned. But it doesn't make any sense to preserve kernelspec
exactly because
But, I also works in Octave and octave-based notebook did not use that metadata. Here is a new notebook in Octave created using jupyterlab:
"metadata": { "kernelspec": { "display_name": "Octave", "language": "octave", "name": "octave" } },
It's changed based on what interpreter the kernel uses. Keeping just empty metadata
, as this PR already does, looks also fine to me. I don't think any interpreter would depend on having anything in there.
No. Please, take a good look - name is contained in kernelspec and codemirror_mode but also still directly in language_info:
@haplav oops. i'm sorry. It will be easier than. _ Okay, maybe just let this PR as it is and later add feature to skip language_info
@srstevenson any chance to get this merged?
This PR was closed due to inactivity. Please reopen if still relevant.
Any chance this can be re-opened? I'm willing to fix merge conflict. @srstevenson
Add new argument for clean and filter "-M" or "--remove-notebook-metadata" to "remove notebook metadata.
re-PR from https://github.com/srstevenson/nb-clean/pull/163