kynan / nbstripout

strip output from Jupyter and IPython notebooks
Other
1.19k stars 94 forks source link

required = true by default or make doc more explicit about it #191

Open memeplex opened 6 months ago

memeplex commented 6 months ago

By default the filter is not required:

One use of the content filtering is to massage the content into a shape that is more convenient for the platform, filesystem, and the user to use. For this mode of operation, the key phrase here is "more convenient" and not "turning something unusable into usable". [...] Another use of the content filtering is [...] turn it into a usable form upon checkout. These two filters behave differently, and by default, a filter is taken as the former, massaging the contents into more convenient shape. A missing filter driver definition in the config, or a filter driver that exits with a non-zero status, is not an error but makes the filter a no-op passthru. [...] You can declare that a filter turns a content that by itself is unusable into a usable content by setting the filter..required configuration variable to true.

Even a slight error in the configuration (e.g. if nbstripout changes location and is not found by git) may silently skip nbstripout. This is rather dangerous, since once big blobs are in the repo it's not easy to cleanup history, specially if it was already pushed to other remotes.

You mention required = true but just for manual installation at the bottom of the README.

Please make it the default mode of operation.

kynan commented 6 months ago

Your citation appears to be from the gitattributes documentation. The point you raise makes sense to me. Are there any issues / downsides from making the filter required by default? As this is a behaviour change I want to make sure other users won't be surprised or negatively affected.

memeplex commented 6 months ago

Your citation appears to be from the gitattributes documentation.

Right, sorry I forgot to add the link.

Are there any issues / downsides from making the filter required by default?

I think the difference of behavior will be just on the conservative side. It's hard to think about a "best effort" use case in which you don't care if a 20 MB notebook gets accidentally pushed to your history. We have had episodes like this at the office and they became nightmarish once the changes were spread.

kynan commented 6 months ago

Sorry to hear that this has already caused you pain šŸ˜ž Would you be interested in sending a PR for this?

memeplex commented 6 months ago

Sorry to hear that this has already caused you pain šŸ˜ž

Nah, no problem, it hasn't been like that, mostly people that forgot to install pre-commit hooks, not even filters. But the consequences are the same.

Would you be interested in sending a PR for this?

Is it just adding

check_call(git_config + ['filter.nbstripout.required', 'true'])

here?

kynan commented 5 months ago

Yes, that and the equivalent reverse operation in uninstall. Ideally also a test, though I'm not entirely sure what the best way to test this would be.