NBFormat Does Not Recognize Indentation

z3ht commented 3 years ago

Hello,

Thanks for the awesome tool! I would like to use it for a project I'm working on but have the following issue:

NBFormat does not recognize indentation levels people set for their notebook's JSON file (more info: https://jupyter-notebook.readthedocs.io/en/latest/frontend_config.html#example-changing-the-notebook-s-default-indentation). This is a problem because when a notebook with more than one space for indentation is formatted by NBFormat, all lines are rewritten to use single spaced indentation. As a result, the git diff is unreadable.

I propose the following two possible solutions (either would work):

NBFormat could parse how many spaces are used in a file and default to using that many spaces when writing JSON back to that file
NBFormat could accept an indent option within the nbformat.write function allowing users to specify how many spaces should be used

I suspect the second possible solution would be easier and therefore the better option.

Problem Walkthrough Original nb JSON (note the double-space indent):

  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "nbgrader": {
          "grade": false,
          "locked": true,
          "schema_version": 3,
          "solution": false
        },
        "id": "peZlh40I73Ql"
      },
      "source": [
        "# Deep Learning\n",
        "\n",
        "In this exercise, you will use a deep neural network to predict the values of houses based on some provided input data. You will use keras to build the model. Below is a description of how the keras models are set up."
      ]
    },
    ...
  }

After a call to nbformat.write(nb=nb, fp=notebook_path, version=nbformat.NO_CONVERT), the following is the JSON output (note the single-space indent):

 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "peZlh40I73Ql",
    "nbgrader": {
     "grade": false,
     "locked": true,
     "schema_version": 3,
     "solution": false
    }
   },
   "source": [
    "# Deep Learning\n",
    "\n",
    "In this exercise, you will use a deep neural network to predict the values of houses based on some provided input data. You will use keras to build the model. Below is a description of how the keras models are set up."
   ]
  },
  ...
}

Best, Andrew

westurner commented 3 years ago

IMO, the fixed indentation reduces line noise in diffs. Avoiding line noise in diffs justifies setting the indentation level to a fixed indent.

If the/a JSON parser was extended to support sniffing the (first few?) indentation levels, I still don't think nbformat should support configurable indentation levels.

If you need to reindent a JSON document to e.g. manually review the ipynb, python -m json.tool --indent 2 nb.ipynb may be helpful. https://docs.python.org/3/library/json.html#module-json.tool

On Sun, Aug 29, 2021, 22:44 Andrew Darling @.***> wrote:

Hello,

Thanks for the awesome tool! I would like to use it for a project I'm working on but have the following issue:

NBFormat does not recognize indentation levels people set for their notebook's JSON file (more info: https://jupyter-notebook.readthedocs.io/en/latest/frontend_config.html#example-changing-the-notebook-s-default-indentation). This is a problem because when a notebook with more than one space for indentation is formatted by NBFormat, all lines are rewritten to use single spaced indentation. As a result, the git diff is unreadable.

I propose the following two possible solutions (either would work):

NBFormat could parse how many spaces are used in a file and default to using that many spaces when writing JSON back to that file

NBFormat could accept an indent option within the nbformat.write function allowing users to specify how many spaces should be used

I suspect the second possible solution would be easier and therefore the better option.

Problem Walkthrough Original nb JSON (note the double-space indent):

"cells": [ { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "locked": true, "schema_version": 3, "solution": false }, "id": "peZlh40I73Ql" }, "source": [ "# Deep Learning\n", "\n", "In this exercise, you will use a deep neural network to predict the values of houses based on some provided input data. You will use keras to build the model. Below is a description of how the keras models are set up." ] }, ... }

After a call to nbformat.write(nb=nb, fp=notebook_path, version=nbformat.NO_CONVERT), the following is the JSON output (note the single-space indent):

"cells": [ { "cell_type": "markdown", "metadata": { "id": "peZlh40I73Ql", "nbgrader": { "grade": false, "locked": true, "schema_version": 3, "solution": false } }, "source": [ "# Deep Learning\n", "\n", "In this exercise, you will use a deep neural network to predict the values of houses based on some provided input data. You will use keras to build the model. Below is a description of how the keras models are set up." ] }, ... }

Best, Andrew

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/jupyter/nbformat/issues/228, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAMNS5ZQ3JYHPPOYPA2PQ3T7LWADANCNFSM5DA5Z4KA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

z3ht commented 3 years ago

IMO, the fixed indentation reduces line noise in diffs. Avoiding line noise in diffs justifies setting the indentation level to a fixed indent.

Hmm... Now that you say this, I'm realizing just how much line noise two different developers with two different indentation settings could create. I concede that the better solution for my problem is to enforce a uniform indentation level across our project.

If the/a JSON parser was extended to support sniffing the (first few?) indentation levels, I still don't think nbformat should support configurable indentation levels.

Although possibly not necessary, I think parsing just the first line would work fine. I know this invites the argument that there may be a hidden line with different indention than the rest of the file that would cause line noise when formatted but I'm really not concerned with it.

If you need to reindent a JSON document to e.g. manually review the ipynb, python -m json.tool --indent 2 nb.ipynb may be helpful. https://docs.python.org/3/library/json.html#module-json.tool

Thanks for pointing this out! If we decide to go with an indentation level other than 1 and NBFormat does not add custom-indention support, I'll be sure to look into using this.

z3ht commented 3 years ago

I'm closing this Issue because I no longer feel I need an indentation feature

westurner commented 3 years ago

Rog. For my nbformat with JSON-LD @context designs, perhaps others would feel differently about 2, maybe 3, 4 spaces instead of tabs?

The notebook-aware diffing support in nbdime may be helpful for your use case?

From https://github.com/markusschanta/awesome-jupyter/issues/8 :

nbdime - visual diff tool for Jupyter notebooks https://nbdime.readthedocs.io/en/stable/ (works as a merge driver and a diff driver) https://nbdime.readthedocs.io/en/stable/vcs.html#git-integration

https://github.com/Yogayu/awesome-jupyterlab-extension#version-control

jupyter / nbformat

NBFormat Does Not Recognize Indentation #228