jupyter / nbformat

Reference implementation of the Jupyter Notebook format
http://nbformat.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
265 stars 152 forks source link

NBFormat Does Not Recognize Indentation #228

Closed z3ht closed 3 years ago

z3ht commented 3 years ago

Hello,

Thanks for the awesome tool! I would like to use it for a project I'm working on but have the following issue:

NBFormat does not recognize indentation levels people set for their notebook's JSON file (more info: https://jupyter-notebook.readthedocs.io/en/latest/frontend_config.html#example-changing-the-notebook-s-default-indentation). This is a problem because when a notebook with more than one space for indentation is formatted by NBFormat, all lines are rewritten to use single spaced indentation. As a result, the git diff is unreadable.

I propose the following two possible solutions (either would work):

I suspect the second possible solution would be easier and therefore the better option.

Problem Walkthrough Original nb JSON (note the double-space indent):

  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "nbgrader": {
          "grade": false,
          "locked": true,
          "schema_version": 3,
          "solution": false
        },
        "id": "peZlh40I73Ql"
      },
      "source": [
        "# Deep Learning\n",
        "\n",
        "In this exercise, you will use a deep neural network to predict the values of houses based on some provided input data. You will use keras to build the model. Below is a description of how the keras models are set up."
      ]
    },
    ...
  }


After a call to nbformat.write(nb=nb, fp=notebook_path, version=nbformat.NO_CONVERT), the following is the JSON output (note the single-space indent):

 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "peZlh40I73Ql",
    "nbgrader": {
     "grade": false,
     "locked": true,
     "schema_version": 3,
     "solution": false
    }
   },
   "source": [
    "# Deep Learning\n",
    "\n",
    "In this exercise, you will use a deep neural network to predict the values of houses based on some provided input data. You will use keras to build the model. Below is a description of how the keras models are set up."
   ]
  },
  ...
}


Best, Andrew

westurner commented 3 years ago

IMO, the fixed indentation reduces line noise in diffs. Avoiding line noise in diffs justifies setting the indentation level to a fixed indent.

If the/a JSON parser was extended to support sniffing the (first few?) indentation levels, I still don't think nbformat should support configurable indentation levels.

If you need to reindent a JSON document to e.g. manually review the ipynb, python -m json.tool --indent 2 nb.ipynb may be helpful. https://docs.python.org/3/library/json.html#module-json.tool

On Sun, Aug 29, 2021, 22:44 Andrew Darling @.***> wrote:

Hello,

Thanks for the awesome tool! I would like to use it for a project I'm working on but have the following issue:

NBFormat does not recognize indentation levels people set for their notebook's JSON file (more info: https://jupyter-notebook.readthedocs.io/en/latest/frontend_config.html#example-changing-the-notebook-s-default-indentation). This is a problem because when a notebook with more than one space for indentation is formatted by NBFormat, all lines are rewritten to use single spaced indentation. As a result, the git diff is unreadable.

I propose the following two possible solutions (either would work):

  • NBFormat could parse how many spaces are used in a file and default to using that many spaces when writing JSON back to that file
  • NBFormat could accept an indent option within the nbformat.write function allowing users to specify how many spaces should be used

I suspect the second possible solution would be easier and therefore the better option.

Problem Walkthrough Original nb JSON (note the double-space indent):

"cells": [ { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "locked": true, "schema_version": 3, "solution": false }, "id": "peZlh40I73Ql" }, "source": [ "# Deep Learning\n", "\n", "In this exercise, you will use a deep neural network to predict the values of houses based on some provided input data. You will use keras to build the model. Below is a description of how the keras models are set up." ] }, ... }

After a call to nbformat.write(nb=nb, fp=notebook_path, version=nbformat.NO_CONVERT), the following is the JSON output (note the single-space indent):

"cells": [ { "cell_type": "markdown", "metadata": { "id": "peZlh40I73Ql", "nbgrader": { "grade": false, "locked": true, "schema_version": 3, "solution": false } }, "source": [ "# Deep Learning\n", "\n", "In this exercise, you will use a deep neural network to predict the values of houses based on some provided input data. You will use keras to build the model. Below is a description of how the keras models are set up." ] }, ... }

Best, Andrew

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/jupyter/nbformat/issues/228, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAMNS5ZQ3JYHPPOYPA2PQ3T7LWADANCNFSM5DA5Z4KA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

z3ht commented 3 years ago

IMO, the fixed indentation reduces line noise in diffs. Avoiding line noise in diffs justifies setting the indentation level to a fixed indent.

Hmm... Now that you say this, I'm realizing just how much line noise two different developers with two different indentation settings could create. I concede that the better solution for my problem is to enforce a uniform indentation level across our project.

If the/a JSON parser was extended to support sniffing the (first few?) indentation levels, I still don't think nbformat should support configurable indentation levels.

Although possibly not necessary, I think parsing just the first line would work fine. I know this invites the argument that there may be a hidden line with different indention than the rest of the file that would cause line noise when formatted but I'm really not concerned with it.

If you need to reindent a JSON document to e.g. manually review the ipynb, python -m json.tool --indent 2 nb.ipynb may be helpful. https://docs.python.org/3/library/json.html#module-json.tool

Thanks for pointing this out! If we decide to go with an indentation level other than 1 and NBFormat does not add custom-indention support, I'll be sure to look into using this.

z3ht commented 3 years ago

I'm closing this Issue because I no longer feel I need an indentation feature

westurner commented 3 years ago

Rog. For my nbformat with JSON-LD @context designs, perhaps others would feel differently about 2, maybe 3, 4 spaces instead of tabs?

The notebook-aware diffing support in nbdime may be helpful for your use case?

From https://github.com/markusschanta/awesome-jupyter/issues/8 :

https://github.com/Yogayu/awesome-jupyterlab-extension#version-control