mbrukman / autogen

Automatically generate boilerplate license comments.
Apache License 2.0

Add support for ipynb file #57

Closed wangtz closed 5 years ago

wangtz commented 6 years ago

IPython notebooks are a popular tool among machine learning developers and researchers. It would be great to add support for them.

Thanks,

mbrukman commented 6 years ago

@wangtz, this is an interesting question. *.ipynb files are actually JSON files, so a comment-style preamble like the one we add to other files will not work: the result would no longer be valid JSON, since JSON disallows anything outside of the top-level { ... } object.
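To make the problem concrete, here is a minimal sketch using Python's standard `json` module. The notebook body and the license header text are made up for illustration; they are not taken from autogen.

```python
import json

# A minimal notebook-shaped JSON document; the values are illustrative only.
NOTEBOOK_JSON = '{"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 2}'

# Hypothetical license header in the comment style used for scripts.
LICENSE_PREAMBLE = (
    "# Copyright 2018 Example Author\n"
    "# Licensed under the Apache License, Version 2.0\n"
)


def is_valid_json(text: str) -> bool:
    """Return True if `text` parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(is_valid_json(NOTEBOOK_JSON))                     # bare notebook
    print(is_valid_json(LICENSE_PREAMBLE + NOTEBOOK_JSON))  # with preamble
```

The bare document parses, while the version with the comment preamble is rejected by the parser, which is why the usual prepend-a-comment approach cannot be reused here.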

So how would you expect us to embed a license into it? Do you have any sample *.ipynb files we can take a look at to see what the typical standard is for these files?

Maybe the metadata section would be a reasonable place to put it, perhaps with the license key?

Or maybe since it's JSON, we should follow the Node.js package style for specifying licenses? Though I think including the full license text is, in general, preferred by lawyers, so maybe we could have a license key with SPDX identifier, and license_txt for the full contents of the license?
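As a rough sketch of that idea, assuming the key names `license` and `license_txt` floated above (neither is an established notebook convention, and the function name is hypothetical), the metadata could be extended like this:

```python
import json


def add_license_metadata(notebook: dict, spdx_id: str, license_text: str) -> dict:
    """Return a copy of `notebook` with license info stored under metadata.

    The keys "license" and "license_txt" are the names proposed in this
    thread, not part of any notebook standard.
    """
    updated = dict(notebook)
    # Copy the metadata dict so the caller's notebook is left untouched.
    metadata = dict(notebook.get("metadata", {}))
    metadata["license"] = spdx_id           # short SPDX identifier
    metadata["license_txt"] = license_text  # full license text
    updated["metadata"] = metadata
    return updated


if __name__ == "__main__":
    nb = {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 2}
    tagged = add_license_metadata(nb, "Apache-2.0", "Full license text here.")
    print(json.dumps(tagged, indent=1))
```

Existing metadata entries are preserved; only the two license keys are added or overwritten.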

What do you think?

wangtz commented 6 years ago

An .ipynb file represents a series of Python code blocks. I'm suggesting adding the license at the top of the first code block so that it's visible to users. I don't think typical users would look into the JSON source.
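For illustration, a license cell along those lines might be built as follows. This is a sketch assuming the nbformat 4 cell layout; `make_license_cell` is a hypothetical helper, not a function in autogen.

```python
def make_license_cell(license_text: str, cell_type: str = "code") -> dict:
    """Build a notebook cell (nbformat 4 layout) holding the license text.

    For a code cell, each line is turned into a Python comment; for a
    markdown cell, the text is kept as-is.
    """
    if cell_type not in ("code", "markdown"):
        raise ValueError("cell_type must be 'code' or 'markdown'")

    lines = license_text.strip("\n").splitlines()
    if cell_type == "code":
        # Blank lines become a bare "#" rather than "# " with trailing space.
        lines = [("# " + line).rstrip() for line in lines]

    # Notebook sources are stored as a list of lines, where every line
    # except the last ends with a newline.
    source = [line + "\n" for line in lines[:-1]] + lines[-1:]

    cell = {"cell_type": cell_type, "metadata": {}, "source": source}
    if cell_type == "code":
        # Code cells carry two extra fields in nbformat 4.
        cell["execution_count"] = None
        cell["outputs"] = []
    return cell
```

Placing the returned cell at index 0 of the notebook's `cells` list makes the license the first thing a reader sees when opening the notebook.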

mbrukman commented 6 years ago

@wangtz — what do you think of this?

wangtz commented 6 years ago

Looks good. Thanks. :-)

I see that you added a text block instead of code block for these comments. I'm fine with either.

mbrukman commented 6 years ago

@wangtz, I tried adding just a code block with those comments earlier, but it didn't work, probably because I didn't have the right metadata at the end of the file. I just changed it to use a code block and metadata, and now it appears to work. Take another look?

wangtz commented 6 years ago

Eh. I didn't find any related commit. The most recent one adds .bat support. Do you mean a pull request?

mbrukman commented 6 years ago

@wangtz, it's in a branch on this repository; you can see the preview using the same links as above (I modified a sample file in place). If you want to see what it looks like in practice, see the ipynb-support branch in this repo. It's not a PR yet, as I'm still iterating on it.

wangtz commented 6 years ago

Ah, I see. I had a quick look at your ipynb.py but didn't quite understand the logic. Are you replacing the metadata section when a header is added?

I thought the metadata section is provided by the users (e.g. they may not be using Python 3), and all we need to do in this case is prepend an element to cells.source.
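A minimal sketch of that approach for an existing notebook, assuming the nbformat 4 layout (the function name and the sample kernel values are hypothetical):

```python
import copy
from typing import Sequence


def prepend_license_cell(notebook: dict, license_lines: Sequence[str]) -> dict:
    """Return a copy of `notebook` with a license code cell placed first.

    The user's metadata (kernel, language version, etc.) is left as-is.
    """
    updated = copy.deepcopy(notebook)

    commented = [("# " + line).rstrip() for line in license_lines]
    header = {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        # Every source line except the last ends with a newline.
        "source": [line + "\n" for line in commented[:-1]] + commented[-1:],
    }

    updated.setdefault("cells", []).insert(0, header)
    return updated


if __name__ == "__main__":
    # Example: a notebook whose author uses a Python 2 kernel.
    nb = {
        "cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# Title"]}],
        "metadata": {"kernelspec": {"name": "python2"}},
        "nbformat": 4,
        "nbformat_minor": 2,
    }
    out = prepend_license_cell(nb, ["Copyright 2018 Example Author"])
    print(out["metadata"] == nb["metadata"], len(out["cells"]))
```

Only the `cells` list changes; everything under `metadata` is carried over untouched.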

What do you think?

mbrukman commented 6 years ago

@wangtz, without the metadata section, the notebook failed to load in the tools I tried, so it wasn't usable. How would a user get to a usable state if the boilerplate we generate (without the metadata section) is insufficient to load into a notebook viewer?

I presume the metadata should be modifiable by users after the notebook is loaded.
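For a freshly generated file, the idea can be sketched as below. The default metadata values are an assumption (a Python 3 kernel) that users could change after loading, and `new_notebook_with_license` is a hypothetical name, not the code in the ipynb-support branch.

```python
import json
from typing import Optional, Sequence

# Assumed defaults for a new notebook; users can edit these after loading.
DEFAULT_METADATA = {
    "kernelspec": {
        "display_name": "Python 3",
        "language": "python",
        "name": "python3",
    },
    "language_info": {"name": "python"},
}


def new_notebook_with_license(
    license_lines: Sequence[str], metadata: Optional[dict] = None
) -> dict:
    """Build a new notebook whose only cell is a commented license header."""
    commented = [("# " + line).rstrip() for line in license_lines]
    header = {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": [line + "\n" for line in commented[:-1]] + commented[-1:],
    }
    return {
        "cells": [header],
        # Fall back to the assumed defaults when no metadata is supplied.
        "metadata": DEFAULT_METADATA if metadata is None else metadata,
        "nbformat": 4,
        "nbformat_minor": 2,
    }


if __name__ == "__main__":
    nb = new_notebook_with_license(["Copyright 2018 Example Author"])
    print(json.dumps(nb, indent=1))
```

The output includes all four top-level keys of the nbformat 4 layout (`cells`, `metadata`, `nbformat`, `nbformat_minor`), so the generated file is more than a bare list of cells.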

Here's my sample file: https://github.com/mbrukman/autogen/blob/ipynb-support/tests/testdata/sample.ipynb

Do you want to try it out with your notebook environment, and then delete the metadata section and see if it still loads for you?

mbrukman commented 5 years ago

I've merged https://github.com/mbrukman/autogen/pull/66 to add support for Jupyter notebooks.