redpen-cc / redpen

RedPen is an open source proofreading tool to check if your technical documents meet the writing standard. RedPen supports various markup text formats (Markdown, Textile, AsciiDoc, Re:VIEW, reStructuredText and LaTeX).
https://redpen.cc
Apache License 2.0

Input format option for Jupyter Notebook #860

Open sfujiwara opened 5 years ago

sfujiwara commented 5 years ago

Do you have a plan to implement an input format option for Jupyter Notebook? As far as I know, we currently cannot use .ipynb files as input for the RedPen CLI: http://redpen.cc/docs/latest/index_ja.html#command-line-tool

sfujiwara commented 5 years ago

We are currently working on the Japanese translation of the official TensorFlow documentation (many of the documents are .ipynb files): https://github.com/tensorflow/docs/tree/master/site/ja

We are using RedPen to help reviewers. However, since RedPen does not support Jupyter Notebook, we have to convert the notebooks to Markdown first: https://github.com/tfug/proofreading

We would be very happy if we could apply RedPen directly to Jupyter Notebook files.

takahi-i commented 5 years ago

@sfujiwara Thank you very much for your interest in RedPen 🙏

As you say, RedPen currently does not support the .ipynb format, so we need to convert the files to Markdown before applying RedPen.

To support .ipynb, we first need the detailed specification of the format. Creating the parser could be simple since, as we can see, .ipynb contains Markdown blocks, for which RedPen already has a parser.

sfujiwara commented 5 years ago

I think the parser needs to extract the "source" elements of cells where "cell_type": "markdown". An example of a block we want to extract is below:

https://github.com/tensorflow/docs/blob/392123db72bb67706a9805764b1833cc945b4e18/site/ja/r2/tutorials/keras/basic_classification.ipynb?short_path=6fd7e60#L146-L150

The content of "source" is similar to Markdown, but slightly different from it.
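
To illustrate the extraction, here is a minimal Python sketch. It is not RedPen code (RedPen itself is written in Java). It assumes the notebook follows the nbformat v4 layout with a top-level "cells" list, and that "source" is either a single string or a list of line strings, which may be one of the differences from plain Markdown. The function name `extract_markdown_cells` and the sample notebook are made up for this example.

```python
import json
from typing import List


def extract_markdown_cells(notebook_json: str) -> List[str]:
    """Return the text of every Markdown cell in a notebook JSON string.

    Assumes the nbformat v4 layout (top-level "cells" list); older
    notebook versions are not handled in this sketch.
    """
    notebook = json.loads(notebook_json)
    blocks = []
    for cell in notebook.get("cells", []):
        # Skip code cells, raw cells, and anything else that is not Markdown.
        if cell.get("cell_type") != "markdown":
            continue
        source = cell.get("source", "")
        # "source" may be a list of line strings; join them into one block.
        if isinstance(source, list):
            source = "".join(source)
        blocks.append(source)
    return blocks


# Hypothetical sample notebook, for illustration only.
sample = json.dumps({
    "nbformat": 4,
    "cells": [
        {"cell_type": "markdown", "source": ["# Title\n", "Some text."]},
        {"cell_type": "code", "source": ["print(1)"]},
        {"cell_type": "markdown", "source": "Plain string cell"},
    ],
})

for block in extract_markdown_cells(sample):
    print(block)
```

Each extracted block could then be passed to the existing Markdown parser, either one cell at a time or joined with blank lines between cells.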

takahi-i commented 5 years ago

Thank you very much for telling us the format of ipynb files. We understand that ipynb is a simple JSON format and that we can handle the source of elements which have `cell_type: markdown`.