Closed AakashGfude closed 4 years ago
Hello @AakashGfude, thanks for reaching out! I'd be happy to help.
Tell me a bit more about the myst format, and how you see the mapping to a Jupyter notebook: How do you represent a code cell in myst? Is there a cell marker in myst to separate two consecutive text cells? How do you plan to encode the cell metadata?
Thanks @mwouts. We are working on putting together a more detailed spec that documents the mappings between the myst and ipynb formats. We have added cell delimiters in myst such as +++ (all subject to change at this stage). Once the spec is put together we will certainly share it with you for thoughts and comments.
Our aim is an exact two-way representation, so that we can swap between the human-readable text format and the machine-readable notebook format and treat each as a mirror of the other.
Sounds great! I am looking forward to reading that.
@mwouts while we are working on the myst <-> ipynb spec, I had a question regarding the workflow / architecture of jupytext.
We are interested in the possibility of setting up the ipynb / text-based-format conversion (for those formats that have lossless conversions) to update in real time. The reason is that, on larger projects that maintain source files in text format, we have found that users typically want to build the notebooks to run the code blocks, inevitably edit the notebooks, and then forget to transfer those edits back to the text files. Do you think a mirroring two-way communication between formats would be achievable? We are happy to work on this -- but wanted to check with you first for your thoughts.
@mmcky -- how would this differ from the current jupytext.TextFileContentsManager?
Thanks @phaustin -- that's neat! I hadn't realised you can open the md file directly through the Jupyter interface and that it is represented as an ipynb file on the fly by a contents manager. I had thought jupytext was mainly built around the companion files ipynb and md, keeping them in sync through save actions.
I had assumed the workflow would be to open the text-based file in an editor and the ipynb file in Jupyter, and use a file watcher to keep both in sync in real time, so you could edit in either location and each format updates. But opening the md file directly through Jupyter is a neat way to handle this issue, as a save in Jupyter will alter the md.
The only confusing part to me is if you open an md file -- it seems to create an ipynb file by default. I would have thought that if you open an md file you just want the translation to ipynb on the fly, keeping md as the source of truth. {Update: Oh I see -- that is just the default behaviour in jupytext with notebook autosave and pair with notebook enabled}
Hello! Yes I agree with @phaustin, a proxy for real time sync is implemented in the ContentsManager. You've seen how it works, right? When you save the document, all its representations are written to disk (e.g. ipynb and md when you use a paired notebook, or md only if you opened an md document with no pairing information), and when you reload the notebook, the input cells are taken from the most recent text file and joined with the outputs of the ipynb file, if any, using the function combine_inputs_with_outputs from combine.py.
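The idea of joining inputs from the text file with outputs from the ipynb file can be illustrated with a minimal sketch. This is not Jupytext's combine_inputs_with_outputs: the function name combine_inputs_with_outputs_sketch, the plain-dict cell layout, and the "match code cells by identical source" rule are all simplifying assumptions made for illustration.

```python
from copy import deepcopy


def combine_inputs_with_outputs_sketch(text_cells, ipynb_cells):
    """Inputs come from text_cells; outputs are copied from ipynb code cells
    that have exactly the same source (an assumed, simplified matching rule)."""
    # Code cells of the ipynb file that have not been matched yet, in order
    remaining = [c for c in ipynb_cells if c["cell_type"] == "code"]
    combined = []
    for cell in text_cells:
        new_cell = deepcopy(cell)
        if cell["cell_type"] == "code":
            # A code cell with no match keeps empty outputs
            new_cell.setdefault("outputs", [])
            new_cell.setdefault("execution_count", None)
            for i, candidate in enumerate(remaining):
                if candidate["source"] == cell["source"]:
                    new_cell["outputs"] = deepcopy(candidate.get("outputs", []))
                    new_cell["execution_count"] = candidate.get("execution_count")
                    del remaining[i]  # each ipynb cell is used at most once
                    break
        combined.append(new_cell)
    return combined


# Hypothetical example: the second code cell was edited in the text file
text_cells = [
    {"cell_type": "markdown", "source": "# Title"},
    {"cell_type": "code", "source": "1 + 1"},
    {"cell_type": "code", "source": "print('edited')"},
]
ipynb_cells = [
    {"cell_type": "code", "source": "1 + 1", "execution_count": 1,
     "outputs": [{"output_type": "execute_result", "data": {"text/plain": "2"}}]},
    {"cell_type": "code", "source": "print('old')", "execution_count": 2,
     "outputs": [{"output_type": "stream", "name": "stdout", "text": "old\n"}]},
]
combined = combine_inputs_with_outputs_sketch(text_cells, ipynb_cells)
```

In this sketch the unchanged cell keeps its stored output, while the edited cell comes back with no outputs until it is run again.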
The only confusing part to me is if you open an md file -- it seems to create an ipynb file by default.
When you open an md file with no pairing information in Jupyter, the contents manager does return a document of type notebook (using jupytext.read). However, no ipynb file is created.
Note that I would actually be interested in going one step closer to real realtime updating. For me, the difference with the current behavior would be the following:
[...] md file on disk, the notebook is updated automatically, without having to reload it.
[...] ipynb file. Thanks to this
a) sync is faster, since we never read the ipynb file on disk for this realtime sync, only the text files, which are much lighter
b) if the notebook is md only, we don't lose the outputs when the notebook is updated (currently, when you reload an md-only notebook, outputs are lost)
This real realtime sync is being discussed at #406, and will require a good understanding of the JS/TS part of Jupyter, together with a port of the combine_inputs_with_outputs function to these languages.
Now, coming back to your initial question of how to extend Jupytext to another format, I suggest that you have a look at how the .Rmd format is implemented, starting with formats.py. That format derives from .md, with a few changes in how the text files are parsed. Note that I am not particularly proud of the implementation - you may find exceptions based on the file extension here and there... - but at least it works! And I can also contribute a POC for the implementation of your format when you're done with the specs.
Also, the test framework is very important in Jupytext, since it can help you make sure that the roundtrip really works. So, for your new myst format,
a) You could add a series of tests on simple notebooks - seek inspiration in e.g. test_read_simple_markdown.py.
b) You could duplicate these lines in test_mirror.py and replace md with myst. This will test your new myst format for the roundtrip on a series of challenging notebooks that we have collected over time.
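As a rough illustration of what a roundtrip test checks, here is a self-contained sketch. It does not use Jupytext's test helpers or the real myst syntax: toy_writes, toy_reads and the toy text format (cells separated by a +++ line, code cells wrapped in a python fence) are hypothetical stand-ins for whatever converter the final spec leads to.

```python
FENCE = "`" * 3  # built this way to avoid a literal fence inside this example
OPENING = FENCE + "python\n"
CLOSING = "\n" + FENCE


def toy_writes(cells):
    """Notebook cells -> text, in a toy format (an assumption, not MyST)."""
    chunks = []
    for cell in cells:
        if cell["cell_type"] == "code":
            chunks.append(OPENING + cell["source"] + CLOSING)
        else:
            chunks.append(cell["source"])
    return "\n+++\n".join(chunks) + "\n"


def toy_reads(text):
    """Text in the toy format -> notebook cells."""
    body = text[:-1] if text.endswith("\n") else text
    cells = []
    for chunk in body.split("\n+++\n"):
        if chunk.startswith(OPENING) and chunk.endswith(CLOSING):
            source = chunk[len(OPENING):-len(CLOSING)]
            cells.append({"cell_type": "code", "source": source})
        else:
            cells.append({"cell_type": "markdown", "source": chunk})
    return cells


def assert_roundtrip(cells):
    text = toy_writes(cells)
    assert toy_reads(text) == cells             # notebook -> text -> notebook
    assert toy_writes(toy_reads(text)) == text  # text -> notebook -> text


# A few small notebooks, including two consecutive markdown cells
NOTEBOOKS = [
    [{"cell_type": "markdown", "source": "Only one text cell"}],
    [
        {"cell_type": "markdown", "source": "First paragraph"},
        {"cell_type": "markdown", "source": "Second paragraph"},
        {"cell_type": "code", "source": "x = 1\n\nprint(x)"},
    ],
    [{"cell_type": "code", "source": ""}],
]

for cells in NOTEBOOKS:
    assert_roundtrip(cells)
```

Checking both directions matters: the first assertion catches information lost when writing text, the second catches text that is not reproduced identically after a read.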
Thanks @mwouts, your comments about the realtime sync sound great. I will follow #406.
I suggest that you have a look at how the .Rmd format is implemented, starting with formats.py.
Thanks for the guidance. That sounds like a good entry point and we would want to do something very similar. I suggest, then, that we work in a fork of jupytext to get the myst syntax working, and once the spec has settled down we can upstream the new format.
Roundtrip is really important to us, so thanks for the guidance on testing too. Super helpful.
cc: @AakashGfude
hey @mwouts 👋 didn't realize this thread was going on! Just FYI this is the project that we were emailing about a few weeks back! Let me know if I can help move the conversation forward!
Per your earlier questions about how the notebook structure would be represented in MyST - this is our latest thinking:
https://github.com/ExecutableBookProject/MyST-NB/issues/12#issue-567866971
We'd love to hear your thoughts on the proposal there!
By the way, I realise that, if you already have a two-way converter myst <-> ipynb, you can plug it directly into the jupytext.reads/writes functions. An example of this is the pandoc format, for which we simply call pandoc:
If you decide to take that route, you may
[...] if self.fmt.get('format_name') == 'myst': in both jupytext.reads and jupytext.writes
[...] myst format in formats.py (and import the myst format version number from your package)
[...] myst format, as discussed above.
This will provide the same functionality (i.e. support of myst on the command line and in the contents manager), and may be easier to develop or maintain.
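The "plug in an existing converter" route might look roughly like the sketch below. It is not Jupytext's actual code: reads and writes here are simplified free functions, and myst_to_notebook / notebook_to_myst are hypothetical placeholders for a converter provided by a separate package. Only the format_name check comes from the suggestion above.

```python
def myst_to_notebook(text):
    """Hypothetical external converter: here, the whole text is one markdown cell."""
    return {"cells": [{"cell_type": "markdown", "source": text}], "metadata": {}}


def notebook_to_myst(notebook):
    """Hypothetical inverse of myst_to_notebook, for the one-cell toy case."""
    return "\n+++\n".join(cell["source"] for cell in notebook["cells"])


def reads(text, fmt):
    """Dispatch on the format name before falling back to the built-in parsers."""
    if fmt.get("format_name") == "myst":
        return myst_to_notebook(text)
    raise NotImplementedError("other formats would go through the existing parsers")


def writes(notebook, fmt):
    """Same dispatch for the notebook -> text direction."""
    if fmt.get("format_name") == "myst":
        return notebook_to_myst(notebook)
    raise NotImplementedError("other formats would go through the existing writers")


fmt = {"extension": ".md", "format_name": "myst"}
notebook = reads("Some text", fmt)
text = writes(notebook, fmt)
```

The appeal of this design is that the parsing logic stays in the converter package, and the text-notebook library only needs the dispatch and the format declaration.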
Hey there @mwouts - I wanted to introduce myself and a team that I am working with.
We are a group of academic researchers who are working on a tech stack to build open, reproducible documents with Python. We've set up a GitHub organization to host the projects that we're working on as a part of this project: https://github.com/ExecutableBookProject.
To better equip ourselves and the community for writing complex documents, we are also building a new markup text format called myst (https://github.com/ExecutableBookProject/myst) that tries to combine the extensibility and strong semantic markup properties of reStructuredText with some features of Markdown.
Now, to do a seamless conversion between myst and ipynb, we thought of extending your amazing tool to include the myst format. Before we write any code, it would be great if you could give us a heads up or any ideas/suggestions on doing this properly.
Thanks again for this great project!