rossant / ipymd

Use the IPython notebook as an interactive Markdown editor
BSD 3-Clause "New" or "Revised" License

Cell metadata #38

Closed rossant closed 9 years ago

rossant commented 9 years ago
bollwyvl commented 9 years ago

How about adopting the jekyll front-matter approach, and use some embedded YAML? I'd love to see a text-forward notebook UI (editor) that maintained nbformat metadata... in fact, this would be far easier than the current metadata UI!

An object between === could denote the notebook metadata (which could appear anywhere) while an object between --- could denote cell metadata. Of course, other parsing would give a nice default value for the cell_type and, say, nbformat, so it would be more of a metadata.update(**parsed_yaml). This would keep the base document really lean and readable.

I haven't had a chance to dig into the code, but would love to take a whack at throwing this together.

---
slideshow:
  slide_type: slide
---

# A Slideshow

---
slideshow:
  slide_type: fragment
---
```python
    print("Brought to you by ipymd")
\```
===
name: a slideshow
===
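
A minimal sketch of how the delimiter convention in this comment could be split into blocks before any YAML parsing happens. The name `split_blocks` and the returned layout are assumptions made for illustration, not ipymd's API, and the raw metadata lines would still need to go through a YAML parser. A Markdown horizontal rule written as --- would be misread by this naive version.

```python
def split_blocks(text):
    """Split a document into notebook metadata lines and cells.

    Hypothetical helper: '===' toggles a notebook-level metadata block,
    '---' toggles a cell-level metadata block that also starts a new cell.
    Metadata is returned as raw lines; YAML parsing is left to the caller.
    """
    notebook_meta = []
    cells = []
    mode = "source"  # one of "source", "cell_meta", "nb_meta"

    for line in text.splitlines():
        marker = line.strip()
        if marker == "---" and mode != "nb_meta":
            if mode == "cell_meta":
                mode = "source"      # closing delimiter
            else:
                cells.append({"meta": [], "source": []})
                mode = "cell_meta"   # opening delimiter starts a new cell
        elif marker == "===" and mode != "cell_meta":
            mode = "source" if mode == "nb_meta" else "nb_meta"
        elif mode == "cell_meta":
            cells[-1]["meta"].append(line)
        elif mode == "nb_meta":
            notebook_meta.append(line)
        else:
            if not cells:            # content before any metadata block
                cells.append({"meta": [], "source": []})
            cells[-1]["source"].append(line)
    return notebook_meta, cells


document = """---
slideshow:
  slide_type: slide
---

# A Slideshow

===
name: a slideshow
===
"""

notebook_meta, cells = split_blocks(document)
```

Each cell's parsed metadata could then be merged over the inferred defaults with a plain `dict.update`, as suggested above.
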
rossant commented 9 years ago

it would be great to have metadata support with a relatively standard syntax

see also http://rmarkdown.rstudio.com/ and https://github.com/chronitis/ipyrmd

let me know if you're going to work on the code -- i might have to merge my pending PR first...

bollwyvl commented 9 years ago

No Intent To Implement yet! Just was reminded of this in another discussion. Will check out the other links!


rossant commented 9 years ago

Here are a few ideas:

===
ipymd.key1: value1
ipymd.key2:
    foo: bar
===
===
ipymd.aliases:
    - name: IMPORT
      replace:
        ipymd.import:
            name: $1
===

This means that --- IMPORT myfile.md will be replaced by:

  ---
  ipymd.import:
    name: myfile.md   
  ---

bollwyvl commented 9 years ago

Ahhhhh. Looking very promising!

aml will do . nested keys, but YAML won't... But i'd still stick with YAML. Whitespace nesting is probably good enough for most things, and one can fall back to JSON {} if you want it on one line.

I like the alias as a generalization/realization of the --- notation. JSON Patch might be up to the task, though it looks like the python implementation doesn't support inversion... yet! Indeed. Using the Patch might be better than re-inventing something new, even if it adds a dependency.

For roundtrip to work, you'd want to also remember that you used an alias, and its arguments...

--- SLIDE

secretly generates this metadata:

{"metadata": {
    "ipymd": {"aliased": {"SLIDE": []}},
    "slideshow": {"slide_type": "slide"}
}}
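
A rough sketch of that round trip, assuming a hypothetical alias table and hypothetical helper names (`expand_alias`, `collapse_alias`); none of this is existing ipymd code. It works on the inner metadata object only and collapses back to the alias line only when the metadata still matches what the alias would generate.

```python
import copy

# Hypothetical alias table: alias name -> metadata it stands for.
ALIASES = {
    "SLIDE": {"slideshow": {"slide_type": "slide"}},
    "FRAGMENT": {"slideshow": {"slide_type": "fragment"}},
}


def expand_alias(name, args=()):
    """Turn '--- NAME args' into cell metadata, recording the alias used."""
    metadata = copy.deepcopy(ALIASES[name])
    metadata["ipymd"] = {"aliased": {name: list(args)}}
    return metadata


def collapse_alias(metadata):
    """Recover the '--- NAME args' line, or None if it no longer applies."""
    aliased = metadata.get("ipymd", {}).get("aliased", {})
    for name, args in aliased.items():
        # Only collapse if nothing was edited since the alias was expanded.
        if metadata == expand_alias(name, args):
            return " ".join(["---", name, *args])
    return None


meta = expand_alias("SLIDE")
line = collapse_alias(meta)
```

If someone edits the metadata in the notebook UI, `collapse_alias` returns None and the writer would have to fall back to the full YAML block.
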

the stack of changes for md -> ipynb sounds like

reversed for ipynb -> md:

Shortcut/args being space delimited suggests you just get one, which is probably fine... the case where i would want more shortcuts is for slides: could one import n cells, and update their structure, i.e. load some slides as subslides? What happens when your import hits another import?

--- SUBSLIDE IMPORT other.md
---

Could just accept that an alias is really a function, and use named params...

--- SUBSLIDE IMPORT(other.md)
---

...which would then leave space for some kind of query language to pull out cells (in this example, the filter would be wrapped in $.cells[<expr>]):

--- SLIDE
# Title

--- SLIDE
# Recap...

--- SUBSLIDE IMPORT(part1.md, `this`.length-1)

--- SLIDE
# New stuff...
...

--- SLIDE IMPORT(common.md, ?(`this`.metadata.id=contact))

Could the alias definition be more namespace-y? Can't have multiple aliases to the same thing, and if someone wants to overload it, it shouldn't require looking through names.

Are document-level aliases possible? I don't know what they would be... but the metadata regime is totally different, and i wouldn't want to mix them.

Combining all of these ideas, it's reasonably compact and standards-compliant, and should provide enough room to grow:

===
ipymd:
  alias:
    cell: 
      SLIDE: [{op: add, path: /slideshow/slide_type, value: slide}]
      SKIP: [{op: add, path: /slideshow/slide_type, value: skip}]
      NOTES: [{op: add, path: /slideshow/slide_type, value: notes}]
      SUBSLIDE: [{op: add, path: /slideshow/slide_type, value: subslide}]
      FRAGMENT: [{op: add, path: /slideshow/slide_type, value: fragment}]
      IMPORT($path): [{op: add, path: /ipymd/import, value: $path}]
===
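
To make the alias table above concrete, here is a stdlib-only sketch that applies such "add" operations to a metadata dict. The function name `apply_add_ops` and the `$name` parameter substitution are assumptions for illustration. Note one deliberate departure: this version creates missing parent objects, whereas RFC 6902 treats a missing parent as an error, so a real JSON Patch library may behave differently.

```python
def apply_add_ops(metadata, ops, params=None):
    """Apply JSON-Patch-style 'add' operations to a metadata dict in place.

    Assumption: missing parent objects are created on the way down,
    which RFC 6902 itself does not do.
    """
    params = params or {}
    for op in ops:
        if op["op"] != "add":
            raise ValueError("only 'add' is handled in this sketch")
        # JSON Pointer: '/slideshow/slide_type' -> ['slideshow', 'slide_type']
        keys = [key.replace("~1", "/").replace("~0", "~")
                for key in op["path"].split("/")[1:]]
        target = metadata
        for key in keys[:-1]:
            target = target.setdefault(key, {})
        value = op["value"]
        # Substitute alias parameters such as '$path' (hypothetical syntax).
        if isinstance(value, str) and value.startswith("$"):
            value = params[value[1:]]
        target[keys[-1]] = value
    return metadata


SLIDE = [{"op": "add", "path": "/slideshow/slide_type", "value": "slide"}]
IMPORT = [{"op": "add", "path": "/ipymd/import", "value": "$path"}]

cell_meta = apply_add_ops({}, SLIDE)
apply_add_ops(cell_meta, IMPORT, {"path": "other.md"})
```
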

ping @tonyfast

rossant commented 9 years ago

But i'd still stick with YAML

agreed

Are document-level aliases possible? I don't know what they would be... but the metadata regime is totally different, and i wouldn't want to mix them.

agreed

What happens when your import hits another import?

the import() alias function could implement recursive import, so there would be no recursion at the level of the generic alias system
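
One way this could look, as a sketch only: the recursion lives entirely inside a hypothetical `resolve_imports` function, and `documents` is an in-memory stand-in for reading files. The cycle check is an added assumption, since the thread does not say what should happen when imports loop.

```python
def resolve_imports(name, documents, _stack=()):
    """Return the lines of `name` with every '--- IMPORT other' expanded.

    `documents` maps file names to their text (a stand-in for reading
    files). A circular chain of imports raises ValueError.
    """
    if name in _stack:
        chain = " -> ".join(_stack + (name,))
        raise ValueError("circular import: " + chain)
    lines = []
    for line in documents[name].splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[:2] == ["---", "IMPORT"]:
            # Recurse into the imported document, remembering the path taken.
            lines.extend(resolve_imports(parts[2], documents, _stack + (name,)))
        else:
            lines.append(line)
    return lines


docs = {
    "a.md": "# A\n--- IMPORT b.md\nend",
    "b.md": "# B\n--- IMPORT c.md",
    "c.md": "# C",
}
resolved = resolve_imports("a.md", docs)
```
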

Questions:

bollwyvl commented 9 years ago

Hopefully we're not creating too much of a monster :)

Are document-level aliases possible?

Had a duh moment: the obvious use case is a kernel name:

=== KERNEL(python)

this would go out and grab the whole kernelspec, which nobody wants to type by hand.

  • Is it not too weird to mix JSON and YAML? (btw it looks like github renders YAML metadata in Markdown documents)

JSON is a strict subset of YAML :) YAML even inherits the duplicate keys problem that ijson intends to fix. Also, the gh rendering only works for the first chunk of meta, a la jekyll, and everything else thinks you're making a heading. gist thinks everything is a heading.

If you were to indent everything as code, the rendering is better on gist, but wouldn't work with the gh custom rendering. no big loss, i say.

If i was doing a lot of tricky meta editing, I might even choose to explicitly use ticks and declare yml for syntax highlighting and linting... either way, we are proposing some stuff fairly incompatible with editors: --- SOMEALIAS looks pretty ugly, and would certainly be ignored by gh.

We don't want to get into the syntax-highlighting-package-business if we don't have to.

  • The query language idea is interesting but we might want to leave it for later. At the very least we should have a modular architecture for cell metadata that would allow users to write their own custom behavior.

I suppose for reuse, if you want to reuse a little bit, you pull it out of the original file, and import it from both places. much cleaner. but someday...

  • Maybe lowercase is better for aliases...

sure, was just working off the showoff notation. I kinda like it, because it would be easier to search and not get false positives.

I love the JSON patch idea for aliases

hooray! i suppose it wouldn't be insane to actually set the scope of the thing to be the whole cell instead of just the meta... i can think of horrible, dirty things like template execution resulting in markdown, or, horror of horror, code (thanks, @tonyfast, for putting these thoughts in my head).

--- JINJA(data=http://foo.bar/data.csv, body_part="brain")
My {{ body_part }} just exploded.
{% for line in data %}
- {{ line.text }}
{% endfor %}

Secretly, this would stash the whole template in metadata... and this cell, much like imported cells, would not be editable... or rather, the generated text would be discarded on roundtrip.

alias_name(cell, *args, **kwargs)

Yeah, that's obvious now that you mention it:

--- SUBSLIDE IMPORT(slides.md, moreslides.md)

I might go a bit more verbose, if you're thinking magic names:

def alias_cell_<name>(cell, *args, **kwargs):
def alias_nb_<name>(nb, *args, **kwargs):
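
A small sketch of how such magic names might be dispatched. The two example aliases and the `call_alias` helper are hypothetical; the thread fixes only the naming pattern, not the lookup mechanism.

```python
def alias_cell_slide(cell, *args, **kwargs):
    """Example cell-level alias: mark the cell as a slide."""
    slideshow = cell.setdefault("metadata", {}).setdefault("slideshow", {})
    slideshow["slide_type"] = "slide"
    return cell


def alias_nb_kernel(nb, name, **kwargs):
    """Example notebook-level alias: record only the kernel name."""
    nb.setdefault("metadata", {}).setdefault("kernelspec", {})["name"] = name
    return nb


def call_alias(scope, name, target, *args, namespace=None, **kwargs):
    """Look up alias_<scope>_<name> by naming convention and call it."""
    namespace = globals() if namespace is None else namespace
    func = namespace.get("alias_{}_{}".format(scope, name.lower()))
    if func is None:
        raise KeyError("no {} alias named {}".format(scope, name))
    return func(target, *args, **kwargs)


registry = {
    "alias_cell_slide": alias_cell_slide,
    "alias_nb_kernel": alias_nb_kernel,
}
cell = call_alias("cell", "SLIDE", {"cell_type": "markdown"}, namespace=registry)
nb = call_alias("nb", "KERNEL", {}, "python3", namespace=registry)
```

Keeping the cell and notebook prefixes separate means a cell alias and a notebook alias can share a name without clashing.
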

i wonder, if you IMPORT, do you get the aliases, defined/imported too? this would make the config file... just another file ./aliases.md, that one could maintain next to a big stack of notebooks. perhaps that is the meaning of import if done at the notebook, as opposed to cell, level.

would then all notebook meta come along? this would make it hard to overload, for example, a title or a theme (once slides support that). Perhaps the aliases can be used on either the opening or closing ---/===, which would control the resolution order.

Great stuff brewing here that answers the mail on a lot of long-standing issues with the usability of the notebook format itself. My colleagues are certainly excited by notebooks with sane PRs, and even more so for reuse. Wearing my nbviewer hat, I think one would still have to publish .ipynbs to get these rendered... but who knows!

tonyfast commented 9 years ago

I am having a little difficulty parsing all of this. I am going to offer my two cents as far as the choice of markdown and yaml go.

Use YAML always if a user will be entering their own keys and values. There are fewer mistakes than with JSON. Also, the widespread adoption of things like Jekyll clearly indicates the ability for anyone to write JSON as YAML.

As for Markdown, I do not think that language or tool specific markdown flavors scale for the future.

RMarkdown is great for R users and the proposed syntax above may be great for notebook users.

The IPython notebook's future applications are rapidly growing. Many syntaxes, languages, and kernels can be used in the IPython notebook. If the notebook is treated as syntax/language agnostic then a conversion to/from markdown should be too.

GitHub Flavored Markdown is a proven language-agnostic text document; see all the readmes. GitHub uses GitHub Flavored Markdown for syntax highlighting, but as a text document GFM indicates that the block of text that follows has a specific syntax.

In this issue y'all mentioned slides, templates, kernels, and metadata. Each feature has a very different application in practice:

Very few users need all of these features.

I believe that GFM's success as a language-agnostic document should guide any extension of the IPython notebook. I have been tinkering with converting GFM to IPython notebooks. From a readme.md, each block of markdown and code is transformed into an appropriate notebook cell using some Java(Coffee)script to create a readme.ipynb. The fenced code block languages are passed as cell magics.

At this point, I believe anything in this issue can be described by GFM markdown.

This bl.ock shows some extensibility of the readme file where a Javascript Template tool passes YAML variables to the Markdown. One could imagine Reveal being used or the YAML being passed to the notebook metadata.

bollwyvl commented 9 years ago

@tonyfast Sorry to bring you in without more preamble. Thanks for the feedback: we needed some of those words based on the stuff you linked and our lengthy discussions on previous topics.

You should definitely give ipymd a spin with pip: it's at heart a ContentsManager which replaces the stock FileContentsManager. It uses markdown as its storage mechanism, so you can round trip from md <-> json, editing wherever is appropriate at the time.

I have been advocating for notebooks-as-presentations for some time, but the approach has some shortcomings. Slides are where users are knowingly manipulating at least cell-level metadata. Thus, one of the key drivers here is making slide editing and management really, really approachable, in the style of showoff, ioslides, etc.

I think we're in violent agreement that (GF)Markdown is teachable, mostly because it is readable but partially because it is diffable and mergeable to an extent beyond HTML and JSON. Basically, I (and others) have found that a directory of ipynb is not a long-term format for maintaining a family of docs, say a course, or a recurring meeting, or even the documentation for a decent-sized project. A "field reconstitutable" text representation that fully supports all features of the notebook format is the primary goal, with the added goal of being more human-centric.

It is in this vein of user-centricity that this whole alias discussion started. I would rather maintain and train:

--- SLIDE

over

```yaml
slideshow:
  slide_type: "slide"
```

though maybe there is some other GFM-compatible way to do the former...

- HTML comments:

  ``` markdown
  <!-- SLIDE -->
  ```

but each of these has its drawbacks.

Anyhow, to the management point: I really, really want to be able to reuse sanely-versionable content and not rely on hacking javascript. The IMPORT alias has tremendous potential, if we can figure it out properly :)

Kernel information and Metadata can be included in a code as YAML or JSON.

It becomes more out of control at the notebook level, even assuming we used the just-one-frontmatter approach. I want this:

--- KERNEL(python3)

vs

---
kernelspec: 
  display_name: "Python 3"
  language: "python"
  name: "python3"
language_info: 
  codemirror_mode: 
    name: "ipython"
    version: 3
  file_extension: ".py"
  mimetype: "text/x-python"
  name: "python"
  nbconvert_exporter: "python"
  pygments_lexer: "ipython3"
  version: "3.4.2"
---

  • Slides are triggered by a Javascript code fence

I don't think that's the best long-term solution: the source needs to be as far removed from the details of the presentation engine as possible. I suspect at some point another, more modular slide framework will show up that supports most of the same concepts as reveal (heck, throw prezi in, too) but has better modularity, and definitely more stable and flexible archival generation.

Further, if using IJavascript or ITorch, it would be difficult to determine the cell fenced blocks from the meta fenced blocks without introducing some other mime type... which would again defeat the editor's attempt to assist us.

  • Templates are already rendered as HTML and Javascript.

That Jinja example was of what a plugin to ipymd could do, given access to the whole cell and not just the metadata. Basically, what could you do with a cell/notebook-level preprocessor that could modify what you see in the Notebook?

tonyfast commented 9 years ago

Oh wow, I see where y'all are coming from now. Thanks for that description. I love the idea of the human-centric part. Largely, I have been focused on the ease of adoption and teaching, but you guys are at a much grander scale of interaction than I have been thinking. It'll take a bit to get on this level.

Regardless, the example syntaxes like

=== KERNEL(python)

--- JINJA(data=http://foo.bar/data.csv, body_part="brain")

look mighty similar to explicit and global tags in YAML (http://www.yaml.org/spec/1.2/spec.html#id2761694). I have never applied them before, but they look analogous to some of the suggestions above.

bollwyvl commented 9 years ago

No worries! I didn't even think of all the extra stuff yaml just does. The argument for using something native to the spec is strong.

It looks like the tags take URIs, which in Python resolve to callables. Ignoring !, such that a user can still create their own, we could take !!. So a !!slide could call alias_cell_slide. If they wrap, or list, i.e.


!!import somefile.md !!slide

other: data

And if it can do round trip, this would just about solve the issue, and convince me that we don't want single line --- aliases.

Also, i like the ---/... Start and end of a meta block... Much better for parsing. I wonder why the jekyll folk decided to use ---/---.


tonyfast commented 9 years ago

Do the YAML tags and Python types for Names, Classes, and Objects move you in the right direction?

There are two examples that look similar

!!python/object/new:module.Class [argument, ...]
!!python/object/apply:module.function [argument, ...]

bollwyvl commented 9 years ago

Heh, had a look at some of that stuff: ended up with this: http://nbviewer.ipython.org/gist/bollwyvl/1c5d5f1040515fb108e1

We DON'T want to use the python namespace, because it can Do Horrible Things. Registering to the application-specific ! namespace is fine for me, as long as we document it :)

Otherwise, aside from not fully understanding how to handle missing stuff in JSON Patch, it's certainly looking pretty good!

rossant commented 9 years ago

closed by #62