krassowski opened this issue 2 years ago
Of particular note here is choosing something that can be made to work with the web annotation data model.
If these things are nbformat-first, it will be more client-independent (once implemented) rather than specifying a concrete DOM model (though it would be much more possible to use URLs rather than any specific frontend thing).
This is definitely a JEP-level concern, but it could certainly be demonstrated first in a Lab 4/Notebook 7/nbconvert-compatible extension before going for something in core; previous efforts have foundered on trying to integrate too deeply and do too much.
To expand on things folk have wanted to point at in a client-agnostic way:
- a specific position within an embedded CSV
- a section of source code
- a line of a log file at a date
- a particular x,y,w,h in an image
- a location on an embedded GeoJSON map output
Maybe the position inside an output is out of scope for the fragment syntax specification; instead we could use:
- `notebook.ipynb#nth-output-of-cell=my-cell,2` (select the second output of the cell with ID `my-cell`)
- `notebook.ipynb?output-fragment="row=100"#nth-output-of-cell=my-cell,2` (scroll to row 100 of the second output of the cell with ID `my-cell` in `notebook.ipynb`), using the existing fragment syntax for the given MIME type

This way we avoid re-inventing the syntax for specific data types. For source code we can use `text/plain` (`char=` and `line=`); for images this is handled by Media Fragments URI (e.g. `#xywh=160,120,320,240`). I don't know if there is a standard GeoJSON fragment syntax.
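A minimal Python sketch of how a client might split the two proposed link forms into a cell ID, a 1-based output index, and an optional MIME-type-specific fragment. The `nth-output-of-cell` and `output-fragment` names come from the proposal; the function names and the returned dictionary shape are hypothetical, and the notebook is assumed to be loaded as plain nbformat JSON (a dict).

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def parse_output_link(url: str) -> dict:
    """Split e.g. notebook.ipynb?output-fragment="row=100"#nth-output-of-cell=my-cell,2."""
    parts = urlsplit(url)
    key, _, value = parts.fragment.partition("=")
    if key != "nth-output-of-cell":
        raise ValueError(f"unsupported fragment: {parts.fragment!r}")
    # nbformat cell IDs may contain hyphens and underscores but not commas,
    # so the last comma separates the cell ID from the output number.
    cell_id, _, nth = value.rpartition(",")
    inner: Optional[str] = parse_qs(parts.query).get("output-fragment", [None])[0]
    if inner is not None:
        inner = inner.strip('"')  # the example wraps the MIME-type fragment in quotes
    return {
        "path": parts.path,
        "cell_id": cell_id,
        "output_index": int(nth),  # 1-based: "2" means the second output
        "output_fragment": inner,
    }


def select_output(nb: dict, cell_id: str, output_index: int) -> dict:
    """Return the n-th (1-based) output of the cell with the given ID."""
    cell = next(c for c in nb["cells"] if c.get("id") == cell_id)
    return cell["outputs"][output_index - 1]


link = parse_output_link('notebook.ipynb?output-fragment="row=100"#nth-output-of-cell=my-cell,2')
print(link)
# → {'path': 'notebook.ipynb', 'cell_id': 'my-cell', 'output_index': 2, 'output_fragment': 'row=100'}
```

The inner `row=100` is left as an opaque string here, so that whatever handles the output's MIME type can interpret it with its own fragment syntax.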
Some thoughts here:
- ... `id` attributes into HTML tags; this could be opt-in for Markdown and default for HTML
- ... `#nameddest=`, which is the PDF way of referring to sections (RFC 3778) (and the exporters would need to add sections too)

> out of scope
Much like when cell IDs became a thing (but not output IDs), I feel like this would be a significant change for implementations to handle, and doing it piecemeal wouldn't be as much fun.
> a standard GeoJSON fragment syntax
I'd wager that's because there's no JSON fragment syntax. The closest is JSON Pointer (RFC 6901), but it's a hair underpowered, as it lacks the ability to do attribute lookups. This means a cell would have to be `#/cells/0/outputs/1/#sub-selector` rather than something like `#/cells/[id="abc1234"]/outputs/[some=thing]/#sub-selector`.
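To make the JSON Pointer limitation concrete, here is a small Python sketch of RFC 6901 resolution against a notebook loaded as a dict. Pointers can only address cells by position, so a client holding a cell ID first has to translate it into an index; `pointer_for_cell_id` is a hypothetical helper for that step, not part of any existing library.

```python
from urllib.parse import unquote


def resolve_pointer(doc, pointer: str):
    """Resolve an RFC 6901 JSON Pointer, in plain or '#'-prefixed URI-fragment form."""
    if pointer.startswith("#"):
        pointer = unquote(pointer[1:])  # the URI-fragment form is percent-encoded
    if pointer == "":
        return doc  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON Pointer: {pointer!r}")
    for token in pointer[1:].split("/"):
        # Unescape "~1" before "~0", as RFC 6901 requires.
        token = token.replace("~1", "/").replace("~0", "~")
        doc = doc[int(token)] if isinstance(doc, list) else doc[token]
    return doc


def pointer_for_cell_id(nb: dict, cell_id: str) -> str:
    """Hypothetical helper: translate a cell ID into the positional pointer JSON Pointer needs."""
    for index, cell in enumerate(nb["cells"]):
        if cell.get("id") == cell_id:
            return f"/cells/{index}"
    raise KeyError(cell_id)


nb = {"cells": [{"id": "abc1234", "outputs": [{"some": "other"}, {"some": "thing"}]}]}
print(resolve_pointer(nb, "#/cells/0/outputs/1"))  # {'some': 'thing'}
print(pointer_for_cell_id(nb, "abc1234") + "/outputs/1")  # /cells/0/outputs/1
```

The positional pointer silently changes meaning when a cell is inserted above `abc1234`, whereas the cell ID does not; that is the gap an attribute-lookup syntax would close.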
Inventing a new syntax would be very frustrating. But at the end of it, if a notebook-derived document can't refer back to the logical location within a (potentially nbconvert-mangled) document, I don't know if we've moved the state of the art forward.
If we did pick one of the many non-standard JSON reference mechanisms (`jq`, JMESPath, etc.), it would be important to pick something with a broad implementation profile. Really, the most powerful thing is XPath, but XML gives everyone the willies.
I thought that GeoJSON users would be interested in pointing to a specific position on a map, not to a node in the JSON? That would be something like `lat=a,long=b`, right? Sorry if it sounds silly; I don't work with geospatial data.
Summarising what I see so far:
- ... `ipynb`
- ... `cell=` prefix; we can still decide to add an `xpath=` prefix later on (if for some reason we decide to go the XML way)

For what it is worth, I implemented fragment IDs for all of the editors in CoCalc recently. The format I used for our Jupyter notebooks is
`#id=some-cell-id`
That's it. Thanks for thinking through a format for more refined information! I'll attempt to follow what you do for any extensions, rather than inventing something new (except I'm sticking with `#id` rather than `#cell-id`).
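For illustration, a reader that wants to accept both CoCalc's `#id=` form and the `cell=` / `cell-id=` spellings discussed here could simply treat them as aliases. This is a sketch under that assumption; `CELL_KEYS`, `cell_id_from_fragment` and `find_cell` are hypothetical names, and the notebook is assumed to be plain nbformat JSON.

```python
from typing import Optional

# Assumed aliases: CoCalc's "#id=" form plus the "cell=" and "cell-id=" spellings from this thread.
CELL_KEYS = ("id", "cell", "cell-id")


def cell_id_from_fragment(fragment: str) -> Optional[str]:
    """Return the cell ID named by a fragment, or None if it does not target a cell."""
    key, sep, value = fragment.lstrip("#").partition("=")
    if sep and key in CELL_KEYS and value:
        return value
    return None


def find_cell(nb: dict, fragment: str) -> Optional[dict]:
    """Look up the targeted cell in a notebook loaded as plain nbformat JSON."""
    cell_id = cell_id_from_fragment(fragment)
    if cell_id is None:
        return None
    return next((c for c in nb["cells"] if c.get("id") == cell_id), None)
```

Fragments without a recognised key (such as ordinary heading anchors) return `None`, leaving them to the existing `id`-based scrolling.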
- `#attr1=a&attr2=b`
- `#!p/at/h#attr1=a&attr2=b`
- `#!p/at/h\#attr1=a&attr2=b`
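The thread does not spell out how these three forms should be read. One possible interpretation, assumed here, is an optional hashbang path followed by `key=value` attributes, with `\#` treated as an alternative spelling of the separator. A minimal Python sketch of that reading (`split_hashbang` is a hypothetical name):

```python
from urllib.parse import parse_qs


def split_hashbang(fragment: str) -> tuple[str, dict[str, list[str]]]:
    """Split '#!path#k=v&...' (or a plain '#k=v&...') into a path and attributes."""
    body = fragment[1:] if fragment.startswith("#") else fragment
    path = ""
    if body.startswith("!"):
        # Assumption: the escaped "\#" and a bare "#" both separate path from attributes.
        path, _, body = body[1:].replace("\\#", "#").partition("#")
    return path, parse_qs(body)
```

All three example forms then yield the same attributes, differing only in whether a path was given.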
executablebooks/meta/discussions/102: "Help compare Comment and Annotation services: moderation, spam, notifications, configurability"
https://github.com/executablebooks/meta/discussions/102
https://github.com/executablebooks/sphinx-comments:
Add comments and annotation functionality to your Sphinx website.
Currently, these commenting engines are supported:
- Hypothes.is provides a web overlay that allows you to annotate and comment collaboratively.
- utteranc.es is a web commenting system that uses GitHub Issues to store and manage comments.
- dokie.li is an open source commenting and annotation overlay built on web standards.
@westurner thank you for pinging interested parties and for the very useful links. Do you think that advancing with `cell-id=` takes us closer to the larger goal?
One slightly technical question to all: if we go forward with `cell-id=`, should the base nbconvert template produce `id="cell-id=some-unique-id"` (as in the current draft of https://github.com/jupyter/nbconvert/pull/1897), or should it include `id="some-unique-id"` and a blob of JavaScript which would manually scroll to the relevant fragment?
@gwincr11 this might be of interest to you too.
@krassowski thanks for bringing this to my attention. I have been looking at a similar issue: mapping content to a Python notebook. Cell ID is useful, but I am curious whether a more granular approach would be helpful. One thing that is super helpful in the GitHub UI is linking directly to a block of code; this is granular down to the line being discussed. I would love a tool that allowed adding content to the notebook structure easily across platforms, without the content needing to be in the notebook JSON structure.
It may be interesting to consider something like a JavaScript source map; this could potentially give very granular access, down to the line or even the character. There are a number of JSON mapping tools in the Python ecosystem; here is a Stack Overflow question discussing this very idea: https://stackoverflow.com/questions/55684780/get-line-number-while-parsing-a-json-file
My thinking around this is that it may be nice to tie content to the rich text view of the notebook and make it portable with the notebook, without it needing to be part of the structure, since a consumer can map features into the notebook's structure at render time. For example, you could create a commenting system that worked with any third-party tooling (GitHub, GitLab, etc.), and since the comments could map to the underlying JSON structure, you could bring PR review comments into any plugin you wanted.
Web annotations may also be a good way to accomplish this, though I wonder how portable they would be to other plugins. I do like that it is an open standard 😄
I have a platform that uses URLs to embed code: https://docs.metapage.io/docs
For a lot of the components (that are simply URLs/websites), I use the hash part of the URL, but re-use the query param format:
`http://<origin>/<path>?key=val#<hashfragment>?hashkey2=val`
This way the hash parts can be very big without sending all that code to the server. The important bit is that the URL contains a hash fragment and hash query params. Whatever schema is decided here, it would be great if I could keep my hash `key=val` pattern within it, since I would like to add browser-only Jupyter notebooks, and a purely URL-defined notebook is a really useful pattern.
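The layout described here keeps a path-like part and its own query parameters inside the fragment; because browsers do not send the fragment to the server, the hash can carry a large payload. A minimal Python sketch of splitting such a URL with the standard library (`split_hash_params` is a hypothetical name, and `https://example.com/app` is a placeholder):

```python
from urllib.parse import parse_qs, urlsplit


def split_hash_params(url: str) -> tuple[dict, str, dict]:
    """Split 'http://<origin>/<path>?key=val#<hashfragment>?hashkey2=val' into three parts."""
    parts = urlsplit(url)
    # Everything after "#" is the fragment; a "?" inside it starts the hash query params.
    hash_path, _, hash_query = parts.fragment.partition("?")
    return parse_qs(parts.query), hash_path, parse_qs(hash_query)


query, hash_path, hash_params = split_hash_params("https://example.com/app?key=val#main?hashkey2=val")
```

Note that with this split, a bare `#cell=my-cell` would be read as the hash path rather than as one of the hash query params.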
It is currently possible to link directly to a specific Markdown heading in a notebook, or to any HTML element with an `id` property, by using a URL fragment identifier that automatically scrolls to such a heading or element (e.g. via a destination anchor). There is currently no way to scroll to other elements of the notebook. https://github.com/jupyter/nbconvert/issues/1862 and https://github.com/executablebooks/jupyter-book/issues/1812 proposed allowing links to specific cells in Jupyter notebooks. Elements which we may want to link to are:
Fragment Identification Syntax is a way of defining how to recognise which element a fragment identifier refers to. Adopting one for Jupyter notebooks will:
Formats with Fragment Identification Syntax include:
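- CSV (RFC 7111): a single row `a.csv#row=1`, a range of rows `a.csv#row=5-7`, a wildcard for the last row `a.csv#row=5-*`, a column `a.csv#col=2` (and ranges, as above), or a cell `a.csv#cell=1,2` (and ranges, as above)
- plain text (RFC 5147): a line `a.txt#line=1`, a range of lines `a.txt#line=10,20`, a character position `a.txt#char=100` (or a range, as above)
- `a.txt#*alias`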
For more examples see Wikipedia: URI fragment.
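As a rough illustration of the CSV row syntax, here is a Python sketch that applies a `row=` selector to CSV text. It counts physical CSV records starting at 1 and treats `*` as the last row; it deliberately ignores `col=` and `cell=` selectors and the finer points of RFC 7111 (such as header handling), so treat it as an approximation rather than a conforming implementation. `select_rows` is a hypothetical name.

```python
import csv
import io


def _position(token: str, last: int) -> int:
    # "*" stands for the last row; otherwise a 1-based row number.
    return last if token == "*" else int(token)


def select_rows(csv_text: str, fragment: str) -> list[list[str]]:
    """Apply a 'row=' selector such as row=1, row=5-7, row=5-* or row=1,3 to CSV text."""
    key, _, spec = fragment.lstrip("#").partition("=")
    if key != "row":
        raise ValueError(f"only row= selectors are handled in this sketch: {fragment!r}")
    rows = list(csv.reader(io.StringIO(csv_text)))
    selected: list[list[str]] = []
    for single in spec.split(","):  # comma-separated selections are concatenated
        start, _, end = single.partition("-")
        first = _position(start, len(rows))
        last = _position(end, len(rows)) if end else first
        selected.extend(rows[first - 1:last])
    return selected
```

Because the selector is resolved against the parsed data rather than the DOM, the same fragment works regardless of how a given frontend renders the table.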
The proposed way forward is to:
- ... (`cell=my-cell-id` or `cell-id=my-cell-id`)

This proposal results in a very limited backward incompatibility: a heading with a `cell=` prefix would now be resolved to a cell instead of to that heading. We could support a `heading=` target to allow disambiguation.

Questions:
- ... `nth-cell=3` or `nth-code-cell=2`?

Of note, the equals sign is allowed in identifiers in HTML 5 but not in HTML 4; we could consider using a colon instead (`cell:my-cell`), but using an equals sign seems in line with other formats. Spaces are always forbidden in identifiers.
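The disambiguation rules in the proposal could be sketched roughly as follows. This is an assumption-laden Python illustration, not an agreed design: `cell=` wins when a cell with that ID exists, `heading=` forces a heading, and any other fragment falls back to today's plain-`id` behaviour. The hypothetical helper `is_valid_html5_id` reflects the HTML5 rule that an `id` must be non-empty and contain no ASCII whitespace, which is why `cell=my-cell` is acceptable there.

```python
from typing import Iterable, Optional, Tuple

ASCII_WHITESPACE = "\t\n\f\r "


def is_valid_html5_id(value: str) -> bool:
    """HTML5 ids must be non-empty and contain no ASCII whitespace ('=' is allowed)."""
    return bool(value) and not any(ch in ASCII_WHITESPACE for ch in value)


def resolve_target(
    fragment: str, cell_ids: Iterable[str], heading_ids: Iterable[str]
) -> Optional[Tuple[str, str]]:
    """Resolve a fragment to ("cell", id) or ("heading", id) under the proposed rules."""
    cells, headings = set(cell_ids), set(heading_ids)
    fragment = fragment.lstrip("#")
    key, sep, value = fragment.partition("=")
    if sep and key == "cell" and value in cells:
        return ("cell", value)  # may shadow a heading literally named "cell=..."
    if sep and key == "heading" and value in headings:
        return ("heading", value)  # explicit disambiguation
    if fragment in headings:
        return ("heading", fragment)  # plain fragments keep today's behaviour
    return None
```

Falling back to the plain heading lookup when no matching cell exists keeps the backward incompatibility limited to the case where a heading and a cell genuinely collide.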