jgm / djot

A light markup language
https://djot.net
MIT License

Inclusion of other djot documents #199

Open tmke8 opened 1 year ago

tmke8 commented 1 year ago

When writing large documents in LaTeX, it's common to split off chapters into their own files and then include them in the main document with \input{filename} (or \include{filename}). It's also common to do this with plots. It would be nice if djot had something similar.

The syntax could for example look like this:

## Chapter 1

... chapters/chapter1.djot

or

## Chapter 1

<<< chapters/chapter1.djot

It might also make sense to pass on variables to the included document, but I'm less sure about that:

{key=value}
... abstract.djot

EDIT:

Or maybe a variation on the image inclusion syntax:

## Chapter 1

!{=djot}(chapters/chapter1.djot)

clbarnes commented 1 year ago

Also very helpful for tables, in my experience. Asciidoctor allows you to say "this is a TSV", then include a TSV file and have it rendered nicely - for djot I suppose that would best be left to a custom attribute and filter. This means you can generate/edit your data with some external tool better suited for working with such a format. Even generating djot-formatted tables externally and then just having an include would be helpful.

toastal commented 1 year ago

AsciiDoc has level offsets that are really great to use with include. This means the subdocument can start all over with <h1>. This makes the subdocument easier to author and read on its own, since the context of the main document isn't relevant.

bpj commented 1 year ago

AsciiDoc has level offsets that are really great to use with include. This means the subdocument can start all over with <h1>.

My Pandoc filter for inclusion can handle that. You do something like this:

``` {format=html plus_hlevel=2}
filename.html
``` 

After the filter has parsed the contents of the included file with Pandoc[^1], it walks the contents of the included document[^2] and adds the value of the plus_hlevel attribute on the code block to the level of all headings. The added value may of course be negative, but as far as I can remember I have never needed to subtract from the levels of headings in an included file. I also have a filter which only raises/lowers the level of all headings in a document, which I have sometimes used to subtract.

[^1]: There is a function for that in Pandoc’s Lua API

[^2]: You wrap the content of the doc in a div object and use a method of that object to apply a filter to its content.
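
To make the heading-offset idea concrete, here is a minimal sketch in Python. It is not bpj's filter: that one walks the parsed Pandoc AST, while this one works on raw text and assumes ATX-style headings (`#`, `##`, ...) and backtick or tilde code fences. The function name `shift_headings` is hypothetical.

```python
import re

# Opening or closing code fence: three or more backticks or tildes.
FENCE = re.compile(r"^\s*(`{3,}|~{3,})(.*)$")
# ATX-style heading: one or more '#' followed by whitespace or end of line.
HEADING = re.compile(r"^(#+)(\s.*)?$")


def shift_headings(text: str, offset: int) -> str:
    """Add `offset` to the level of every heading in `text`.

    Lines inside fenced code blocks are left alone. Levels are clamped
    to a minimum of 1 so a negative offset never removes a heading.
    """
    out = []
    fence = None  # fence string that opened the current code block, if any
    for line in text.splitlines():
        m = FENCE.match(line)
        if m and fence is None:
            fence = m.group(1)  # entering a code block
        elif (m and m.group(1)[0] == fence[0]
                and len(m.group(1)) >= len(fence)
                and not m.group(2).strip()):
            fence = None  # matching closing fence
        elif fence is None:
            h = HEADING.match(line)
            if h:
                level = max(1, len(h.group(1)) + offset)
                line = "#" * level + (h.group(2) or "")
        out.append(line)
    return "\n".join(out)
```

With `offset=2`, an included file that starts at `# Title` ends up at heading level 3 in the parent, which is the effect the `plus_hlevel=2` attribute describes.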

crlf0710 commented 1 year ago

Storing pathnames within text has the usual caveat: on POSIX systems filenames are not guaranteed to be valid UTF-8 strings, while on Windows systems unpaired surrogate code units (WTF-16) can occur. Case insensitivity and text normalization are also features of various filesystems. If this feature is included, I'd like to see the rules, whether escaping or forbidding, chosen explicitly up front.

clbarnes commented 1 year ago

I think a note saying that non-UTF-8 filenames can't be addressed and that paths follow the rules of the filesystem the path is pointing at would be fine. Encourage the use of sensible names and filesystems rather than making everyone's lives harder to make things comfortable for those who don't.
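
A "non-UTF-8 filenames can't be addressed" rule could be enforced with a check like the sketch below. It assumes an implementation in Python, where undecodable filename bytes show up in strings as lone surrogates; `check_include_path` is a hypothetical helper, not part of any djot implementation.

```python
def check_include_path(path: str) -> str:
    """Return `path` unchanged if it can be stored as UTF-8 text, else raise."""
    try:
        # Lone surrogates (how Python represents undecodable filename
        # bytes) cannot be encoded as UTF-8, so this raises for them.
        path.encode("utf-8")
    except UnicodeEncodeError as err:
        raise ValueError(f"include path is not valid UTF-8: {path!r}") from err
    return path
```

Case folding and normalization are deliberately not handled here; under this rule they stay the business of whatever filesystem the path points at.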

toastal commented 1 year ago

making everyone's lives harder to make things comfortable for those who don't.

I can echo this.

Anecdote: I have a Shanling DAP whose OS, despite being based on the Linux kernel where alternatives are already available, only supports Windows file systems (FAT32, NTFS, exFAT) for some mind-boggling reason. I had to rename my entire music collection to get the device to work, and in the process lost a lot of filename information because Windows file systems can't handle a lot of characters. This was tedious, led to a lot of wasted time trying to debug unexpected errors, and left me dissatisfied with my RAID mirror having to abide by another FS’s limitations.

dbready commented 1 year ago

Similarly, I would like to be able to include segments of non-djot documents.

My specific use case would be referencing source code. If I want to reference a function in an adjacent file, I would like a mechanism to extract specified lines and render them inside a verbatim block. A typical approach to producing examples may iteratively reference small blocks of code throughout the document, and then display the complete file at the end.

Possibly out of scope for core, but wanted to mention it as this is something I am encountering in a project.
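
As a rough illustration of that use case, the sketch below pulls a 1-based, inclusive line range out of a source file's text and wraps it in a verbatim block. Both function names are hypothetical, and the fence-length handling is only one way to keep backticks in the snippet from closing the block early.

```python
import re


def extract_lines(source: str, start: int, end: int) -> str:
    """Return lines start..end (1-based, inclusive) of `source`."""
    lines = source.splitlines()
    if not 1 <= start <= end <= len(lines):
        raise ValueError(f"line range {start}-{end} is outside 1-{len(lines)}")
    return "\n".join(lines[start - 1:end])


def as_verbatim(snippet: str, lang: str = "") -> str:
    """Wrap `snippet` in a backtick fence longer than any backtick run in it."""
    longest = max((len(run) for run in re.findall(r"`+", snippet)), default=0)
    fence = "`" * max(3, longest + 1)
    opener = f"{fence} {lang}" if lang else fence
    return f"{opener}\n{snippet}\n{fence}"
```

Calling `as_verbatim(extract_lines(text, 3, 4), "python")` would give a fenced block containing only lines 3 and 4 of `text`.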

toastal commented 1 year ago

@dbready I can echo that too. Using AsciiDoc to reference code snippets from the code itself is super helpful and easier to keep in sync (just need to remember line numbers).

dbready commented 1 year ago

Keeping line numbers in sync is definitely a tricky point. My ideal interface would allow for the two workflows:

The first workflow is required to annotate real code which cannot be polluted with documentation markup. The second is more convenient for being able to maintain code block references without having to continually keep line numbers synchronized.

chrisjsewell commented 1 year ago

Heya, I would also point to https://docutils.sourceforge.io/docs/ref/doctree.html here (i.e. what reStructuredText does). There, every node in the AST stores both a line number and a source attribute. This allows later post-processing warnings to point to the specific file and line.

jgm commented 1 year ago

Yes, if we had built-in includes, implementations that store source positions would have to add a source name to the source position.
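
A source position that carries a source name might look like the following sketch. The `SourcePos` type and its fields are hypothetical and do not reflect how djot implementations currently store positions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourcePos:
    source: str  # name of the file the text came from (hypothetical field)
    line: int    # 1-based line number within that file


def tag_lines(source: str, text: str) -> list[tuple[SourcePos, str]]:
    """Pair every line of `text` with the position it came from."""
    return [(SourcePos(source, n), line)
            for n, line in enumerate(text.splitlines(), start=1)]


def warn(pos: SourcePos, message: str) -> str:
    """Format a diagnostic that points at the originating file and line."""
    return f"{pos.source}:{pos.line}: warning: {message}"
```

Because each line keeps its own `SourcePos`, a warning raised after includes have been expanded can still name the included file rather than the main document.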

mcookly commented 1 year ago

If the [[...]] syntax is not used for wikilinks (#26), perhaps it could be used here. You could then specify line numbers/offsets using attribute syntax. For example:

[[source-code.abc]]{2 10 .colorized}

would read the file from line 2 through 10 and apply colorized to the included content.

Allowing attributes would also make built-in includes versatile enough for both prose and code, e.g. logseq-esque linking (#231).
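
For illustration, the `[[...]]{...}` form proposed above could be parsed along these lines. This is a sketch of a syntax that does not exist in djot; the `Include` type, the `parse_include` name, and the rule that two bare numbers mean a start and end line are all assumptions taken from the example.

```python
import re
from typing import NamedTuple, Optional

# [[path]] optionally followed by {attributes}; hypothetical syntax.
INCLUDE = re.compile(r"^\[\[(?P<path>[^\]]+)\]\](?:\{(?P<attrs>[^}]*)\})?$")


class Include(NamedTuple):
    path: str
    start: Optional[int]
    end: Optional[int]
    classes: list


def parse_include(line: str) -> Optional[Include]:
    """Parse one include line, or return None if the line is something else."""
    m = INCLUDE.match(line.strip())
    if m is None:
        return None
    numbers, classes = [], []
    for token in (m.group("attrs") or "").split():
        if token.isascii() and token.isdigit():
            numbers.append(int(token))        # bare number: a line number
        elif token.startswith(".") and len(token) > 1:
            classes.append(token[1:])         # .name: a class
        else:
            raise ValueError(f"unsupported attribute: {token!r}")
    if len(numbers) not in (0, 2):
        raise ValueError("expected either no line numbers or a start and an end")
    start, end = numbers if numbers else (None, None)
    return Include(m.group("path"), start, end, classes)
```

Bare numbers are not part of djot's existing attribute syntax, so a real design would need to decide whether to extend that syntax or use something like `start=2 end=10` instead.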

Omikhleia commented 1 year ago

from line 2 through 10

What does it mean, in the general scope of transclusion, if the start or end line is in the middle of some structure -- say, a div, if the included file is a Djot file?

clbarnes commented 1 year ago

if the start or end line is in the middle of some structure

I guess inclusions would have to happen early, so it would be a raw text dump. If the inclusion had a triple backtick in it, that would be treated as the start of a fenced code block, even if in the source it was the end of a fenced code block. I don't think this is a problem, really: raw text inclusion is by far the simplest and most predictable thing to do, and if users end up with weird documents, that's no different to them writing weird documents manually.
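
A raw text dump of this kind can be sketched as a preprocessing pass that runs before the parser sees anything. The example below borrows the `<<< path` line syntax from the original proposal, uses a plain dict as a stand-in for the filesystem, and adds a circular-include check; all names are hypothetical.

```python
import re

# A line consisting only of "<<< path" (syntax from the original proposal).
INCLUDE_LINE = re.compile(r"^<<<\s+(\S+)\s*$")


def expand(name: str, files: dict, _stack: tuple = ()) -> str:
    """Splice included files into `name` as raw text, before any parsing.

    `files` maps file names to their contents and stands in for the
    filesystem. `_stack` tracks the chain of includes to detect cycles.
    """
    if name in _stack:
        chain = " -> ".join(_stack + (name,))
        raise ValueError(f"circular include: {chain}")
    out = []
    for line in files[name].splitlines():
        m = INCLUDE_LINE.match(line)
        if m:
            out.append(expand(m.group(1), files, _stack + (name,)))
        else:
            out.append(line)
    return "\n".join(out)
```

Nothing here inspects the included text, so an unbalanced code fence in an included file carries over into the parent document exactly as described.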

vassudanagunta commented 1 year ago

The technical term for this is transclusion.

Though it's for Markdown, the arguments for, and the functionality supported by, iA Writer's transclusion support (added in v4 back in late 2016) are worth looking at for ideas. They even wrote a spec for it in the hope of community adoption.

mcookly commented 1 year ago

What does it mean, in the general scope of transclusion, if the start or end line is in the middle of some structure

I think the transcluded djot source should be isolated from the rest of the inheriting document, if possible -- essentially treating the transcluded source as its own structure.

I guess inclusions would have to happen early, so it would be a raw text dump.

If the transcluded content is its own djot structure, then perhaps it can be left up to the renderer whether or not the source is a raw text dump. This ambiguity could cause a lot of parsing confusion though. Borrowing from image syntax, you might be able to avoid this problem by using [[...]] for isolated transcluded content and ![[...]] for a raw text dump.

clbarnes commented 1 year ago

I suppose some processing may have to occur in the transcluded block for resolving paths - if a/main.djot transcludes a/b/c/inner.djot, which refers to an image using the local path ./img.jpeg, where does the document look for the image?

mcookly commented 1 year ago

where does the document look for the image?

If the AST nodes store which files they come from, then child nodes with relative paths can inherit from the parent node's path (I think). So a parser/renderer would look for the image in a/b/c/img.jpeg. How filepaths are resolved could be left up to the renderer, but I don't like this possibility since it diminishes djot's interoperability.
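
The "inherit from the parent node's path" rule amounts to resolving each relative reference against the directory of the file that contains it. A minimal sketch, assuming forward-slash paths and a hypothetical `resolve` helper:

```python
import posixpath


def resolve(including_file: str, reference: str) -> str:
    """Resolve `reference` relative to the directory of the file that contains it."""
    if posixpath.isabs(reference):
        return posixpath.normpath(reference)
    base = posixpath.dirname(including_file)
    return posixpath.normpath(posixpath.join(base, reference))
```

For the example above, `resolve("a/b/c/inner.djot", "./img.jpeg")` gives `a/b/c/img.jpeg`, regardless of where the main document lives.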