Add recursive file inclusion

smnto commented 4 months ago

It would be nice to be able to include other djot documents inside a djot document. Even multiple nested inclusions.

The syntax might be: +[My poems](../my_poems.dj) or maybe even just like inclusion of images - ![My poems](../my_poems.dj) and the parser will check whether the file in question is an image and then generate <img> and if it is a text file - will replace the inclusion syntax by the content of the file and continue parsing...

Thank you!

smnto commented 4 months ago

The Alt text (in the example above - "My poems") or if not provided +[](../my_poems.dj) the file name - can be used as namespace to prepend before all kinds of identifiers (e.g. footnote ids) that should be unique, but might be the same by chance/mistake in different included files.

smnto commented 4 months ago

BTW if the syntax for links were [link text]<uri> and in short form - <uri>. File/image inclusion could, nicely, be +[Alt text]<uri/path/to/file> and in short: +<uri/path/to/file>. Laconic and consistent.

Zacharymk1213 commented 3 weeks ago

@jgm I have a PR which is almost fully done that would incorporate this feature. I'm getting hung up on one point. This feature would allow people to incorporate documents inside of footnotes as well (one could imagine how that can be very useful). Here's the rub: the pandoc standard doesn't allow the sort of Block elements that these documents have to be embedded inline.So nested footnotes don't work. It is however parsed correctly with the HTML parser.

So you can verify this and offer more detailed advice you can see the following below:

a minimal reproducible example extracted from a larger suite of test files (hence the names doc2 and doc3)
the AST tree these documents are producing (found in debug-ast.txt)
The pandoc JSON that is produced by it
The markdown (used for illustrative purposes) that gets generated when attempting to convert the pandoc into markdown using the proper pandoc application. The same error occurs with any file format (I've tried a few).
the HTML that results when trying to convert the tree to HTML directly so you can confirm that it works with HTML (output.html.txt, remove the .txt upon downloading that was added since Github wouldn't allow me to just upload an HTML file)

Doc2.dj

Here is some text with a footnote reference.[^a] And another one here.[^b]

[^a]: Footnote A content.
+[doc3](./doc3.dj)
[^b]: Footnote B content.

Doc3.dj

Example text with footnotes.[^x] More text with another footnote.[^y]

[^x]: Footnote X explanation.
[^y]: Footnote Y explanation.

Expected Result

Here is some text with a footnote reference.[^a] And another one here.[^b]

[^a]: Footnote A content.
Example text with footnotes.[^x] More text with another footnote.[^y]

[^x]: Footnote X explanation.
[^y]: Footnote Y explanation.

[^b]: Footnote B content.

debug-ast.txt pandoc.json output.md output.html.txt

I'm wondering how to go about this. A workaround I found in the process of trying to engineer my way out of this (not very successfully obviously) is to convert the AST to HTML (since, as mentioned, the HTML is rendered correctly) and then that HTML to Pandoc using Pandoc. I even found this module that allows me to do it without a local Pandoc installation https://www.npmjs.com/package/pandocjs (and this module did do the job when I tested it). I'm not sure if you want me to do that, though or if there's some other method you know of by which it could be achieved. Please let me know how you want me to implement this.

jgm commented 3 weeks ago

Well, I'm not sure I'd even want this feature, and if so, what the syntax should be.

But you've correctly identified a limitation of an approach that parses the included file into an AST and appends it to the AST of the including file. This won't work if the included file contains footnote definitions or, for that matter, link reference definitions.

One could decide to document and live with this limitation. But if one didn't want to do that, one would have to use an entirely different strategy. Instead of having the parser read the included document, parse it, and append the AST nodes, the parser could just read the raw text of the included document and add it to the input to be parsed. That's how pandoc handles includes in LaTeX and RST.

Zacharymk1213 commented 3 weeks ago

@jgm I just submitted #94 to address this. (Couldn't reopen 93 since I did a force push after squashing, now I know that for the future). There are some test cases that fail, what do you want me to do with them? Should I rewrite them or can they be ignored? As you can see from the attached files the parser works just fine.

jgm commented 3 weeks ago

Failing tests should never be ignored. If your commit changes output on tests that seemingly have nothing to do with it, it's important to figure out why.

Note: I am unlikely to accept this PR (at least in the near future). Before adding something like this to an implementation, I'd have to decide to add this syntax for inclusion to the djot syntax specification.

Zacharymk1213 commented 3 weeks ago

The feature has been rewritten and all tests now pass. I created a separate entry point that adds all file inclusions and then passes it to parse to try to keep things separate and easier to debug as well as this feature optional in case you're reluctant to add it more universally. I also wrote some unit tests and provided some files for them.

Given that all tests pass and, as mentioned, this feature is provided via a separate entry point for projects that want it while not being a necessary part of the standard would you be willing to merge?

This commit would be very useful for enabling the creation of structured documents. (Eg. Someone wants to have an index and insert each students name followed by their essay while keeping their essay separate allowing for separation of concerns.)

Zacharymk1213 commented 2 weeks ago

This commit enables the creation of structured documents of arbitrary complexity, contrary to the current state of affairs where only one-pagers are allowed. So now if one needs to write a book or a lengthy article all the content must be in one file. Working with big files is not convenient and may slow down/crash the editor besides being hard to read. File inclusion, implemented in #94, allows the author to subdivide the book/article into separate manageable chapters. This provides djot with the power similar to that of LaTeX and makes it positively distinct from all the Markdown-like tools. Additionally, it does not require changes to the syntax (which I remember being mentioned earlier) since the commit currently uses the same syntax as image inclusions and work whenever parse is called. It has been verified that images still work fine since before inserting the content it checks if it's an image and if so leaves it alone to be parsed by the image parser (this has been tested and works).

AvtechScientific commented 2 weeks ago

Sounds like a game-changer feature... I'm voting in its favor!

tbdalgaard commented 2 weeks ago

This feature sounds very very nice. I hope @jgm would reconsider to merge this very soon.Sendt fra min iPhone via Braille Sense U2/MBrailleDen 26. jun. 2024 kl. 22.04 skrev Avtech Scientific @.***>: Sounds like a game-changer feature... I'm voting in its favor!

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you are subscribed to this thread.Message ID: @.***>

jgm commented 2 weeks ago

So now if one needs to write a book or a lengthy article all the content must be in one file.

This is not really true. I often process multi-file markdown documents using pandoc; you only need to specify the files in order on the command line and they will behave as if concatenated together. (I tend to do this by creating a simple Makefile target so I only need to type make to build; an added advantage is that make will track the dependencies and only rebuild if something has changed in one of the content files.)

Perhaps there are advantages to having first-class includes, but I thought it was worth mentioning that multi-file documents are hardly impossible without them...

AvtechScientific commented 2 weeks ago

Well... it's kind of a workaround...

You depend on an external tool,
Most people who use markdown/etc. do not know what make is,
This approach doesn't really work. Consider following example: What do you expect from concatenation of these 2 files:

File 1: Is this **bold

File 2: or not?**... hmm

Using make approach there will be no bold, however using real inclusion - there will be, as expected.

So, considering the fact that there is no need to introduce new syntax - will you merge this, @jgm ?

jgm commented 2 weeks ago

It is assigning a new meaning to an existing syntax. Since djot.js purports to be an implementation of the syntax described at jgm/djot, I'm not going to add anything to djot.js that is not described there. (As I indicated last week.) We could consider an inclusion syntax in the future, but the reference implementation should not diverge from the described syntax. You can open an issue there if you want to discuss the advantages of an inclusion syntax and possible syntaxes.

AvtechScientific commented 2 weeks ago

I'd say that the meaning of the ![alt](path) syntax remained the same. It's just inclusion. In both cases. What has changed - is the range of objects this syntax can be applied to. Till now it were just images. Now images and text files.

But indeed jgm/djot should be updated first and only after it should the implementation reflect the changes. And thank you, @jgm, for your great tools - pandoc, djot, etc!

Omikhleia commented 2 weeks ago

Honestly I don't understand this issue -- I'm already using ![...](...) in my own workflows to include things that are not necessarily images, and this doesn't even need some support from the djot reader -- It's a renderer feature. (And when one asks for embedding a file in another file, header shifting comes into the game, so I'd rather see some good language-level specification than a PR)

AvtechScientific commented 2 weeks ago

Embedding a file means just dumping the content of the included file into the including file. No manipulations with the content... Header shifting, if needed, should be done by the user. The point here is not be dependent on external tools. And see the example in the above mentioned issue of how rendering files separately doesn't work in this case...

On Sun, Jun 30, 2024 at 12:15 AM Omikhleia @.***> wrote:

Honestly I don't understand this issue -- I'm already using ! in my own workflows to include things that are not necessarily images, and this doesn't even need some support for the djot reader -- It's a renderer feature. (And when one asks for embedding a file in another file, header shifting comes into the game, so I'd rather see some good language-level specification than a PR)

— Reply to this email directly, view it on GitHub https://github.com/jgm/djot.js/issues/85#issuecomment-2198338154, or unsubscribe https://github.com/notifications/unsubscribe-auth/AC6NCK3CVSV3GCZUHQQBX3LZJ4PY7AVCNFSM6AAAAABDPHRF26VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCOJYGMZTQMJVGQ . You are receiving this because you commented.Message ID: <jgm/djot. @.***>

Omikhleia commented 2 weeks ago

Embedding a file means just dumping the content of the included file into the including file

It does? I would beg to disagree, "dumping" the content of a file into another lacks any semantics.

AvtechScientific commented 2 weeks ago

Embedding a file means just dumping the content of the included file into the including file

It does? I would beg to disagree, "dumping" the content of a file into another lacks any semantics.

That's exactly the meaning of inclusion. But you are entitled to your own distinct opinion.

jgm / djot.js

Add recursive file inclusion #85