[Question/RFE] Integrate a custom build command with the extension

belonesox commented 7 months ago

Consider a use case: We have a custom documentation workflow; it generates HTML from Markdown.

It uses a lot of custom Pandoc filters, such as "pandoc-include" and "gladtex", with some sophisticated SCons processing and building. All these things (filters, processing, embedded resources, custom templates) are encapsulated in some md2html command.

So we run something like md2html text.md.html to get an HTML file from text.md.

Is it possible to somehow use it with your extension?

I tried to find something like "Build Command" in the extension's settings, and it appears that codebraid.preview.pandoc.executable is somehow hardcoded for Pandoc, making it impossible to pass a non-Pandoc command instead.

Am I wrong? Or is it somehow possible to integrate a custom build command with the extension?

gpoore commented 7 months ago

For this sort of situation, you could probably write a shell script to serve as a wrapper for the custom build process, and then redefine codebraid.preview.pandoc.executable to call the shell script instead of pandoc. You would need to have the shell script accept Pandoc command-line options, which it would then pass on to Pandoc when Pandoc is called at the appropriate step in the build process. You could probably just have the script accept arbitrary command-line options, and then pass those to Pandoc, since it will give an error if anything is incorrect. The Pandoc output would need to be passed to stdout.
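A minimal sketch of such a wrapper, written in Python rather than shell only for illustration. The function name `run_pandoc` and the placement of the custom build steps are assumptions; the only behavior taken from the comment above is that arbitrary options are forwarded to Pandoc, stdin is passed through, and the result goes to stdout.

```python
#!/usr/bin/env python3
"""Hypothetical wrapper to point codebraid.preview.pandoc.executable at."""
import subprocess
import sys


def run_pandoc(args, stdin_bytes, executable="pandoc"):
    """Forward all options and stdin to `executable`; return (exit code, stdout bytes)."""
    # Custom pre-processing steps of the build would go here.
    result = subprocess.run(
        [executable, *args],
        input=stdin_bytes,
        stdout=subprocess.PIPE,  # capture the HTML; stderr passes through untouched
    )
    # Custom post-processing of result.stdout would go here.
    return result.returncode, result.stdout


def main():
    """Entry point: accept arbitrary options and let Pandoc reject bad ones."""
    code, html = run_pandoc(sys.argv[1:], sys.stdin.buffer.read())
    sys.stdout.buffer.write(html)
    sys.stdout.buffer.flush()
    return code

# A real wrapper script would finish with: sys.exit(main())
```

Returning Pandoc's exit code means that an error from an incorrect option still reaches whatever invoked the wrapper.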

belonesox commented 7 months ago

OK, I will try it now. ... Struggling with "Preview HTML has an unsupported format or is invalid".

belonesox commented 7 months ago

It froze at "Updating Codebraid Preview..." with cat ./readme.md.html test.zip

gpoore commented 7 months ago

cat won't work to do simple testing, because it doesn't accept command-line arguments intended for Pandoc. For a minimum test, you would need a shell script that accepts arbitrary arguments, and then perhaps invokes cat. Currently, cat will be returning an error about unknown options.

belonesox commented 7 months ago

OK, the script is dead simple, just change directory of readme.md.html test-preview-codebraid.zip

gpoore commented 7 months ago

When Pandoc is used, it reads from stdin, consumes all command-line arguments to process results, and then writes HTML to stdout. I think you need to add the reading from stdin stage.
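For a minimal test without Pandoc, the three stages above (accept arguments, read stdin, write HTML to stdout) could be imitated roughly as follows. This is only a sketch: `fake_pandoc` is a hypothetical name, and whether the preview accepts arbitrary prebuilt HTML is not established in this thread.

```python
import sys
from pathlib import Path


def fake_pandoc(args, stdin_stream, html_path):
    """Ignore Pandoc's options, drain stdin, and return prebuilt HTML bytes."""
    _ = args             # options meant for Pandoc are accepted but unused
    stdin_stream.read()  # consume the document text that arrives on stdin
    return Path(html_path).read_bytes()


def main():
    # "readme.md.html" is the prebuilt file mentioned in this thread.
    html = fake_pandoc(sys.argv[1:], sys.stdin.buffer, "readme.md.html")
    sys.stdout.buffer.write(html)
    sys.stdout.buffer.flush()

# A real test script would finish with: main()
```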

belonesox commented 7 months ago

I see the problem: when you call pandoc, it looks completely agnostic about the real file being processed.

So no Pandoc filters that need to know the real Markdown file paths will work (pandoc-include, GladTeX, etc.).

Maybe it's possible to pass the path to the source Markdown file to the "pandoc" executable in some way (env var, command line options, etc)?

gpoore commented 7 months ago

Pandoc receives the file contents via stdin, getting the current, edited state of the file. Using the actual file in the filesystem won't work, because it may not be up to date with any changes since the last save. Saving before updating the preview isn't very practical, since the goal is a live preview.

The build process operates with the file's directory as the working directory, so most filters won't have issues.

If for some reason a filter requires a complete path to a file instead of a working directory, there are technically ways to access this. When the preview passes the current file contents to Pandoc, it actually uses a custom Lua reader. This reader accepts the concatenated contents of (potentially) multiple files, with a prepended line of JSON that contains file names and lengths. Then the custom reader processes this data as if it were read from (potentially) multiple files in the filesystem.

A wrapper script for Pandoc could read this first line of JSON from the stdin data and then pass it on to Pandoc in some form such as environment variables. I don't think this data is accessible within Pandoc itself, except during the initial processing by the custom reader (which removes the data).
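A rough sketch of that idea. The environment variable name `CODEBRAID_SOURCES_JSON` and both function names are hypothetical, and the exact schema of the JSON line is not assumed here; the line is only parsed and re-exported as a string. Stdin is forwarded unchanged, on the assumption that the custom reader still expects to find the header line.

```python
import json
import os
import subprocess
import sys


def peek_header(data):
    """Return the first line of `data` decoded as JSON; `data` is not modified."""
    first_line, _, _ = data.partition(b"\n")
    return json.loads(first_line.decode("utf-8"))


def run_with_header_env(args, data, executable="pandoc", var="CODEBRAID_SOURCES_JSON"):
    """Run `executable` with the JSON header exposed in an environment variable."""
    env = dict(os.environ)
    # Re-serialize so the variable holds one compact JSON string.
    env[var] = json.dumps(peek_header(data))
    # Forward stdin unchanged, header line included.
    result = subprocess.run(
        [executable, *args], input=data, env=env, stdout=subprocess.PIPE
    )
    return result.returncode, result.stdout
```

A filter running under Pandoc could then read the variable from its own environment, since child processes inherit it.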

belonesox commented 7 months ago

Thank you, it more or less works for me. Now I want to get scroll sync working. For me, with Pandoc 2.9.2, Codebraid scroll sync does not work even with the README.md from the extension. I tried to debug why: it looks like panel.webview.postMessage is sending messages, but something goes wrong on the /scripts/codebraid-preview.js side.

Is it possible to somehow debug codebraid-preview.js with VS Code? (I have little experience with debugging VS Code extensions...) Should I call addBreakpoints from the extension's TypeScript?

I would also like to add some hotkeys for "Back/Forward" browsing in this HTML webview.

gpoore commented 7 months ago

I'd suggest upgrading to the latest Pandoc, version 3.1+. That is likely the source of scroll sync issues. Otherwise, it is possible to open dev tools for the webview when testing it in VS Code.

The last release added a refresh button to the webview. Back/forward buttons could probably be added in a similar manner. I will be refactoring the refresh button extensively at some point, to completely separate its implementation from other CSS and JS that some users may wish to disable.

belonesox commented 7 months ago

Thank you! I managed to get scroll sync working with the main file using my custom build scripts. But I wish to discuss or synchronize a vision for the extension.

In my use case (complex IT documentation, scientific reports, etc.), there is a big tree of Markdown files with sophisticated includes, and I want something like the classic LaTeX workflow with SyncTeX (the way LaTeX Workshop works, for example).

That is:

A "double-click" in the preview opens the corresponding line in the right source file,
A "preview-sync" command in a source file scrolls to the corresponding place in the preview (or maybe in several previews).

It looks possible, because Pandoc's "sourcepos" extension provides information about source files, but it seems somewhat orthogonal to your idea of "no filenames, only stdin/stdout" and of specifying the structure of the document in external _codebraid_preview.yaml files.

Why not just use some standard way of including Markdown/Pandoc files (like "pandoc-include" or any other filter)? The docs can have a VERY complex structure with multilevel includes; that is, different docs in the same folder can reuse the same Markdown files, and it is very inefficient to duplicate all of this in .yaml files.

gpoore commented 7 months ago

All of my documents with multiple files and includes are still in LaTeX, so I haven't yet tried to make the extension work with more complex document structures. Using the _codebraid_preview.yaml to specify multi-file documents (in the concatenated, not included, sense) is simply an easy way to use multiple files building on Pandoc's built-in functionality for multi-file documents, without bringing in filters for including external files. It's not intended to be a general solution for all cases.

Using stdin/stdout to communicate with Pandoc allows the preview to be updated without saving to disk.

It should be possible to support include filters. This would require a few changes.

It probably isn't possible to support arbitrary include filters. The include filter would need to perform the include in a way that adds sourcepos data, either using the actual sourcepos extension for CommonMark-based formats, or the Codebraid Preview emulation of sourcepos otherwise.
The current filter that processes sourcepos data into a form more easily used for scroll sync would probably need to be modified to handle nested sources, since it was designed for sequential sources.
Because file inclusion happens within filters, using stdin/stdout to avoid saving to disk will no longer work. There would need to be a new option to save automatically before updating the preview.
There is at least one area where things could get complicated. Currently, the _codebraid_preview.yaml specifies the document structure for multi-file documents. With includes, there is no longer information about how all documents are connected, so if you are editing a file with the preview open, how does the extension know which file to use as the main file to build the document and update the preview? I suppose it would be possible to reuse _codebraid_preview.yaml somehow for this. A nicer, but more complex, solution would be to have the include filter track the document structure, and then pass this back to the extension (for example, as a line of JSON data prepended to the preview HTML, that is stripped out by the extension before the preview is updated).
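The "line of JSON prepended to the preview HTML" idea from the last point could look roughly like this. It is a sketch only: both function names and the shape of the structure data are hypothetical, and the real extension is not written in Python.

```python
import json


def prepend_structure(html, structure):
    """Build side: put the document structure on a JSON line before the HTML."""
    # json.dumps escapes newlines inside strings, so the header stays on one line.
    return json.dumps(structure) + "\n" + html


def strip_structure(output):
    """Extension side: split the output back into (structure, html)."""
    header, _, html = output.partition("\n")
    return json.loads(header), html
```

For example, the build could report something like `{"main": "report.md", "includes": {"report.md": ["intro.md"]}}`, and the extension would use it to decide which main file to rebuild when an included file is edited.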

If you're aware of a good Lua-based filter for includes, or want to create one, I'd be happy to help with processing the sourcepos data into a form that works for scroll sync and adding an option to save before previewing, and then bundling the filter with the extension. I have a lot of commitments at present, however, so I can't promise that progress on this would be fast. Because pandoc-include is implemented in Python, I am concerned that there would be significant additional complexity in trying to get it to work, but that might not be the case. The Pandoc Lua filters have become much more powerful in recent releases, so I have been trying to use Lua as much as possible when I need filters.

belonesox commented 6 months ago

Thanks for this extension and your comments and ideas!

I have studied the code and realized that in my use case, where there is a huge tree of files that include each other, it is not even possible in UI terms to provide scroll sync with your approach of a linear sequence of Markdown files. It is also not very important to fight for maximum build speed (refreshing the preview for our complex hundred-page docs won't take less than a second anyway, so there's no point in economizing by writing output to stdout, etc.).

Alas, there are no standard, fast Pandoc inclusion filters:

Ideally they would be part of the Pandoc core, in Haskell, but I don't know Haskell well, so maybe someday...
Lua filters like include-files.lua are abandoned and have inconvenient syntax…

So I decided to build it all on the Python pandoc-include filter for now, despite its questionable efficiency. I've roughly implemented sourcepos-attribute forwarding through this filter and am working on a separate VS Code extension that works LaTeX-style (like LaTeX Workshop, for example), i.e., it provides previews and sync-to-source (and, in the future, forward sync). It looks something like this:

https://github.com/gpoore/codebraid-preview-vscode/assets/1609739/71dd448b-6705-42db-9311-2584c2efd927

Well, this issue can be closed, thanks!

gpoore / codebraid-preview-vscode

[Question/RFE] Integrate a custom build command with the extension #26