New language comment characters need to be registered in multiple files

eitsupi commented 1 year ago

(Comes from https://github.com/quarto-dev/quarto-cli/discussions/4092#discussioncomment-4950816)

Currently comment letters are managed in multiple files and the same modification must be made to each.

I registered for PRQL in #4147, but did not realize that I needed to register in the following files, so it appears that code annotations are not available for PRQL at this time.

https://github.com/quarto-dev/quarto-cli/blob/94a815fa510272ed55843cde80ad20cac2e6c2bc/src/resources/jupyter/notebook.py#L528-L571 https://github.com/quarto-dev/quarto-cli/blob/94a815fa510272ed55843cde80ad20cac2e6c2bc/src/core/jupyter/jupyter.ts#L1009-L1052 https://github.com/quarto-dev/quarto-cli/blob/94a815fa510272ed55843cde80ad20cac2e6c2bc/src/resources/filters/quarto-pre/code-annotation.lua#L6-L55 https://github.com/quarto-dev/quarto-cli/blob/94a815fa510272ed55843cde80ad20cac2e6c2bc/src/resources/rmd/hooks.R#L818-L863 https://github.com/quarto-dev/quarto-cli/blob/94a815fa510272ed55843cde80ad20cac2e6c2bc/src/core/lib/partition-cell-options.ts#L310-L354

It would be great if these codes were automatically updated from a single definition file and if there were written instructions for registering new languages.

cderv commented 1 year ago

To explain a bit more the context

Adding to jupyter related files will be necessary for language that are also supported there. Your engine is knitr only.
code-annotation.lua is indeed probably necessary for the feature to work on those chunks. But note that this is work the output language I think so it would be necessary to add elm not your engine, right ?
What is in hook.R is only used for older knitr versions, as now this lives in knitr
partition-cell-options.ts is indeed about parsing the YAML chunk option for YAML intelligence in the Qmd file.

We could definitely try to make more straightforward to add a missing language

eitsupi commented 1 year ago

Thank you for the detailed explanation!

Adding to jupyter related files will be necessary for language that are also supported there. Your engine is knitr only.

The comment characters are irrelevant whether it is on knitr or Jupyter, so I think it would not matter if all these files have the same content (even if the Jupyter kernel does not exist now, it may in the future).

code-annotation.lua is indeed probably necessary for the feature to work on those chunks. But note that this is work the output language I think so it would be necessary to add elm not your engine, right ?

Thanks for pointing this out!

cscheid commented 1 year ago

We love the principle of "single source of truth" in quarto, and we try hard to abide by it whenever it makes sense! (See our schemas, which drive validation, autocompletion and website documentation, for an example)

In this case, though, there's an additional complication, which is the presence of code cell "engines" that are handled in typescript code, but need to be made visible by the IDEs. Specifically, consider mermaid and dot engines, and their respective comment syntaxes. The engine definitions lives in typescript source code, but the rest of the comment definitions live separately. These two fundamentally cannot be a "single source of truth".

I agree we should make it easier to add new language definitions, but it can't just be through making a single array somewhere.

cscheid commented 1 year ago

@cderv this is part of the reason we have quarto build-js!

eitsupi commented 1 year ago

Would it be possible to use a structured definition file to manage the information in one place?

Something like:

{
  "somelang": {
    "commentChars": "#",
    "engines": {
      "jupyter": true,
      "knitr": true
    }
  },
  "mermaid": {
    "commentChars": "%%",
    "engines": {
      "jupyter": false,
      "knitr": false
    }
  }
}

cscheid commented 1 year ago

Not entirely, because part of the definition includes actual typescript code, so you're still not keeping all of the stuff together.

You can't get a description of behavior (actual functions in actual code which get execution) and a single static file at the same place.

cderv commented 1 year ago

this is part of the reason we have quarto build-js!

Yes I understand that. All those file mentioned here are not affected by quarto build-js, AFAIU

Here the list to keep in one place is probably the languages associated with their comment chars. This would be easy for every known language.

Then we have the specific cells engine which can be quite large considering knitr custom engine feature (like prql which will be used in source code but will produce elm language chunk for code-annotation to work). So more globally, is there a way for us to help support custom engine in R documents to benefit from YAML auto completion. Regarding detection of configuration, knitr uses #| as a default for any chunks to account for unknown engine and their specific comment chars.

Anyway, a bit larget than this specific issue.

cscheid commented 1 year ago

All those file mentioned here are not affected by quarto build-js, AFAIU

Some of them are.

https://github.com/quarto-dev/quarto-cli/blob/main/src/command/build-js/cmd.ts#L131 eventually gets to https://github.com/quarto-dev/quarto-cli/blob/main/src/core/schema/build-schema-file.ts#L65-L70, which calls languages() from src/handlers/base.ts, which collects information from mermaid and dot, including their comment syntax.

cderv commented 1 year ago

Good to know more how this works for internal engines like those two. I was referring to files mentioned in https://github.com/quarto-dev/quarto-cli/issues/4823#issue-1623476749

Anyway, not straightforward to come up with SSOT.

Thanks for the explanation

quarto-dev / quarto-cli

New language comment characters need to be registered in multiple files #4823