makew0rld / amfora

A fancy terminal browser for the Gemini protocol.
GNU General Public License v3.0

Syntax highlighting for preformatted text blocks that have alt text indicating a programming language #252

Closed mntn-xyz closed 2 years ago

mntn-xyz commented 3 years ago

From the Gemini spec:

"Anything which comes after the ``` characters of a line which toggles preformatted line on (i.e. the first, third, fifth, etc. toggling lines in a document) may be treated as "alt text" for the preformatted content. In general you should not count on this content being visible to the user but, for example, search engines may index it and screen readers may read it to users to help the user decide whether the preformatted content should be read aloud (which e.g. ASCII art generally should not be, but which source code perhaps should be). There are currently no established conventions on how alt text should be formatted."

Many Markdown implementations, for instance, use a similar mechanism to identify the language for syntax highlighting:

```html
<p>This is HTML</p>
```
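
As a rough illustration of the toggling rule quoted above, here is a minimal Go sketch that collects the preformatted blocks of a gemtext document along with the alt text from each opening toggle line. `PreBlock` and `ParsePreBlocks` are hypothetical names made up for this example; this is not a description of how amfora's renderer is structured.

```go
package main

import (
	"fmt"
	"strings"
)

// PreBlock is a hypothetical container for one preformatted block.
type PreBlock struct {
	Alt   string   // text after the opening ``` (may be empty)
	Lines []string // block content, without the toggle lines
}

// ParsePreBlocks walks the document line by line. Only the toggle lines
// that switch preformatted text ON (1st, 3rd, 5th, ...) contribute alt text.
func ParsePreBlocks(gemtext string) []PreBlock {
	var blocks []PreBlock
	inPre := false
	for _, line := range strings.Split(gemtext, "\n") {
		if strings.HasPrefix(line, "```") {
			if !inPre {
				// Opening toggle: everything after the backticks is alt text.
				alt := strings.TrimSpace(strings.TrimPrefix(line, "```"))
				blocks = append(blocks, PreBlock{Alt: alt})
			}
			// Closing toggle: any trailing text is ignored.
			inPre = !inPre
			continue
		}
		if inPre {
			last := &blocks[len(blocks)-1]
			last.Lines = append(last.Lines, line)
		}
	}
	return blocks
}

func main() {
	doc := "# Title\n```html\n<p>This is HTML</p>\n```\nplain text\n```\nascii art\n```ignored\n"
	for _, b := range ParsePreBlocks(doc) {
		fmt.Printf("alt=%q lines=%q\n", b.Alt, b.Lines)
	}
}
```

Text after a closing toggle line is dropped in this sketch, since the quoted passage only gives meaning to text on the lines that turn preformatted text on.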

Some kind of syntax highlighting convention seems inevitable in Gemini, especially given its use within tech-centric circles. Since the spec leaves alt text wide open for interpretation by the client, I thought this use case might be worth investigating in amfora.

Hugo uses a fantastic Go library called Chroma for its syntax highlighting. Chroma explicitly supports highlighting with 8-color and 256-color terminal palettes, so it should be a good match.

mntn-xyz commented 3 years ago

I have a patch almost ready for this, see https://github.com/mntn-xyz/amfora/tree/chroma. It can be tested on this page: gemini://mntn.xyz/posts/2021-08-22-hello-world/

There are two problems:

  1. On one very long preformatted text line, it seems like a couple of characters are missing their formatting. I'm not sure if this is a bug with chroma's terminal formatter or something with amfora.
  2. For some reason every preformatted block has an extra newline added after it if it uses syntax highlighting.

Anyway, I dropped this here for now in case someone else wants to take a look. I'll pick it up again soon.

mntn-xyz commented 3 years ago

Here's a screenshot too:

image

makew0rld commented 3 years ago

Thanks for creating this issue, it's an interesting idea. I do kind of hesitate to add things like this because they're not standardized, but the meaning of an alt text that is just `python` or something is pretty clearly for syntax highlighting. I would be interested in seeing any discussion of this on the mailing list, and any real-world use if you know of some offhand.

> On one very long preformatted text line, it seems like a couple of characters are missing their formatting. I'm not sure if this is a bug with chroma's terminal formatter or something with amfora.

This sounds like maybe an Amfora issue, but I'm not sure. You should come up with a simple gemtext test case that causes this, and see if you can replicate the test case when just feeding the plain text to chroma alone in a sample program, separate from Amfora.

> For some reason every preformatted block has an extra newline added after it if it uses syntax highlighting.

Maybe something Chroma is adding? Or it might have to do with where in your code the syntax highlighting is happening.

Anyway, once you've made a PR I can take a closer look at code and make any suggestions needed. Thanks!

mntn-xyz commented 3 years ago

Thanks, I will start a conversation on the mailing list and see what people think. It does seem like an area where some standardization could be helpful, though I understand why there hasn't been any formal standard as of yet.

For real world use, I am currently using it on my blog, as support is now built into gmnhg (which converts Hugo sites to Gemini). In addition to technical blogs, I'd like to work on porting/mirroring technical documentation to Gemini, such as godocs.io and docs.rs. Gemini is a lovely "distraction free" environment compared to the web, and it would be a great help to have this documentation accessible from there rather than needing to open a web browser.

Hopefully I can put a PR together this weekend.

mntn-xyz commented 3 years ago

Reading through past discussions gave me a little extra clarity... when searching for a language hint, I will scan only the first word of the alt text, discarding everything after it. This allows the alt text to be even more descriptive, which is especially important for non-sighted users.
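
A minimal sketch of that first-word rule in Go, assuming whitespace separates the hint from the rest of the alt text. `LanguageHint` is a hypothetical helper name, and lowercasing the hint is an extra assumption of this sketch rather than something decided above.

```go
package main

import (
	"fmt"
	"strings"
)

// LanguageHint returns the first whitespace-separated word of the alt text,
// lowercased (an assumption of this sketch), or "" if there is no word.
// Everything after the first word is left free for a human-readable description.
func LanguageHint(alt string) string {
	fields := strings.Fields(alt)
	if len(fields) == 0 {
		return ""
	}
	return strings.ToLower(fields[0])
}

func main() {
	for _, alt := range []string{
		"python A loop that prints the numbers 1 to 10",
		"html",
		"",
	} {
		fmt.Printf("alt=%q hint=%q\n", alt, LanguageHint(alt))
	}
}
```

An empty result, or a hint the highlighter does not recognize, would presumably leave the block rendered as plain preformatted text.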

makew0rld commented 2 years ago

Closed by #263