quarto-dev / quarto-cli

Open-source scientific and technical publishing system built on Pandoc.
https://quarto.org

`fig-alt` removed by Quarto at least for `docx` (works with Pandoc) #5514

Open mcanouil opened 1 year ago

mcanouil commented 1 year ago

Edited: based on @cderv comment and with additional examples

Consider the example below, which uses the Quarto `fig-alt` attribute:

## Section

![Caption for Quarto Logo](https://quarto.org/quarto.png){fig-alt="Quarto Logo"}

| `quarto pandoc index.qmd --from markdown --to docx -o index.docx` | `quarto render index.qmd --to docx` |
|:-:|:-:|
| (image) | (image) |

Using Pandoc markup, the caption and the alt text both end up in Word's alt text (so no real issue):

## Section

![Caption for Quarto Logo](https://quarto.org/quarto.png "Quarto Logo")

| `quarto pandoc index.qmd --from markdown --to docx -o index.docx` | `quarto render index.qmd --to docx` |
|:-:|:-:|
| (image) | (image) |
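
To compare the two outputs without opening Word, the alt text can be read straight from the `.docx` archive. The following is a minimal Python sketch, not part of Quarto or Pandoc: it assumes the picture descriptions are stored in the `descr` attribute of the `wp:docPr` elements in `word/document.xml` (the attribute Word shows as alt text), and `list_alt_texts` is a hypothetical helper name.

```python
import zipfile
from xml.etree import ElementTree as ET

# Namespace of the DrawingML "wordprocessingDrawing" elements in document.xml.
WP_NS = "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"


def list_alt_texts(docx_path):
    """Return a (name, title, descr) tuple for every wp:docPr element.

    Assumption: `descr` holds the alt text and `title` the optional title;
    missing attributes are reported as empty strings.
    """
    # A .docx file is a zip archive; the main body lives in word/document.xml.
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    results = []
    for doc_pr in root.iter(f"{{{WP_NS}}}docPr"):
        results.append(
            (doc_pr.get("name", ""), doc_pr.get("title", ""), doc_pr.get("descr", ""))
        )
    return results
```

Calling `list_alt_texts("index.docx")` on the output of each command turns the comparison into plain text: an empty `descr` means Word has no alt text to show for that picture.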

Using computations:

## Section

```{r}
#| label: fig-plot
#| fig-cap: "A caption for a plot"
#| fig-alt: "An alt text for a plot"
plot(1)
```

|`quarto pandoc index.qmd --from markdown --to docx -o index.docx` | `quarto render index.qmd --to docx`|
|:-:|:-:|
| Obviously not working | <img width="1624" alt="image" src="https://github.com/quarto-dev/quarto-cli/assets/8896044/4f50db00-abbc-4dfc-8aa4-d09b4344cda6"> |

Note that computations producing figures generate the following Markdown (with both `knitr` and `Jupyter`):

```md
![A caption for a plot](index_files/figure-docx/fig-plot-output-1.png){#fig-plot fig-alt='An alt text for a plot'}
```

The initial question also mentioned pdf, but I don't even know how to show alternative text in PDF.


Discussed in https://github.com/quarto-dev/quarto-cli/discussions/5511

Originally posted by **IZE85** May 12, 2023

In MS Word, one is able to "add alternative text to a shape, picture, chart, SmartArt graphic, or other object to help people with visual disabilities." When I add alternative text to a figure via "fig-alt" it gets exported via HTML. Unfortunately, when knitting to formats like Word or PDF, my fig-alt text is saved/deposited somewhere, if I'm correct. At least, for MS Word, I cannot see the alternative text in the "alternative text" field, MS Word provides for figures.

My question: Is it possible for the fig-alt text to find its way into the MS Word Alternative text field and have something similar for PDFs? Thank you in advance!

Ad alternative text: "Alt text helps people with visual disabilities understand pictures and other graphical content. When someone using a screen reader comes across a picture in a document, they will hear the alt text describing the picture; without alt text, they will only know they've reached a picture without knowing what the picture shows."

Source: https://support.microsoft.com/en-us/office/add-alternative-text-to-a-shape-picture-chart-smartart-graphic-or-other-object-44989b2a-903c-4d9a-b742-6a75b451c669

cderv commented 1 year ago

The alt text syntax with pandoc should be the one described in https://pandoc.org/MANUAL.html#images

I don't know about this alt-text attribute. In your screenshot, the caption text is just reused by Pandoc as an alt text.

Note that this works with Docx too:

---
title: doc
format: docx
---

## Section

![Caption for Quarto Logo](https://quarto.org/quarto.png "Some alt text")

With Quarto, we support the `fig-alt` attribute (https://quarto.org/docs/authoring/figures.html#alt-text), but it seems we are indeed not setting it for docx output the way Pandoc handles it.

This would be what to fix IMO.

cderv commented 1 year ago

> This would be what to fix IMO.

In fact, we just don't support it: we only support `fig-alt` for HTML output. The feature would need to be added in https://github.com/quarto-dev/quarto-cli/blob/59a865f351f92431eca8454124afe4c345517c2f/src/resources/filters/quarto-pre/figures.lua#L42-L50

So the feature would need to be added there, leveraging how Pandoc turns title text into alt text for Docx.
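
As a stopgap outside Quarto, the same idea can be applied to the source text: move the `fig-alt` value into the Pandoc title position, which the examples earlier in this thread show working for docx. Below is a rough Python sketch under that assumption. `fig_alt_to_title` is a hypothetical helper, the regular expressions only cover simple inline images without an existing title, and figures generated by code cells are not covered because their Markdown is produced during rendering.

```python
import re

# Matches ![caption](target){attrs}. Assumption: no nested brackets in the
# caption, no spaces in the target, and no title already present.
IMAGE_RE = re.compile(
    r'!\[(?P<caption>[^\]]*)\]\((?P<target>[^)\s]+)\)\{(?P<attrs>[^}]*)\}'
)
# Matches fig-alt="..." or fig-alt='...' inside an attribute block.
FIG_ALT_RE = re.compile(r'''\s*fig-alt=(?P<q>["'])(?P<alt>.*?)(?P=q)''')


def fig_alt_to_title(markdown):
    """Rewrite `{fig-alt="..."}` on inline images into a Pandoc image title."""

    def replace(match):
        attrs = match.group("attrs")
        alt_match = FIG_ALT_RE.search(attrs)
        if alt_match is None:
            # No fig-alt attribute: leave the image untouched.
            return match.group(0)
        # Escape double quotes so the title stays a valid quoted string.
        alt = alt_match.group("alt").replace('"', '\\"')
        # Keep any other attributes (for example a cross-reference id).
        remaining = FIG_ALT_RE.sub("", attrs, count=1).strip()
        image = f'![{match.group("caption")}]({match.group("target")} "{alt}")'
        return image + (f"{{{remaining}}}" if remaining else "")

    return IMAGE_RE.sub(replace, markdown)
```

Applied to the first example in this issue, the function yields the Pandoc markup from the second example. This is a text-level workaround only, not the filter change discussed above.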

mcanouil commented 1 year ago

> I don't know about this alt-text attribute. In your screenshot, the caption text is just reused by Pandoc as an alt text.

True, I tried markup from elsewhere and the Quarto attribute feature after trying the Pandoc one. Agreed that the Quarto attribute feature is the thing to fix/add, since it is also used in figures generated by code cells. Anyway, Quarto does not use the caption when no alt text is provided, while Pandoc does, which actually ensures a minimal level of accessibility.

PS: I edited the original post to include the Pandoc markup syntax, and the markdown/output when using computations.

mcanouil commented 1 year ago

Since computations in Quarto use `fig-alt`, it feels more like a bug than a missing feature, but both are fine by me.

cderv commented 1 year ago

Notes from a quick look into this:

So we need some clever processing in the right order to be able to do the right thing, or to output the OpenXML directly instead of relying on Pandoc.

DDGerasimova commented 11 months ago

Hi, I am wondering if there are any updates on fig-alt for docx and pdf?

cderv commented 11 months ago

Not yet. The issue will be updated when there are any. You should be able to subscribe to the issue to get notifications; there is no need to reply and ask.

DDGerasimova commented 10 months ago

Hi -

I am not sure if I should open a new issue, but it's related to alt text, so I thought it might fit here. The issue is about the reading order when there are pictures with alt text. If one has a title and a chart, then the reading order goes 1-chart alt text, then 2-title. However, if a title is not formatted as a title but as general text, then the reading order is correct (1-title, 2-chart alt text). [I checked for the reading order in Adobe (the accessibility tool) on a pdf file created via the 'print as pdf' option of an html file].

mcanouil commented 10 months ago

@DDGerasimova Could you share a small self-contained "working" (reproducible) example to work with, i.e., a complete Quarto document or a Git repository? Thanks.

You can share a Quarto document using the following syntax, i.e., using more backticks than you have in your document (usually four ````).

````qmd
---
title: "Reproducible Quarto Document"
format: html
---

This is a reproducible Quarto document using `format: html`.
It is written in Markdown and contains embedded R code.
When you run the code, it will produce a plot.

```{r}
plot(cars)
```

The end.
````

DDGerasimova commented 10 months ago

@mcanouil, sure, I think this example below should work? Apologies that it's not very pretty, new to Quarto -- still learning :)

The example is for the title being formatted as a title; so, to run with the title as text, I just removed #.

---
 fontsize: 20pt
---

# **My test page**

```{r, include = FALSE}
library(palmerpenguins)
library(tidyverse)

A <- c(3, 4, 8, 10, 13)
B <- c(1, 2, 3, 4, 5)

df <- data.frame(A, B)
```

```{r}
#| echo: false

library(ggplot2)

spline_A <- as.data.frame(spline(df$B, df$A))

ggplot(df) + 
  geom_point(aes(x = B, y = A), size = 2, colour = 'black') + 
  geom_line(data = spline_A, aes(x = x, y = y), colour = 'black') +
  labs(y= "A", x = "B") + theme(text = element_text(size=14))
```

mcanouil commented 10 months ago

OK, so this is at least partially the same issue as the one described here. However, I don't understand the "order issue" you are talking about (I am not sure converting to PDF is a robust way to check the reading order; it should be possible to check this in Word itself). Edit: note that Pandoc uses an old version of Microsoft Word for its template. For instance, the accessibility checker does not work well with it.

image

For a title, you need to use `title`; `#` denotes a section.

Side note, @DDGerasimova: don't mix inline code cell options and YAML-style code cell options. Quarto uses and recommends YAML. For instance, here is your example stripped of unnecessary code:

---
title: "My test page"
format: docx
---

```{r}
#| include: false
library(palmerpenguins)
```

```{r}
#| include: false
library(tidyverse)

A <- c(3, 4, 8, 10, 13)
B <- c(1, 2, 3, 4, 5)

df <- data.frame(A, B)
```

```{r}
#| echo: false
#| fig-alt: "Line chart illustrating the increase in A, as B increases from 1 to 5"
library(ggplot2)
A <- c(3, 4, 8, 10, 13)
B <- c(1, 2, 3, 4, 5)
df <- data.frame(A, B)
spline_A <- as.data.frame(spline(df$B, df$A))

ggplot(df) + 
  geom_point(aes(x = B, y = A), size = 2, colour = 'black') + 
  geom_line(data = spline_A, aes(x = x, y = y), colour = 'black') +
  labs(y= "A", x = "B") + theme(text = element_text(size=14))
```

DDGerasimova commented 10 months ago

I guess I meant to say 'heading' rather than 'title' -- sorry! Although with 'Title' as in your code the result is the same as with 'section' (#). So to illustrate what I meant regarding reading order: (The numbers 1 and 2 at the beginning of the heading and of the chart denote the order).

(1) Examining reading order for the heading as regular text:

image

(2) Examining reading order for the heading as section or title:

image

(As a reminder: these were generated as html, then printed to pdf)

DDGerasimova commented 10 months ago

Regarding mixing inline code cell option and YAML style code cell option... So, my actual code is a bit more complicated than what I originally posted (please see below). My alt text includes references to the data so that actual numbers can be automatically included in the alt text. I tried to write it YAML style first but couldn't get it to work. I tried inline, and it worked. Is there a way to do this YAML style?

---
title: "My test page"
---

```{r}
#| include: false
library(palmerpenguins)
```

```{r}
#| include: false
library(tidyverse)

A <- c(3, 4, 8, 10, 13)
B <- c(1, 2, 3, 4, 5)

df <- data.frame(A, B)

d_start <- df[df$B == '1', ]
d_end <- df[df$B == '5', ]
```

```{r}
#| echo: false

library(ggplot2)

spline_A <- as.data.frame(spline(df$B, df$A))

ggplot(df) + 
  geom_point(aes(x = B, y = A), size = 2, colour = 'black') + 
  geom_line(data = spline_A, aes(x = x, y = y), colour = 'black') +
  labs(y= "A", x = "B") + theme(text = element_text(size=14))
```

mcanouil commented 10 months ago

Still, don't mix. If you use inline options for a code cell, use only inline options. If you use YAML, use only YAML. Otherwise, at some point you will very likely set an option twice and then wonder why your changes do not show up, because only one of the options is actually used.

knitr syntax (it's not Quarto syntax) is:

#| fig-cap: !expr "1+1"

See https://quarto.org/docs/computations/r.html#chunk-options.

DDGerasimova commented 10 months ago

Yeah, I understand that, just can't get it to work with YAML :( When my alt text has a mix of words and code for pulling numbers from the data, I either can't get Quarto to run at all or it runs but the code for pulling numbers displays the code itself in the alt text and not the actual number...

cderv commented 9 months ago

Adding correct support for alt text in Quarto for docx output is not that straightforward. I moved this to 1.5.

ARMurray commented 4 months ago

I'd like to add PowerPoint (.pptx) as a needed component of this fix. I assume that support for alt text in Word will extend to PowerPoint, but it has not been mentioned so far.

cscheid commented 4 months ago

@ARMurray they're unfortunately not related because the code paths for Pandoc's docx and pptx writers are quite distinct. We'd love to be able to do that, but we have other things higher in our priority list.