unidoc / unipdf

Golang PDF library for creating and processing PDF files (pure go)
https://unidoc.io

[BUG] Accessibility tags missing on download #552

Open vikraman-alea-bah opened 2 months ago

vikraman-alea-bah commented 2 months ago

Description

Accessibility tags give blind and low-vision screen reader users access to the information in PDFs. On download, the PDF needs to preserve the accessibility tags (and their order), the alternative text for images, and the document title. Without the tags, a blind screen reader user cannot access the information in a digital PDF in a human-readable way, if at all.

Bug:

A PDF with accessibility tags that is flattened and downloaded using unipdf loses accessibility features such as the accessibility tags and the alternative text for images.

Expected Behavior

Accessibility tags (in their original order) and alt text for images are preserved on download.

How to test

You can check whether a PDF has accessibility tags in a few ways:

  1. Use the free PAC (PDF Accessibility Checker) tool.
  2. Use the free Adobe Acrobat Reader to see whether the PDF is tagged (screenshot attached).
  3. Use the paid Adobe Acrobat Pro and run its Accessibility Checker (screenshot attached).
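Alongside these tools, a rough programmatic sanity check is possible: a tagged PDF normally has a /StructTreeRoot entry and a /MarkInfo dictionary with /Marked true in its document catalog. The Go sketch below uses only the standard library, and its function name taggingMarkers is hypothetical (it is not a unipdf API). It simply scans the raw file bytes for those two entries, assuming the catalog is stored uncompressed; if the catalog sits inside a compressed object stream it reports a false negative, and it says nothing about tag order or alt text, so PAC or Acrobat remain the real test.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
)

// Catalog entries that identify a tagged PDF: /StructTreeRoot plus
// /MarkInfo << /Marked true >>. The struct tree root may be an indirect
// reference ("5 0 R") or a direct dictionary ("<<").
var (
	structTreeRootRe = regexp.MustCompile(`/StructTreeRoot\s*(\d+\s+\d+\s+R|<<)`)
	markedTrueRe     = regexp.MustCompile(`/Marked\s+true`)
)

// taggingMarkers reports whether the raw bytes contain the two catalog
// entries. Nothing is decompressed, so entries stored inside a compressed
// object stream are missed (false negative).
func taggingMarkers(data []byte) (hasStructTree, isMarked bool) {
	return structTreeRootRe.Match(data), markedTrueRe.Match(data)
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: tagcheck file.pdf")
		os.Exit(2)
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	hasStructTree, isMarked := taggingMarkers(data)
	fmt.Printf("StructTreeRoot: %v, Marked: %v\n", hasStructTree, isMarked)
}
```

For example, `go run tagcheck.go accessible-pdf.pdf` and `go run tagcheck.go output.pdf` give a quick first signal before opening either file in a checker.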

Attachments

Attached you'll find:

  1. Adobe Acrobat Pro accessibility check (screenshot: adobe-acrobat-pro)
  2. Adobe Acrobat Reader (free) tag check (screenshot: adobe-acrobat-reader)
  3. PDF with accessibility tags: accessible-pdf.pdf
  4. PDF downloaded using unipdf (with lost accessibility tags): output.pdf

Code

    // Parse the template PDF.
    pdfReader, err := pdf.NewPdfReader(bytes.NewReader(data))
    if err != nil {
        return nil, err
    }

    acroForm := pdfReader.AcroForm
    if acroForm == nil {
        return nil, errors.New("no form data present in pdf template")
    }

    // Fill the form fields using the application's own field writer.
    w := &pdfFieldWriter{}
    w.SetFields(acroForm.Fields)
    w.LoadFieldOptions()

    // Ask viewers to regenerate the field appearances.
    truPdf := pdfcore.PdfObjectBool(true)
    acroForm.NeedAppearances = &truPdf

    w.Write(FieldOptionTypes.ReservationNumber, issuance.ReservationNumber)
    w.Write(FieldOptionTypes.Comment, issuance.Comment)

    // this returns pdf with lost accessibility tags
    return pdfReader.ToBytes(), nil

github-actions[bot] commented 2 months ago

Welcome! Thanks for posting your first issue. The way things work here is that while customer issues are prioritized, other issues go into our backlog where they are assessed and fitted into the roadmap when suitable. If you need to get this done, consider buying a license, which also enables you to use it in your commercial products. More information can be found on https://unidoc.io/

vikraman-alea-bah commented 2 months ago

We have a license

ipod4g commented 2 months ago

Dear @vikraman-alea-bah

Thank you for providing such a detailed report on the issue. We are investigating it thoroughly and will get back to you with a response as soon as possible.

Thank you for your patience and understanding.

3ace commented 1 month ago

Hi @vikraman-alea-bah

We have investigated this matter further and can confirm that UniPDF does not yet fully support accessibility features. However, we are committed to addressing this issue and have added it to our roadmap for future development. Our goal is to enhance the accessibility support for PDF documents.

Based on your attached document, we observed that some accessibility checks in Adobe Acrobat fail while others pass. Nevertheless, we understand the importance of ensuring that every feature is properly implemented so that generated documents have comprehensive accessibility support.

vikraman-alea-bah commented 1 month ago

Thanks for looking into it @3ace! The most important thing is the preservation of the accessibility tags in order. Without that, we can't say the PDF is accessible.

Do you have a public-facing roadmap that I can share with my team? PDF accessibility has become a requirement for us, and it would be great if we could follow it on the roadmap to get an idea of when to expect it.

3ace commented 1 month ago

@vikraman-alea-bah Following up on our last update, we've investigated what needs to be improved to support generating accessible documents.

We've started by fixing the form flattening process to preserve all accessibility information.

Currently, we are in the process of creating an internal roadmap and do not have a public-facing version available.

However, I can assure you that PDF accessibility is a priority for us.

3ace commented 4 days ago

Hi @vikraman-alea-bah, we have made some updates that should help preserve existing accessibility features when processing an existing PDF document. Here are the accessibility reports generated by Adobe Acrobat before and after the implementation.

The fix should be available in our next UniPDF release, version 3.61.0, which is expected next month.

vikraman-alea-bah commented 2 days ago

Thank you so much for this! I also checked it in Adobe Acrobat Pro, and all the accessibility tags were preserved and in order. Amazing work!!