jgm / pandoc

Universal markup converter
https://pandoc.org

ePub v2 fails validation with ePubCheck in pandoc 3.1.12 - worked previously #9469

Closed shoesforindustry closed 8 months ago

shoesforindustry commented 8 months ago

An ePub v2 produced with pandoc 3.1.12 fails ePubCheck validation with the errors below. The same ePub validated without errors when built with the previous version:

Validating using EPUB version 2.0.1 rules.
ERROR(RSC-005): ./book.epub/EPUB/content.opf(2,108): Error while parsing file: attribute "xml:lang" not allowed here; expected attribute "id" or "unique-identifier"

These two are repeated 6 more times:
ERROR(RSC-005): ./book.epub/EPUB/content.opf(27,36): Error while parsing file: element "meta" missing required attribute "content"
ERROR(RSC-005): ./book.epub/EPUB/content.opf(27,43): Error while parsing file: text not allowed here; expected the element end-tag

It seems that the accessibility metadata and the xml:lang attribute added in pandoc 3.1.12 should not be included for epub2, since the output no longer validates?
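For readers who want to spot these two patterns before running ePubCheck, here is a minimal Python sketch. It is not part of pandoc or ePubCheck; the function name and the sample OPF are hypothetical, loosely modelled on the error messages above. It flags an xml:lang attribute on the package element and meta elements without a content attribute when the package declares version 2.x.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"
# ElementTree exposes the reserved xml: prefix under this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def find_epub2_opf_problems(opf_text):
    """Return human-readable findings for an EPUB 2 package document.

    Hypothetical helper: it only looks for the two patterns reported
    in this issue and is no substitute for ePubCheck.
    """
    root = ET.fromstring(opf_text)
    problems = []
    if root.get("version", "").startswith("2"):
        if XML_LANG in root.attrib:
            problems.append("xml:lang attribute on <package>")
        for meta in root.iter(f"{{{OPF_NS}}}meta"):
            # EPUB 2 expects <meta name="..." content="..."/>.
            if "content" not in meta.attrib:
                problems.append(f"<meta> without content attribute: {meta.attrib}")
    return problems

# Hypothetical sample, not the actual file from this report.
SAMPLE = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0"
         unique-identifier="id" xml:lang="en">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:language>en</dc:language>
    <meta name="cover" content="cover-image"/>
    <meta property="schema:accessMode">textual</meta>
  </metadata>
</package>"""

for problem in find_epub2_opf_problems(SAMPLE):
    print(problem)
```

On the sample, the xml:lang attribute and the property-style meta element are reported, while the name/content meta element is accepted.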

Marcello1173214 commented 5 months ago

@shoesforindustry @jgm Hi, I'm not a developer, just a mere mortal who creates EPUB 2 files. Could you please explain the solution in simple terms? (xml:lang is not allowed in the opf file for epub 2, but if I remove it I get an error in ACE by Daisy.) I can't work it out from the coded solution you posted. I added the language to every html file with the Sigil plugin (Access-aide), but that doesn't solve the problem. I'll leave an email if you want to discuss, but a comment here is also fine. Thank you. spyro15294@gmail.com

shoesforindustry commented 5 months ago

Hi @Marcello1173214, I am not sure what you are asking. To put the language into the content.opf of an ePub2 it would be part of the opf metadata like <dc:language>en</dc:language>, is that what you were asking?

If using Pandoc then see the ePub section for more information: https://pandoc.org/MANUAL.html#epubs

You could also add this language attribute using the metadata editor in Sigil.

I don't think you need the language in the HTML, but I guess it would not hurt, something like: <html lang="en">

If all is correct the epub will pass both ePubCheck and ACE validation.

Either Pandoc or Sigil should be able to create valid ePubs without editing the content.opf file :)
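To confirm what language an ePub actually declares without opening content.opf by hand, something like the following Python sketch could be used. It relies only on the standard library; the function name is hypothetical, and it assumes a well-formed container with a META-INF/container.xml that points at the package document.

```python
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"
DC_NS = "http://purl.org/dc/elements/1.1/"

def epub_languages(epub_file):
    """Return the dc:language values declared in an EPUB's package document.

    Hypothetical helper; epub_file is a path or a binary file-like object.
    """
    with zipfile.ZipFile(epub_file) as zf:
        # container.xml names the package document (the .opf file).
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find(f".//{{{CONTAINER_NS}}}rootfile")
        opf = ET.fromstring(zf.read(rootfile.get("full-path")))
    return [el.text.strip() for el in opf.iter(f"{{{DC_NS}}}language") if el.text]

# Usage, assuming a file named book.epub exists in the current directory:
# print(epub_languages("book.epub"))
```

An empty list would mean that no dc:language element was found in the metadata.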

Hope this helps.

Marcello1173214 commented 5 months ago

@shoesforindustry Yes, it should be like that. The problem is that if I put the language only in the metadata (<dc:language>it-IT</dc:language>), it passes validation only with epubcheck, while ACE gives me this error: "The language must be specified (xml:lang in OPF package) Ensures the OPF XML language is provided Add the missing OPF xml:lang attribute". Conversely, if I put this value in the opf, as ACE says, epubcheck gives me an error: "xml:lang attribute is not allowed in the opf file". It therefore seems that there is a bug in ACE: it doesn't take into account that in epub 2 the language can only be declared in the metadata.

shoesforindustry commented 5 months ago

@Marcello1173214 Odd, as I have just tried one of my Epub2 files and it passes both epubcheck and Ace. Can you publish one of your epubs that fails?

Marcello1173214 commented 5 months ago

Unfortunately, my client contractually prevents me from disclosing work files. Can you pass me one of yours instead? I would be grateful because from 2025 the accessibility requirements are mandatory and I don't know how to solve this ACE error.

shoesforindustry commented 5 months ago

I am unable to at present, but if you are producing epub2 books then I think ACE only checks against epub3? So you would have to produce an epub3?

Marcello1173214 commented 5 months ago

Are you sure that Ace checks only epub 3?

For now I'm not doing epub 3 to make ebooks compatible with older reading devices but it could actually be a solution to switch to epub 3 before 2025. Do you know if I will have problems with this change? Meaning: does the code change much? Will I need to take a refresher course?

shoesforindustry commented 5 months ago

I am pretty sure Ace only checks epub3 which has been around since 2011! I use pandoc which produces epub3 by default. I presume it is similar for Sigma?

Marcello1173214 commented 5 months ago

"I am pretty sure Ace only checks epub3 which has been around since 2011! I use pandoc which produces epub3 by default. I presume it is similar for Sigma?"

So that explains everything (I didn't know this as I had never read anything about it on the Daisy website). I use Sigil and in fact there is a plugin that allows you to convert from epub 2 to 3 although, working as a professional, I think I will learn to work directly with epub 3. Thank you for your help!