XHTML is not valid

bmix commented 6 years ago

Although the export declares the XHTML namespace, the output is not valid XHTML! Since XHTML is XML, all attribute values must be quoted and all elements must be closed.

So, instead of:

<link type="text/css" rel="stylesheet" href="custom.css">
<div class=Card>...<div>

it should read:

<link type="text/css" rel="stylesheet" href="custom.css" />
<div class="Card">...</div>
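Because XHTML must be well-formed XML, any XML parser works as a quick first check. The sketch below uses Python's standard `xml.etree.ElementTree`. `is_well_formed` is a hypothetical helper name, and wrapping the snippet in a dummy `<root>` element is an assumption needed because an XML document has exactly one root. It only tests well-formedness, not validity against the XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the markup parses as XML (well-formedness only, no DTD validation)."""
    try:
        # A document needs exactly one root element, so wrap the fragment in a dummy one.
        ET.fromstring("<root>" + fragment + "</root>")
        return True
    except ET.ParseError:
        return False


broken = '<link type="text/css" rel="stylesheet" href="custom.css">\n<div class=Card>...<div>'
fixed = '<link type="text/css" rel="stylesheet" href="custom.css" />\n<div class="Card">...</div>'

print(is_well_formed(broken))  # False: unquoted attribute value and unclosed elements
print(is_well_formed(fixed))   # True
```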

This should be dead easy to fix. The alternative would be to have the output be HTML4 or HTML5, which would be a shame, however, since I have not yet found any Anki-to-XML export, and this is the closest thing to it. One could then simply write an XSLT stylesheet that takes the XHTML (it's XML!) and transforms it into whatever format one wants.
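A real XSLT stylesheet needs an XSLT processor (for example `xsltproc` or the `lxml` library). As a stdlib-only stand-in, the sketch below performs the same kind of transformation with `xml.etree.ElementTree`. The `Card` and `Question` class names appear in the add-on's output in this thread; the `Answer` class, the nesting of question and answer inside each card, and the `<deck>`/`<card>` target elements are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Prefix mapping for the XHTML namespace used by the export.
NS = {"h": "http://www.w3.org/1999/xhtml"}


def xhtml_to_deck(xhtml: str) -> ET.Element:
    """Map card <div>s of an XHTML export onto a hypothetical <deck>/<card> XML dialect."""
    root = ET.fromstring(xhtml)
    deck = ET.Element("deck")
    for card_div in root.iterfind(".//h:div[@class='Card']", NS):
        card = ET.SubElement(deck, "card")
        for part in ("Question", "Answer"):
            div = card_div.find(f"h:div[@class='{part}']", NS)
            if div is not None:
                # itertext() flattens any inline markup inside the field to plain text.
                ET.SubElement(card, part.lower()).text = "".join(div.itertext()).strip()
    return deck


sample = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<div class="Card"><div class="Question">What is XML?</div><div class="Answer">A markup language.</div></div>
</body></html>"""

print(ET.tostring(xhtml_to_deck(sample), encoding="unicode"))
# <deck><card><question>What is XML?</question><answer>A markup language.</answer></card></deck>
```

This only works because the input is XML; the same code would reject the unquoted, unclosed markup shown above.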

bmix commented 6 years ago

In addition, the conversion seems to produce lots of HTML elements with the same id. The id attribute must be unique within the document, so it may be better to use the class attribute.
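Duplicate ids can be spotted before running a full validator. A minimal sketch with the standard-library `html.parser`; `IdCounter` and `duplicate_ids` are hypothetical names, not part of the add-on.

```python
from collections import Counter
from html.parser import HTMLParser


class IdCounter(HTMLParser):
    """Counts how often each id value occurs in the document."""

    def __init__(self) -> None:
        super().__init__()
        self.ids: Counter = Counter()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; attribute names are already lower-cased.
        for name, value in attrs:
            if name == "id" and value is not None:
                self.ids[value] += 1


def duplicate_ids(markup: str) -> list:
    """Return the id values that occur more than once, sorted alphabetically."""
    parser = IdCounter()
    parser.feed(markup)
    parser.close()
    return sorted(value for value, count in parser.ids.items() if count > 1)


print(duplicate_ids('<div id="container">a</div><div id="container">b</div><p id="x"/>'))
# ['container']
```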

peterborkuti commented 6 years ago

Hi @bmix,

Could you please download the new version from here (GitHub) and test it? It is enough to drop the .py file into Anki's add-ons folder (which can be opened via Anki -> Tools -> Add-ons -> Open Add-ons Folder) and post your findings here.

Thank you in advance,
Péter

bmix commented 6 years ago

Hello @peterborkuti, I found the following issues with the current revision:

The title is 'Unititled Document' (this is only cosmetic; I just wanted to mention it).
<link type="text/css" rel="stylesheet" href="custom.css"> is not closed. It must be <link type="text/css" rel="stylesheet" href="custom.css"/>
Only a cosmetic issue: the text content in one of my simple decks is placed within a div without a surrounding p or pre element.
On a more complex file I get:

Attribute value "container" of type ID must be unique within the document.

It may be better to use a class attribute here (as long as you don't want to generate unique IDs).

There is a mismatch between the SYSTEM and PUBLIC identifiers of the doctype.

The System Identifier http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd declares the XHTML 1.0 Transitional document type, but the associated Public Identifier -/W3C//DTD XHTML 1.0 Transitional//EN does not match this document type.
There are many places where element tags do not get closed. They are mainly container elements for text.
Upon validation I also get this error message:

document type does not allow element "pre" here; missing one of "object", "applet", "map", "iframe", "button", "ins", "del" start-tag

which I do not really understand, since, as far as I know, the <pre> element should be allowed at that place. It may be that the XML (since we produce XHTML, we are dealing with XML) gets mixed up here with the deck's contents: my input file is a card deck that teaches XML and therefore has a lot of source code examples, which seem to get transported uncleanly (sometimes special characters like < are escaped, sometimes the tags are written literally). That may be an Anki issue, however.
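The thread does not show the offending markup, so this is only a guess: in XHTML 1.0 a `<p>` may contain only inline content, and the validator typically reports this message when a block element such as `<pre>` ends up inside a `<p>`. Assuming that is the cause, the sketch below reports where a `<pre>` start tag appears while a `<p>` is still open. `PreInParagraphFinder` and `find_pre_in_p` are hypothetical names, and the sketch assumes every `<p>` is explicitly closed, as it must be in XHTML.

```python
from html.parser import HTMLParser


class PreInParagraphFinder(HTMLParser):
    """Records the (line, column) of every <pre> start tag found inside an open <p>."""

    def __init__(self) -> None:
        super().__init__()
        self.open_paragraphs = 0
        self.hits = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.open_paragraphs += 1
        elif tag == "pre" and self.open_paragraphs > 0:
            # getpos() gives the position of the tag currently being handled.
            self.hits.append(self.getpos())

    def handle_endtag(self, tag):
        if tag == "p" and self.open_paragraphs > 0:
            self.open_paragraphs -= 1


def find_pre_in_p(markup: str) -> list:
    finder = PreInParagraphFinder()
    finder.feed(markup)
    finder.close()
    return finder.hits


print(len(find_pre_in_p("<p>Example:<pre>&lt;a/&gt;</pre></p>")))  # 1
print(len(find_pre_in_p("<p>Example:</p><pre>&lt;a/&gt;</pre>")))  # 0
```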

I wanted to attach the two decks I used for this report (combined in a ZIP archive), but for some reason GitHub does not accept the archive. If you want, I can make them available by other means.

It seems that creating a clean XHTML export may become a major task. Perhaps it would be simpler if you only went for HTML5, but I am not sure. My interest in your project was getting an XML export of the Anki deck (since XHTML is XML), but it may be more interesting to express the whole Anki deck dataset in its own XML dialect, something I started defining but had to postpone because of other projects I am working on.

peterborkuti commented 6 years ago

Hi @bmix,

I fixed the link-closing issue and changed the HTML doctype to HTML5. In my export there is no id attribute at all. I tried to find out which deck is being exported so that I could change the document title accordingly, but I did not find any information on this, nor do I have any idea how to do it.
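With the HTML5 doctype (`<!DOCTYPE html>`) there are no identifiers left to mismatch. If an XHTML 1.0 doctype is kept instead, its public and system identifiers must form one of the pairs defined in the XHTML 1.0 specification. A minimal sketch that checks the pair; the regular expression assumes the usual `<!DOCTYPE html PUBLIC "public-id" "system-id">` form, and `doctype_matches` is a hypothetical name.

```python
import re

# Public identifier -> system identifier, as defined in the XHTML 1.0 specification.
XHTML1_DOCTYPES = {
    "-//W3C//DTD XHTML 1.0 Strict//EN": "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd",
    "-//W3C//DTD XHTML 1.0 Transitional//EN": "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd",
    "-//W3C//DTD XHTML 1.0 Frameset//EN": "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd",
}

# Matches <!DOCTYPE html PUBLIC "public-id" "system-id">.
_DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+html\s+PUBLIC\s+"([^"]*)"\s+"([^"]*)"\s*>', re.IGNORECASE)


def doctype_matches(document: str) -> bool:
    """True if the document's XHTML 1.0 public and system identifiers belong together."""
    match = _DOCTYPE_RE.search(document)
    if match is None:
        return False
    public_id, system_id = match.groups()
    return XHTML1_DOCTYPES.get(public_id) == system_id


good = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
        '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">')
# A single slash, like the public identifier quoted in the validator message above.
bad = good.replace("-//W3C", "-/W3C")
print(doctype_matches(good), doctype_matches(bad))  # True False
```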

Originally this was not my project: I took an existing Anki deck-to-HTML converter plugin and modified its source according to my needs.

Now I have checked the source, and I understand that it takes the output of Anki's default deck-to-text exporter, does some search/replace on that output, and adds some opening/closing HTML and some styles around the whole file.
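The approach described here (take the text export, post-process it, wrap it in a page) can still produce well-formed output if every piece of text is escaped and every element is closed. A minimal sketch, assuming the cards are already available as plain-text (question, answer) pairs; `wrap_export` and the `Answer` class are hypothetical and not the add-on's actual code. Escaping turns any HTML already stored in a field into literal text, so fields that carry markup would have to be handled differently.

```python
from html import escape


def wrap_export(cards, title="Anki export", css_href="custom.css"):
    """Wrap (question, answer) text pairs in a minimal, well-formed XHTML5 page."""
    lines = [
        "<!DOCTYPE html>",
        '<html xmlns="http://www.w3.org/1999/xhtml">',
        "<head>",
        '<meta charset="UTF-8" />',
        f"<title>{escape(title)}</title>",
        # Void elements are self-closed so the page also parses as XML.
        f'<link type="text/css" rel="stylesheet" href="{escape(css_href)}" />',
        "</head>",
        "<body>",
    ]
    for question, answer in cards:
        # escape() turns &, <, > and quotes into entities, so card text cannot break the markup.
        lines.append('<div class="Card">')
        lines.append(f'<div class="Question">{escape(question)}</div>')
        lines.append(f'<div class="Answer">{escape(answer)}</div>')
        lines.append("</div>")
    lines += ["</body>", "</html>"]
    return "\n".join(lines)


print(wrap_export([("What does <br/> do?", "Inserts a line break.")]))
```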

To adapt it to your needs, I would have to understand how the text exporter works and/or how Anki stores decks, etc. I am not using Anki at the moment, so updating this is not on my priority list, but feel free to fork it, modify it, and/or send me pull requests.

Péter

peterborkuti commented 6 years ago

Hi @bmix,

I found an easy way to make the ids unique: I replaced every id value with a randomized string. I hope it helps. Could you try it?
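Random strings make collisions unlikely but not impossible, and the ids change on every export. A swapped-in alternative, not what the add-on does, is to number the ids sequentially, which makes them unique by construction and reproducible between exports. A minimal sketch; `number_ids` and the `anki-id-` prefix are hypothetical, and the prefix starts with a letter so each id is also a valid XML name. With either approach, any `href="#..."` or CSS rule that pointed at an old id no longer matches after renaming.

```python
import itertools
import re

# Matches id=..., id="..." or id='...' preceded by whitespace; \1 repeats the opening quote.
_ID_ATTR = re.compile(r"""\s+id\s*=\s*(["']?)([^\s"'>]+)\1""", re.IGNORECASE)


def number_ids(markup: str, prefix: str = "anki-id-") -> str:
    """Replace every id value with prefix + a running number."""
    counter = itertools.count(1)
    return _ID_ATTR.sub(lambda m: f' id="{prefix}{next(counter)}"', markup)


print(number_ids('<div id="container">a</div><div id=container>b</div>'))
# <div id="anki-id-1">a</div><div id="anki-id-2">b</div>
```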

Thank you in advance,
Péter

bmix commented 6 years ago

Now I get an empty file and this:

Traceback (most recent call last):
  File "aqt\exporting.py", line 116, in accept
  File "anki\exporting.py", line 19, in exportInto
  File "C:\Users\bmix\AppData\Roaming\Anki2\addons\Export_html_glossary.py", line 140, in doExport
    out += '<div class="Question">\n' + esc(c.q()) + "\n</div>\n"
  File "C:\Users\bmix\AppData\Roaming\Anki2\addons\Export_html_glossary.py", line 120, in esc
    return convertSound(self.escapeText(randomizeId(s)))
  File "C:\Users\bmix\AppData\Roaming\Anki2\addons\Export_html_glossary.py", line 131, in randomizeId
    return re.sub(r' +id *= *[\'"]*([^ \'">]+)[\'"]*', getRandomId, s, 0, re.IGNORECASE)
  File "re.py", line 151, in sub
  File "C:\Users\bmix\AppData\Roaming\Anki2\addons\Export_html_glossary.py", line 128, in getRandomId
    return ' id="' + ''.join([random.choice(string.ascii_letters + string.digits) for n in xrange(32)])+'"'
NameError: global name 'random' is not defined

peterborkuti commented 6 years ago

Hi @bmix,

Sorry, I forgot to commit the imports; now it is fixed. Unfortunately, the previous version did not cause an exception for me, probably because my cards don't contain ids, so I could not test it. Please be patient; I hope it works now.
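The `NameError` in the traceback above comes from calling `random.choice` in a module that never ran `import random`. Below is a Python 3 reconstruction of the two helpers shown in the traceback (the add-on itself targets Python 2, hence `xrange`), with the imports in place. It is a sketch based only on the traceback lines, not the committed fix. One deliberate change: the first character is always a letter, because in XHTML, which is XML, an id must be a valid XML name and cannot start with a digit.

```python
import random
import re
import string

# Same pattern as in the traceback: matches id=..., id="..." or id='...' preceded by spaces.
ID_PATTERN = re.compile(r' +id *= *[\'"]*([^ \'">]+)[\'"]*', re.IGNORECASE)
ID_CHARS = string.ascii_letters + string.digits


def getRandomId(match):
    """re.sub callback: return a fresh, double-quoted 32-character id attribute."""
    # XML names may not start with a digit, so the first character is always a letter.
    first = random.choice(string.ascii_letters)
    rest = "".join(random.choice(ID_CHARS) for _ in range(31))
    return ' id="' + first + rest + '"'


def randomizeId(s):
    """Replace the value of every id attribute in s with a random string."""
    return ID_PATTERN.sub(getRandomId, s)


print(randomizeId('<div id="container">a</div>'))
```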

Thank you,
Péter

peterborkuti / anki-addon-glossary

XHTML is not valid #4