kerrickstaley / genanki

A Python 3 library for generating Anki decks
MIT License

SVG code fails regex "invalid HTML tags" #108

Closed szmejap closed 2 years ago

szmejap commented 2 years ago

Hello,

The invalid HTML tags check fails on the following note field contents:

<!-- AnimCJK 2016-2019 Copyright Francois Mizessyn - https://github.com/parsimonhi/animCJK Derived from: MakeMeAHanzi project - https://github.com/skishore/makemeahanzi Arphic PL KaitiM GB font Arphic PL UKai font You can redistribute and/or modify this file under the terms of the Arphic Public License as published by Arphic Technology Co., Ltd. You should have received a copy of this license along with this file. If not, see http://ftp.gnu.org/non-gnu/chinese-fonts-truetype/LICENSE. --> <svg id="z27700" class="acjk" version="1.1" viewBox="0 0 1024 1024" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"> <style> <![CDATA[ @keyframes zk { to { stroke-dashoffset:0; } } svg.acjk path[clip-path] { --t:0.8s; animation:zk var(--t) linear forwards var(--d); stroke-dasharray:3337; stroke-dashoffset:3339; stroke-width:128; stroke-linecap:round; fill:none; stroke:#000; } svg.acjk path[id] {fill:#ccc;} ]]> </style> <path id="z27700d1" d="M535 394Q538 201 560 138Q578 107 520 83Q486 64 465 70Q447 77 463 101Q485 129 486 164Q490 203 478 779Q477 803 463 812Q454 819 432 812Q407 806 382 801Q348 789 351 800Q352 807 373 822Q440 876 457 905Q476 941 493 942Q508 943 524 907Q543 859 541 783Q531 606 534 430C535 406 535 406 535 394Z"/> <path id="z27700d2" d="M154 399Q141 399 139 408Q138 415 153 423Q199 448 227 439Q333 411 343 411Q359 414 347 444Q296 574 249 638Q201 710 114 781Q99 794 110 797Q120 798 141 787Q217 747 281 676Q342 612 419 446Q429 422 441 411Q456 399 447 389Q437 376 399 363Q378 351 336 370Q270 391 154 399Z"/> <path id="z27700d3" d="M590 454Q630 424 766 316Q787 297 814 285Q838 273 825 253Q809 234 779 219Q752 204 738 208Q723 209 729 225Q735 261 659 347Q620 392 577 441C573 445 585 458 590 454Z"/> <path id="z27700d4" d="M577 441Q555 416 535 394C527 386 528 420 534 430Q756 739 817 740Q898 731 967 725Q995 722 996 715Q997 708 964 695Q810 647 753 605Q690 554 590 454C581 445 581 445 577 441Z"/> <defs> <clipPath id="z27700c1"><use 
xlink:href="#z27700d1"/></clipPath> <clipPath id="z27700c2"><use xlink:href="#z27700d2"/></clipPath> <clipPath id="z27700c3"><use xlink:href="#z27700d3"/></clipPath> <clipPath id="z27700c4"><use xlink:href="#z27700d4"/></clipPath> </defs> <path style="--d:1s;" pathLength="3333" clip-path="url(#z27700c1)" d="M461 81L524 137L509 800L477 890L356 802"/> <path style="--d:2s;" pathLength="3333" clip-path="url(#z27700c2)" d="M147 407L208 418L368 385L400 413L263 660L110 789"/> <path style="--d:3s;" pathLength="3333" clip-path="url(#z27700c3)" d="M736 214L772 271L586 448"/> <path style="--d:4s;" pathLength="3333" clip-path="url(#z27700c4)" d="M537 402L784 673L837 699L987 714"/> </svg>

I checked with Anki on Linux: the field value is displayed correctly when I paste it manually into a note.

The regular expression here: https://github.com/kerrickstaley/genanki/blob/5026448cb661570b2355afc5a45c1c9fcc9eea24/genanki/note.py#L51

It does not accept two constructs: HTML comments and CDATA sections. Comments can be easily fixed by including !-- as a valid opening, like so:

r'<(?!(/?[a-zA-Z0-9]+|!--)(?: .*|/?)>)(?:.|\n)*?>'

Adding the CDATA opening explicitly also works:

r'<(?!(/?[a-zA-Z0-9]+|!--|!\[CDATA\[)(?: .*|/?)>)(?:.|\n)*?>'

regexr.com/6k6pf
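
To make the difference between the two patterns easier to see, here is a minimal sketch that runs both of them over a few short strings. The patterns are copied verbatim from above; the constant names, the `flagged` helper and the sample strings are made up for illustration and are not part of genanki.

```python
import re

# Patterns copied verbatim from the issue text.
# The constant names are hypothetical, chosen only for this sketch.
WITH_COMMENTS = re.compile(
    r'<(?!(/?[a-zA-Z0-9]+|!--)(?: .*|/?)>)(?:.|\n)*?>')
WITH_COMMENTS_AND_CDATA = re.compile(
    r'<(?!(/?[a-zA-Z0-9]+|!--|!\[CDATA\[)(?: .*|/?)>)(?:.|\n)*?>')


def flagged(pattern, text):
    """Return every substring the pattern reports as an invalid tag."""
    # finditer + group(0) is used because the patterns contain a capturing
    # group, so findall would return that group instead of the whole match.
    return [m.group(0) for m in pattern.finditer(text)]


comment_sample = '<!-- note --><b>x</b>'
cdata_sample = '<style><![CDATA[ a ]]></style>'
broken_sample = 'a <- b > c'

print(flagged(WITH_COMMENTS, comment_sample))          # comment is accepted
print(flagged(WITH_COMMENTS, cdata_sample))            # CDATA is still flagged
print(flagged(WITH_COMMENTS_AND_CDATA, cdata_sample))  # CDATA is accepted
print(flagged(WITH_COMMENTS_AND_CDATA, broken_sample)) # stray '<' is flagged
```

The first pattern only stops flagging comments; the CDATA opening needs the extra alternative from the second pattern.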

I'm sorry for not doing a pull request now. I gotta run and wanted to describe this issue quickly, so that I don't forget about it.

Is there a set of tests to make sure that the suggested change to the regex doesn't break the functionality by accepting more than Anki does? I have not checked whether the comment closing tag is properly handled. The contents of CDATA are also not checked for correctness.
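
As a starting point for such tests, here is a rough sketch of a few edge cases for the CDATA-aware pattern. It only records what the regex itself does, not what Anki accepts. The names `PROPOSED_RE`, `is_flagged` and `EDGE_CASES` are hypothetical and do not come from genanki.

```python
import re

# The CDATA-aware pattern from this issue, copied verbatim.
# The constant name is hypothetical, chosen only for this sketch.
PROPOSED_RE = re.compile(
    r'<(?!(/?[a-zA-Z0-9]+|!--|!\[CDATA\[)(?: .*|/?)>)(?:.|\n)*?>')


def is_flagged(text):
    """True if the pattern reports at least one invalid tag in text."""
    return PROPOSED_RE.search(text) is not None


# (input, expected result of is_flagged) pairs
EDGE_CASES = [
    ('<!-- closed -->', False),
    # Accepted although the closing '-->' is missing.
    ('<!-- never closed >', False),
    # Rejected: the pattern expects a space right after '!--'.
    ('<!--no space-->', True),
    # Accepted although the closing ']]>' is missing.
    ('<![CDATA[ never closed >', False),
    # Rejected: '.' in the lookahead does not match a newline.
    ('<use\nxlink:href="#a"/>', True),
]

for text, expected in EDGE_CASES:
    assert is_flagged(text) == expected, text
```

So the pattern checks only the opening of a comment or CDATA section, not its closing. It also flags a comment with no space after `!--` and a tag whose attributes wrap onto a new line.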

Cheers, Pawel

kerrickstaley commented 2 years ago

Should be fixed by https://github.com/kerrickstaley/genanki/commit/1b43c7517c53e231a3c56876c8d5fe7ea8bebdaf and https://github.com/kerrickstaley/genanki/commit/2f22b941831252de69e37ccd4002b636d22297c0, which will go out in the next release of genanki. Thanks for the great bug report!