mwilliamson / mammoth.js

Convert Word documents (.docx files) to HTML
BSD 2-Clause "Simplified" License
5.02k stars, 547 forks

Hidden elements should be ignored when converting word documents #230

Open cnspray opened 4 years ago

cnspray commented 4 years ago

In Word: [image]

In clipboardData (item.type === "text/html"): [image]

Regular expression used to strip the hidden elements from the pasted HTML:

    replace(/<([a-z0-9]*)[^>]*\s*display:none[\s\S]*?><\/\1>/gi,'')
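
As a rough illustration of what this pattern matches, here is a small sketch that applies it to two made-up HTML strings. The helper name `stripHiddenElements` and both sample strings are hypothetical and not taken from a real clipboard payload. Because the pattern requires `>` to be followed immediately by the closing tag, only the empty hidden element is removed in this sketch.

```javascript
// The pattern from the comment above, unchanged.
var hiddenElementPattern = /<([a-z0-9]*)[^>]*\s*display:none[\s\S]*?><\/\1>/gi;

// Hypothetical helper name, for illustration only.
function stripHiddenElements(html) {
    return html.replace(hiddenElementPattern, "");
}

// Made-up sample inputs (assumptions, not real Word clipboard output).
var emptyHidden = '<p>a</p><span style="display:none"></span><p>b</p>';
var nonEmptyHidden = '<p>a</p><span style="display:none">secret</span><p>b</p>';

// The empty hidden <span> is removed.
console.log(stripHiddenElements(emptyHidden)); // <p>a</p><p>b</p>

// The hidden <span> with text inside is left untouched by this pattern.
console.log(stripHiddenElements(nonEmptyHidden) === nonEmptyHidden); // true
```

So hidden elements that still contain text would presumably need a different approach, such as handling them during conversion.
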
cnspray commented 4 years ago

Should runs that contain a <w:vanish /> element be filtered out?

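    function readXmlElement(element) {
        if (element.type === "element") {
            // Skip runs whose properties (<w:rPr>) contain <w:vanish />.
            // Guard against runs without <w:rPr> instead of relying on try/catch.
            if (element.name === "w:r") {
                var runProperties = element.first("w:rPr");
                if (runProperties && runProperties.first("w:vanish")) {
                    var message = warning("A hidden element was ignored: " + element.name);
                    return emptyResultWithMessages([message]);
                }
            }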
...
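
The check can be tried outside mammoth with a small stand-in for its XML nodes. This is only a minimal sketch, not mammoth's actual implementation: `makeElement` and `isHiddenRun` are hypothetical names, and the stand-in models just the `first(name)` lookup used in the snippet above. It also assumes that `w:vanish` is an on/off property, so a `w:val` of `"false"`, `"0"` or `"off"` means the run is not hidden.

```javascript
// Hypothetical stand-in for a parsed XML element; only the pieces used
// by the check are modelled (type, name, attributes, children, first()).
function makeElement(name, attributes, children) {
    return {
        type: "element",
        name: name,
        attributes: attributes || {},
        children: children || [],
        // Return the first child with the given name, or undefined.
        first: function(childName) {
            return this.children.find(function(child) {
                return child.name === childName;
            });
        }
    };
}

// Hypothetical helper: true when a <w:r> run carries an enabled <w:vanish>.
function isHiddenRun(element) {
    if (element.type !== "element" || element.name !== "w:r") {
        return false;
    }
    var runProperties = element.first("w:rPr");
    var vanish = runProperties && runProperties.first("w:vanish");
    if (!vanish) {
        return false;
    }
    // Assumption: a missing w:val means "on"; "false", "0" and "off" mean "off".
    var value = vanish.attributes["w:val"];
    return value !== "false" && value !== "0" && value !== "off";
}

var hiddenRun = makeElement("w:r", {}, [
    makeElement("w:rPr", {}, [makeElement("w:vanish")]),
    makeElement("w:t")
]);
var visibleRun = makeElement("w:r", {}, [makeElement("w:t")]);
var explicitlyShownRun = makeElement("w:r", {}, [
    makeElement("w:rPr", {}, [makeElement("w:vanish", {"w:val": "false"})])
]);

console.log(isHiddenRun(hiddenRun));          // true
console.log(isHiddenRun(visibleRun));         // false
console.log(isHiddenRun(explicitlyShownRun)); // false
```

Checking `w:val` avoids dropping runs where the property is present but switched off, which the bare presence test in the snippet would treat as hidden.
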