Closed angryfishcake closed 9 years ago
This works if you specify the input/output encoding (e.g. tidy -utf8) and provide correctly encoded input. If you're having problems, you should post a sample file and the exact command-line options used.
'''
hide-comments: true
tidy-mark: false
indent: true
indent-spaces: 4
new-blocklevel-tags: article, header, footer, section, nav
new-inline-tags: video, audio, canvas, ruby, rt, rp
doctype: <!DOCTYPE HTML>
sort-attributes: alpha
vertical-space: false
output-xhtml: true
wrap: 0
wrap-attributes: false
break-before-br: false
numeric-entities: yes
'''
Those are my settings. The file's encoding is set to UTF-8 without BOM, and the default input/output encoding is set to utf8. Am I missing something?
Tidy normally uses UTF-8 as the default encoding but you could try @acdha's suggestion above or adding char-encoding: utf8 to your config file. If that doesn't work, it'd be easier to figure out the problem if you told us what platform you're using and posted a small sample of input and the output you get for it, maybe as a gist.
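For anyone putting together the small sample asked for here, a rough Python sketch along these lines might help. It assumes a `tidy` executable on the PATH; `SAMPLE_HTML`, `build_tidy_command` and `run_tidy` are names made up for illustration, and the config path is whatever file you actually use.

```python
import shutil
import subprocess

# Hypothetical sample input containing the two characters reported in this thread.
SAMPLE_HTML = "<!DOCTYPE html><title>t</title><p>Price: £5&nbsp;each</p>"


def build_tidy_command(config_path=None):
    """Return an argv list that asks tidy for UTF-8 input and output."""
    cmd = ["tidy", "-quiet", "-utf8"]
    if config_path is not None:
        cmd += ["-config", config_path]
    return cmd


def run_tidy(html, config_path=None):
    """Pipe UTF-8 bytes (no BOM) through tidy.

    Returns tidy's stdout as bytes, or None when tidy is not installed.
    """
    if shutil.which("tidy") is None:
        return None
    result = subprocess.run(
        build_tidy_command(config_path),
        input=html.encode("utf-8"),  # str.encode never adds a BOM
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=False,  # tidy exits non-zero on warnings, which is not an error here
    )
    return result.stdout
```

Calling `run_tidy(SAMPLE_HTML, "tidy.cfg")` and posting the returned bytes next to the input would give others something concrete to compare, independent of any editor plugin.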
Hmm, that didn't seem to make a difference. It's just a normal .html file. I'm using Notepad++ with the Tidy2 plugin, which uses tidy-html5.
I have come across this issue as well, but only on files which are big-endian UTF-8 without a BOM.
This is occurring on a Windows install of notepad++.
@jonapgar UTF-8 does not have big or little endian modes and the use of a BOM is not recommended with UTF-8. If you have text which is UTF-8 without a BOM and using either -utf8 or char-encoding: utf8 it works as expected. Perhaps the problem is that notepad++, which appears to be the common factor, is either not setting the encoding or is injecting an unnecessary BOM?
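To check what the editor actually wrote to disk, something like the following Python sketch could be used. `describe_utf8` is a hypothetical helper, not part of tidy or Notepad++; it only looks at the raw bytes.

```python
import codecs


def describe_utf8(data: bytes) -> str:
    """Classify raw bytes as UTF-8 with BOM, UTF-8 without BOM, or not UTF-8."""
    has_bom = data.startswith(codecs.BOM_UTF8)  # the three bytes EF BB BF
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "not valid UTF-8"
    return "UTF-8 with BOM" if has_bom else "UTF-8 without BOM"
```

Reading the saved file in binary mode, for example `describe_utf8(open("page.html", "rb").read())` with your own file name, shows whether a BOM was injected or whether the file was saved in some other encoding.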
I will close this due to age. I don't see evidence of an issue, but please feel free to open this again, @angryfishcake, if the problem persists.
tidy does not handle HTML symbols like £ or &nbsp;. They come out as xA3 and xA0...
I tried changing the input/output/character encoding and enabling ASCII chars.
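One possible reading of this symptom, offered as an assumption rather than a diagnosis: A3 and A0 are the single-byte Latin-1 values of the pound sign and the non-breaking space, whereas in UTF-8 each is two bytes. A short Python sketch (`byte_forms` is a made-up helper) shows the difference.

```python
def byte_forms(ch: str):
    """Return (latin-1 hex, utf-8 hex) for a single character."""
    return ch.encode("latin-1").hex(), ch.encode("utf-8").hex()


# U+00A3 is the pound sign, U+00A0 is the non-breaking space.
for ch in ("\u00a3", "\u00a0"):
    latin1_hex, utf8_hex = byte_forms(ch)
    print(f"U+{ord(ch):04X}: latin-1={latin1_hex} utf-8={utf8_hex}")
```

If the output file contains a lone A3 or A0 byte, then somewhere along the way the text was probably written or displayed as Latin-1 (or a similar single-byte encoding) instead of UTF-8.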