Closed GoogleCodeExporter closed 8 years ago
I also noticed ganon has trouble handling GB2312 (Simplified Chinese). I ended
up having to use iconv to convert to GBK before parsing, which is pretty slow
for larger DOMs. Rules for charset conversions can be tricky.
Original comment by sjwood...@gmail.com
on 18 Oct 2012 at 1:52
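One reading of the workaround above is that pages labelled GB2312 often contain characters that exist only in GBK, so decoding with the superset codec avoids failures. The following is a minimal sketch of that idea in Python, not ganon's code; the function name `to_utf8` and the alias table `SUPERSET` are hypothetical names introduced for illustration.

```python
# Hypothetical alias table (an assumption, not part of ganon): labels that
# are safer to decode with a superset codec than with the declared one.
SUPERSET = {"gb2312": "gbk"}


def to_utf8(raw: bytes, declared: str) -> bytes:
    """Decode raw bytes using the declared charset (or its superset) and
    re-encode the result as UTF-8."""
    label = declared.strip().lower()
    codec = SUPERSET.get(label, label)
    return raw.decode(codec).encode("utf-8")


if __name__ == "__main__":
    # The middle character is commonly cited as missing from GB2312 but
    # present in GBK, so the page is encoded as GBK yet labelled GB2312.
    page = "朱镕基".encode("gbk")
    print(to_utf8(page, "GB2312").decode("utf-8"))
```

Converting the raw bytes once, before parsing, keeps the cost to a single pass over the input rather than one conversion per text node. Whether that removes the slowdown reported above is not something this sketch measures.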
I don't think it's a good idea to alter getPlainText, but maybe an extra method
called getPlainTextUTF8? It might be better to just use a local solution,
though.
Original comment by niels....@gmail.com
on 19 Oct 2012 at 4:34
The problem is that sometimes I don't know (or don't want to know) which
charset the input webpage is in, so any kind of autodetection would be great,
so that I can always use my code in the same charset.
Original comment by Radika...@gmail.com
on 19 Oct 2012 at 4:39
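The kind of autodetection asked for above could be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not what ganon does: it checks for a byte-order mark, then for a `charset=` declaration in a `<meta>` tag, then tries a short list of fallback codecs. The names `detect_charset` and `html_to_utf8`, the 4096-byte scan window, and the fallback order are all assumptions chosen for the example.

```python
import codecs
import re

# Byte-order marks mapped to Python codecs that strip the BOM when decoding.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

# Matches both <meta charset="..."> and the http-equiv Content-Type form.
_META = re.compile(
    rb"<meta[^>]+charset\s*=\s*[\"']?\s*([A-Za-z0-9_.:-]+)", re.IGNORECASE
)


def detect_charset(raw: bytes, fallbacks=("utf-8", "gb18030")) -> str:
    """Guess a codec name for raw HTML bytes: BOM, then <meta>, then trial decodes."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    match = _META.search(raw[:4096])  # assumed scan window for the declaration
    if match:
        label = match.group(1).decode("ascii").lower()
        try:
            codecs.lookup(label)  # accept only labels Python knows
            return label
        except LookupError:
            pass
    for name in fallbacks:
        try:
            raw.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    return "latin-1"  # last resort: every byte sequence decodes


def html_to_utf8(raw: bytes) -> bytes:
    """Normalize a page to UTF-8 so the calling code always sees one charset."""
    return raw.decode(detect_charset(raw), errors="replace").encode("utf-8")
```

Trial decoding is only a heuristic: many byte sequences are valid in more than one legacy charset, so a declared charset, when present, is generally the more reliable signal.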
Added a simple version of getPlainTextUTF8 in rev #76.
Original comment by niels....@gmail.com
on 20 Oct 2012 at 10:45
Original issue reported on code.google.com by
Radika...@gmail.com
on 6 Sep 2012 at 7:50