Open ksylva opened 3 years ago
We do not have the character problem under Linux, but we noticed the same problem on several Windows computers. Under Windows, I tried to solve the problem in three different ways. First, after extracting the data, I manually replaced the question marks with the correct characters. The tests pass, but it is a bad solution. Second, I replaced the content of the witness file with the content of the extracted file, question marks included. The tests pass, but this is an even worse solution. Third, I set an appropriate encoding on the file. I tried many encodings (UTF-8, UTF-16, Windows-1252, US-ASCII, ISO-8859-1, ...), but the character problem was not resolved.
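A possible reason why changing the read encoding alone does not help: if the text was ever written with an encoding that cannot represent a character, the original character is lost and only `?` is stored, so no encoding used afterwards can bring it back. The minimal Python sketch below illustrates this. It assumes the extraction step writes a file with some encoding and silently substitutes unsupported characters (Python's `errors="replace"` handler does exactly that, producing `?`); the sample text and the file name `extract.txt` are hypothetical.

```python
import tempfile
from pathlib import Path

# Hypothetical sample of extracted text (Esperanto letters and a katakana character).
SAMPLE = "ĉ ĝ ĥ ĵ ŝ ŭ ア"


def round_trip(text: str, encoding: str) -> str:
    """Write text to a temporary file with the given encoding, then read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "extract.txt"  # hypothetical file name
        # errors="replace" mimics a writer that silently turns characters the
        # target encoding cannot represent into "?".
        path.write_text(text, encoding=encoding, errors="replace")
        return path.read_text(encoding=encoding)


if __name__ == "__main__":
    for enc in ("utf-8", "cp1252", "iso-8859-1", "ascii"):
        print(f"{enc:>10}: {round_trip(SAMPLE, enc)}")
```

Only the UTF-8 round trip preserves every character; with `cp1252` (Windows-1252) each of them comes back as `?`. If this is what happens here, the fix would be to use UTF-8 consistently at every write and read step, not only when opening the witness file.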
Characters such as ĉ, ĝ, ĥ, ĵ, ŝ, ŭ, ア, ... are replaced by ?, ?, ?, ?, ?, ?, ?, .... In addition, some expressions have been changed on the site; for example, the expression « International Phonetic Alphabet » was replaced by « IPA ».
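To see which of these characters a given encoding can actually represent, a small check like the Python sketch below may help; the helper name `unencodable` is hypothetical. All the characters listed above are missing from Windows-1252, whereas û does exist in Windows-1252 (but not in US-ASCII), so running this check against each encoding involved might help narrow down where the substitution happens.

```python
from __future__ import annotations


def unencodable(text: str, encoding: str) -> list[str]:
    """Return, in order of first appearance, the characters of text
    that the given encoding cannot represent."""
    missing: list[str] = []
    for ch in text:
        try:
            ch.encode(encoding)  # strict mode: raises on unsupported characters
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


if __name__ == "__main__":
    sample = "ĉ, ĝ, ĥ, ĵ, ŝ, ŭ, ア, û"
    for enc in ("utf-8", "cp1252", "ascii"):
        print(f"{enc:>7}: {unencodable(sample, enc)}")
```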
The chosen solution is to update the witness file manually by checking the page in question, because the pages themselves have been modified.
Some accented characters are not extracted correctly. For example, ĉ, û, ... are extracted as ?.