Closed Waywardnl closed 8 years ago
Added the snippet as text
I cannot add a text file
It looks like you're using Windows. The default encoding of the output is utf-8, which is not the default under Windows, but it is declared in the header of the XML file. I have only tested a limited number of things under Windows. I have set all direct screen output under Windows to the old native Windows codeset. These things are tricky. I cannot really test against your software, so what we can do is very limited. Do, however, check the header of your xmltv file!
I've been thinking some more. The output you show displays é correctly, so I very much doubt that's the problem; your software seems to understand utf-8. I could think about adding an option to use the old Windows codeset, but I doubt it will help. Your output gives the position of the 'illegal' character. I suspect it is the one shown as a '?'. Please verify this using the given line and position numbers.
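To check which character sits at the reported position, a minimal Python sketch along these lines might help. It assumes the parser reports a 1-based line and a 1-based column counted in characters (some parsers count bytes instead), and the function name `char_at` is only illustrative.

```python
import unicodedata


def char_at(path, line_no, col_no, encoding="utf-8"):
    """Return (character, code point, Unicode name) at a 1-based line/column."""
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising,
    # so a broken byte at the reported position shows up as REPLACEMENT CHARACTER.
    with open(path, encoding=encoding, errors="replace", newline="") as fh:
        for current, line in enumerate(fh, start=1):
            if current == line_no:
                ch = line[col_no - 1]
                return ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")
    raise ValueError(f"file has fewer than {line_no} lines")
```

If the result is U+FFFD, the byte at that position was not valid utf-8, which would point to the source data rather than to the accents.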
This one very probably comes from one of the source websites. tvgids.tv in particular is very sloppy and regularly gives errors due to invalid encoding. You can try disabling the tvgids.tv source, e.g.:
Add to the [Configuration] section of your config file:
disable_source = 1
Also check whether your program has an option to ignore such invalid characters.
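If the consuming program has no such option, the invalid bytes can at least be located before the file is handed over. The following is a rough Python sketch, not part of the grabber; `find_invalid_utf8` is a hypothetical helper name, and it reports byte offsets within each line rather than character columns.

```python
def find_invalid_utf8(data: bytes):
    """Yield (line_no, byte_offset, offending_bytes) for each undecodable run.

    Both line_no and byte_offset are 1-based; byte_offset counts bytes in the line.
    """
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        pos = 0
        while pos < len(line):
            try:
                line[pos:].decode("utf-8")
                break  # the rest of the line is valid utf-8
            except UnicodeDecodeError as err:
                start = pos + err.start
                end = pos + err.end
                yield line_no, start + 1, line[start:end]
                pos = end  # continue scanning after the bad bytes
```

It could be used as `list(find_invalid_utf8(open(path, "rb").read()))`, where `path` is the xmltv output file.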
Hika
Hello,
I have been thinking too, and I converted all the strange characters with PHP because I just want to get them all!
Here is the code:
<?php
$bron = "/root/.xmltv/tv.xml";          // source file
$doel = "/root/.xmltv/tv_checked.xml";  // destination file
#
$handle = fopen($bron, "r") or die("Couldn't get handle");
#
// Truncate the destination file first, then reopen it for appending
$fp = @fopen($doel, "w");
fclose($fp);
$fp = @fopen($doel, "a");
if ($handle) {
while (!feof($handle)) {
$buffer = fgets($handle, 4096);
// Process buffer here..
## Change the characters
#
$buffer=str_replace("ü", "u", $buffer);
$buffer = str_replace("…", ".", $buffer);
$buffer=str_replace("ú", "u", $buffer);
$buffer=str_replace("ù", "u", $buffer);
$buffer=str_replace("û", "u", $buffer);
$buffer=str_replace("ï", "i", $buffer);
$buffer=str_replace("í", "i", $buffer);
$buffer=str_replace("ì", "i", $buffer);
$buffer=str_replace("î", "i", $buffer);
$buffer=str_replace("ë", "e", $buffer);
$buffer=str_replace("é", "e", $buffer);
$buffer=str_replace("è", "e", $buffer);
$buffer=str_replace("ê", "e", $buffer);
$buffer=str_replace("ö", "o", $buffer);
$buffer=str_replace("ó", "o", $buffer);
$buffer=str_replace("ò", "o", $buffer);
$buffer=str_replace("ô", "o", $buffer);
$buffer=str_replace("ø", "o", $buffer);
$buffer=str_replace("ÿ", "y", $buffer);
$buffer=str_replace("ý", "y", $buffer);
$buffer=str_replace("Ü", "U", $buffer);
$buffer=str_replace("Ú", "U", $buffer);
$buffer=str_replace("Ù", "U", $buffer);
$buffer=str_replace("Û", "U", $buffer);
$buffer=str_replace("Ï", "I", $buffer);
$buffer=str_replace("Í", "I", $buffer);
$buffer=str_replace("Ì", "I", $buffer);
$buffer=str_replace("Î", "I", $buffer);
$buffer=str_replace("Ë", "E", $buffer);
$buffer=str_replace("É", "E", $buffer);
$buffer=str_replace("È", "E", $buffer);
$buffer=str_replace("Ê", "E", $buffer);
$buffer=str_replace("Ý", "Y", $buffer);
$buffer=str_replace("Ö", "O", $buffer);
$buffer=str_replace("Ó", "O", $buffer);
$buffer=str_replace("Ò", "O", $buffer);
$buffer=str_replace("Ô", "O", $buffer);
$buffer=str_replace("Ø", "O", $buffer);
$buffer=str_replace("é", "e", $buffer);
$buffer=str_replace("â€", "`", $buffer);
$buffer=str_replace("ï", "i", $buffer);
$buffer=str_replace("Â ", " ", $buffer);
$buffer=str_replace("ë", "ee", $buffer);
$buffer=str_replace("’", "_", $buffer);
$buffer=str_replace("´", "_", $buffer);
$buffer=str_replace("ú", "u", $buffer);
$buffer=str_replace("è", "e", $buffer);
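// Caution: as written, this removes every "<" and would break the XML tags; the search string was probably altered when the code was pasted.
$buffer=str_replace("<", "", $buffer);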
$buffer=str_replace("`¦", "", $buffer);
$buffer=str_replace("ç", "c", $buffer);
$buffer=str_replace("£", "", $buffer);
$buffer=str_replace("`œ", "`", $buffer);
$buffer=str_replace("€", "EURO", $buffer);
$buffer=str_replace("`™", "", $buffer);
$buffer=str_replace("â‚", "euro", $buffer);
$buffer=str_replace("û", "u", $buffer);
$buffer=str_replace("Ã ", "a ", $buffer);
$buffer=str_replace("Ã", "i", $buffer);
$buffer=str_replace("í‰é", "1", $buffer);
$buffer=str_replace("%", "procent", $buffer);
$buffer=str_replace("i¢", "a", $buffer);
$buffer=str_replace("&_", "en", $buffer);
$buffer=str_replace("i³", "o", $buffer);
$buffer=str_replace("¤", "", $buffer);
$buffer=str_replace("i¤", "a", $buffer);
$buffer=str_replace("i¼", "u", $buffer);
$buffer=str_replace("i¶", "o", $buffer);
$buffer=str_replace("iŸ", "s", $buffer);
$buffer=str_replace("iª", "e", $buffer);
$schrijf = @fwrite ($fp, $buffer);
echo ".";
}
fclose($handle);
}
#
fclose($fp);
?>
Now the file works
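For comparison, most of the accent replacements in the list above can be approximated with Unicode decomposition instead of one replace call per character. This is a sketch in Python rather than PHP and is not the poster's script; `strip_accents` is a hypothetical helper. Letters without a decomposition, such as ø, pass through unchanged and would still need an explicit mapping, and symbols such as € are not touched at all.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Drop combining marks after decomposing accented letters (é becomes e)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

Note that this throws away the same information as the replacement list does, so it only makes sense when plain ASCII-like output is really wanted.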
As I said, I don't think the accents are the problem, and I cannot remove them because others want them. You are asking to go down to plain ASCII. You do kill more than needed, but... I notice some & characters and other weird ones in your list; these might be the cause and come, as I said, from bad encoding at the source. If you have the capacity, try recoding from utf-8 to iso-8859-15 (also called latin9) or windows-1252 (also called cp1252 or Western Europe) and report back. The first is the general predecessor and the second the old Windows default, which is still in use.
I still say look for a setting to ignore unrecognized characters. If the program is at all decently built, it will have one. But then that might be asking too much from Microsoft!
Well, the EPG XML to Microsoft MXF parser does have a utf8 setting, but it does nothing. Yes, it is shotgun troubleshooting, but we have to begin somewhere.
When I find the time and the drive to translate it, I will surely post it here, because I think your scraper is awesome!
Thanks! In the meantime I will look into an option to set a different encoding for the output. But know that utf-8 (and Unicode) are the future. All other encodings will slowly fade out in the coming years. Only Microsoft is always slow to accept something they don't own and control! ;-(
;-) Hear, hear, but I like the WMC backend of Windows Media Center ;-) And it works well with Kodi. I was very amused to read that Microsoft has to be compatible with ODF... But that, my children, is a whole other story ;-)
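The recoding test suggested here could be done with a small Python sketch like the one below. It is only an illustration under stated assumptions: the input is assumed to be utf-8, `recode_xml` and its parameters are hypothetical names, and characters that do not exist in the target codeset are written as numeric character references rather than dropped. The encoding declaration in the XML header is rewritten as well, since a mismatch there would cause a new parse error.

```python
import re


def recode_xml(src_path, dst_path, target="windows-1252"):
    """Re-encode a utf-8 XML file to `target` and update its encoding declaration."""
    # newline="" keeps the original line endings untouched.
    with open(src_path, encoding="utf-8", errors="replace", newline="") as src:
        text = src.read()
    # Point the XML declaration at the new encoding (first match only).
    text = re.sub(
        r'(<\?xml[^>]*encoding=["\'])[^"\']+',
        lambda m: m.group(1) + target,
        text,
        count=1,
    )
    # xmlcharrefreplace writes unmappable characters as references such as &#9731;
    with open(dst_path, "w", encoding=target, errors="xmlcharrefreplace", newline="") as dst:
        dst.write(text)
```

For the other candidate, `recode_xml("tv.xml", "tv_latin9.xml", target="iso-8859-15")` should work the same way; the file names are placeholders.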
Did the new option --output-windows-codeset, using cp1252 or windows-1252 for the output file, help?
Good morning,
Back from vacation.
I wanted to try the new option with version 2.2.2, but it seems to hang at: NPO
Now fetching details for 68 programs on ITV 1(xmltvid=itv-1) (channel 15 of 92) for 14 days.
Now fetching 17 channels from npo.nl (day 4 of 14).
I spoke too soon; it goes through now.
Hello,
I use bigscreen EPG to parse the XML for Windows Media Center, and I get an error on the XML file. I have added a snippet:
[code]
[/code]
I suspect the letter é is the problem; could this be translated?
Thank you, Roland de Leeuw