progval / Supybot-plugins

Collection of plugins for Supybot/Limnoria I wrote or forked.
https://github.com/ProgVal/Limnoria/

NoLatin1: Please support also non-French channels. #150

Open Mikaela opened 10 years ago

Mikaela commented 10 years ago

Latin-1 is a widespread problem for at least the following languages (copy-pasted from Wikipedia):

# Languages with complete coverage
Afrikaans
Albanian
Basque
Breton
Catalan
Corsican
Danish
English (UK and US)
Faroese
Galician
German
Icelandic
Indonesian
Irish (new orthography)
Italian
Latin (basic classical orthography)
Leonese
Luxembourgish (basic classical orthography)
Malay
Manx
Norwegian (Bokmål and Nynorsk)
Occitan
Portuguese
Rhaeto-Romanic
Scottish Gaelic
Spanish
Swahili
Swedish
Walloon

# Languages commonly supported but with incomplete coverage

| Language | Missing characters | Typical workaround | Supported by |
|---|---|---|---|
| Catalan | Ŀ, ŀ (deprecated) | L·, l· | |
| Czech | Č, č, Ř, ř, Š, š, Ž, ž, ch | digraph ch | ISO-8859-2, Windows-1250 |
| Dutch | IJ, ij | digraphs IJ, ij | |
| Estonian | Š, š, Ž, ž (only present in loanwords) | Sh, sh, Zh, zh | ISO-8859-15, Windows-1252 |
| Finnish | Š, š, Ž, ž (only present in loanwords) | Sh, sh, Zh, zh | ISO-8859-15, Windows-1252 |
| French | Œ, œ, and the very rare Ÿ | digraphs OE, oe, and Y without the diaeresis | ISO-8859-15, Windows-1252 |
| Hungarian | Ő, ő, Ű, ű | Õ, õ (or Ô, ô; sometimes Ö, ö), Û, û (sometimes Ü, ü) | ISO-8859-2, Windows-1250 |
| Irish (traditional orthography) | Ḃ, ḃ, Ċ, ċ, Ḋ, ḋ, Ḟ, ḟ, Ġ, ġ, Ṁ, ṁ, Ṡ, ṡ, Ṫ, ṫ | Bh, bh, Ch, ch, Dh, dh, Fh, fh, Gh, gh, Mh, mh, Sh, sh, Th, th | ISO-8859-14 |
| Latin with macrons | Ā, ā, Ē, ē, Ī, ī, Ō, ō, Ū, ū | | ISO-8859-13, Windows-1257 |
| Māori | Ā, ā, Ē, ē, Ī, ī, Ō, ō, Ū, ū | Ä, ä, Ë, ë, Ï, ï, Ö, ö, Ü, ü | ISO-8859-13, Windows-1257 |
| Turkish | İ, ı, Ğ, ğ, Ş, ş | I, i, G, g, S, s | ISO-8859-3, ISO-8859-9, Windows-1254 |
| Welsh | Ẁ, ẁ, Ẃ, ẃ, Ŵ, ŵ, Ŷ, ŷ | | ISO-8859-14 |

With the Finnish älphäbet, our biggest issue is Ä, Ö and Å (ä, ö and å), which we share with Swedish.

Alternatively, you could explain how to add support for other languages, as this plugin doesn't sound very complex.

Mikaela commented 10 years ago

Oh, and I reported this because I saw a push notification for a highlight in which I was told that this plugin supports only French channels.

I am also surprised that Finnish and French aren't even fully supported by Latin1.
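To make the coverage claim concrete, here is a small stdlib-only sketch that checks which characters Latin-1 (ISO-8859-1) can represent compared with ISO-8859-15. The helper name `can_encode` and the sample strings are illustrative assumptions, not anything from the plugin.

```python
def can_encode(text: str, codec: str) -> bool:
    """Return True if every character in `text` exists in `codec`."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


# Illustrative samples (assumed for this sketch, taken from the table above).
finnish_core = "ÄÖÅäöå"   # everyday Finnish/Swedish letters
finnish_loan = "ŠšŽž"     # only present in loanwords
french_extra = "ŒœŸ"      # the French characters Latin-1 lacks

for label, chars in [
    ("Finnish core", finnish_core),
    ("Finnish loanword", finnish_loan),
    ("French extra", french_extra),
]:
    print(
        label,
        "latin-1:", can_encode(chars, "latin-1"),
        "iso-8859-15:", can_encode(chars, "iso-8859-15"),
    )
```

So the letters Finnish uses daily are covered by Latin-1; only the loanword letters and the French ligatures need ISO-8859-15 or Unicode.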

Mikaela commented 10 years ago

Reading the code, I don't seem to understand it at all. It looks like it only complains if the detected encoding isn't ASCII or UTF-8.

        if encoding not in ('utf-8', 'ascii'):

I thought the French älphäbet characters were specified by hand.

I don't understand why this would only work with French channels.
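For illustration, a check of this shape is language-agnostic. The sketch below is a stdlib-only approximation that swaps chardet's detection for a strict UTF-8 decode attempt, so it is not the plugin's actual logic; `is_ascii_or_utf8`, `should_warn` and the sample messages are hypothetical names and data.

```python
def is_ascii_or_utf8(raw: bytes) -> bool:
    """Return True if `raw` decodes as UTF-8 (ASCII is a subset of UTF-8)."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def should_warn(raw: bytes) -> bool:
    """Mirror the quoted condition: complain unless ASCII or UTF-8."""
    return not is_ascii_or_utf8(raw)


# Hypothetical messages as they might arrive on the wire.
samples = {
    "ascii": b"hello",
    "utf8_finnish": "hyvää yötä".encode("utf-8"),
    "latin1_finnish": "hyvää yötä".encode("latin-1"),
    "latin1_french": "déjà vu".encode("latin-1"),
}

for name, raw in samples.items():
    print(name, should_warn(raw))
```

Nothing here depends on French: Finnish text sent as Latin-1 trips the check just as French text does. One caveat of this approximation is that a few Latin-1 byte sequences happen to be valid UTF-8, so it can miss some messages.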

Mikaela commented 10 years ago

It seems that after that push notification the discussion continued, and it is uncertain whether the plugin works even with French. I will keep this issue open until @ProgVal confirms that it works.

Mikaela commented 10 years ago

It doesn't work.

This (HexChat spamming in Latin-1, as seen from a UTF-8-only WeeChat)

15:19:51 <@Ciblia> I �m n�w l�tin sp�mmer. (Temp�r�rily t� test N�Latin1)
15:19:55 <@Ciblia> I �m n�w l�tin sp�mmer. (Temp�r�rily t� test N�Latin1)
15:19:57 <@Ciblia> I �m n�w l�tin sp�mmer. (Temp�r�rily t� test N�Latin1)
15:19:59 <@Ciblia> I �m n�w l�tin sp�mmer. (Temp�r�rily t� test N�Latin1)
15:20:01 <@Ciblia> I �m n�w l�tin sp�mmer. (Temp�r�rily t� test N�Latin1)
15:20:07 <@Ciblia> M�ybe being identified �ffects it.
15:20:13 <@Ciblia> Checking l�gs...

Produces this.

    method(irc, msg)
  File "/home/users/mkaysi/.local/lib/python3.2/site-packages/supybot/plugins/NoLatin1/plugin.py", line 63, in doPrivmsg
    encoding = chardet.detect(content)['encoding']
  File "/home/users/mkaysi/.local/lib/python3.2/site-packages/chardet-2.2.1-py3.2.egg/chardet/__init__.py", line 25, in detect
    raise ValueError('Expected a bytes object, not a unicode object')
ValueError: Expected a bytes object, not a unicode object
ERROR 2014-04-14T15:19:57 supybot Exception id: 0x81b2e
ERROR 2014-04-14T15:19:59 supybot Uncaught exception in NoLatin1.__call__:
Traceback (most recent call last):
  File "/home/users/mkaysi/.local/lib/python3.2/site-packages/supybot/log.py", line 355, in m
    return f(self, *args, **kwargs)
  File "/home/users/mkaysi/.local/lib/python3.2/site-packages/supybot/irclib.py", line 125, in __call__
    method(irc, msg)
  File "/home/users/mkaysi/.local/lib/python3.2/site-packages/supybot/plugins/NoLatin1/plugin.py", line 63, in doPrivmsg
    encoding = chardet.detect(content)['encoding']
  File "/home/users/mkaysi/.local/lib/python3.2/site-packages/chardet-2.2.1-py3.2.egg/chardet/__init__.py", line 25, in detect
    raise ValueError('Expected a bytes object, not a unicode object')
ValueError: Expected a bytes object, not a unicode object
ERROR 2014-04-14T15:19:59 supybot Exception id: 0x81b2e
ERROR 2014-04-14T15:20:01 supybot Uncaught exception in NoLatin1.__call__:
Traceback (most recent call last):
  File "/home/users/mkaysi/.local/lib/python3.2/site-packages/supybot/log.py", line 355, in m
    return f(self, *args, **kwargs)
  File "/home/users/mkaysi/.local/lib/python3.2/site-packages/supybot/irclib.py", line 125, in __call__
    method(irc, msg)
  File "/home/users/mkaysi/.local/lib/python3.2/site-packages/supybot/plugins/NoLatin1/plugin.py", line 63, in doPrivmsg
    encoding = chardet.detect(content)['encoding']
  File "/home/users/mkaysi/.local/lib/python3.2/site-packages/chardet-2.2.1-py3.2.egg/chardet/__init__.py", line 25, in detect
    raise ValueError('Expected a bytes object, not a unicode object')
ValueError: Expected a bytes object, not a unicode object
ERROR 2014-04-14T15:20:01 supybot Exception id: 0x81b2e
ERROR 2014-04-14T15:20:07 supybot Uncaught exception in NoLatin1.__call__:
Traceback (most recent call last):
  File "/home/users/mkaysi/.local/lib/python3.2/site-packages/supybot/log.py", line 355, in m
    return f(self, *args, **kwargs)
  File "/home/users/mkaysi/.local/lib/python3.2/site-packages/supybot/irclib.py", line 125, in __call__
    method(irc, msg)
  File "/home/users/mkaysi/.local/lib/python3.2/site-packages/supybot/plugins/NoLatin1/plugin.py", line 63, in doPrivmsg
    encoding = chardet.detect(content)['encoding']
  File "/home/users/mkaysi/.local/lib/python3.2/site-packages/chardet-2.2.1-py3.2.egg/chardet/__init__.py", line 25, in detect
    raise ValueError('Expected a bytes object, not a unicode object')
ValueError: Expected a bytes object, not a unicode object
ERROR 2014-04-14T15:20:07 supybot Exception id: 0x81b2e
ERROR 2014-04-14T15:20:13 supybot Uncaught exception in NoLatin1.__call__:
Traceback (most recent call last):
  File "/home/users/mkaysi/.local/lib/python3.2/site-packages/supybot/log.py", line 355, in m
    return f(self, *args, **kwargs)
  File "/home/users/mkaysi/.local/lib/python3.2/site-packages/supybot/irclib.py", line 125, in __call__
    method(irc, msg)
  File "/home/users/mkaysi/.local/lib/python3.2/site-packages/supybot/plugins/NoLatin1/plugin.py", line 63, in doPrivmsg
    encoding = chardet.detect(content)['encoding']
  File "/home/users/mkaysi/.local/lib/python3.2/site-packages/chardet-2.2.1-py3.2.egg/chardet/__init__.py", line 25, in detect
    raise ValueError('Expected a bytes object, not a unicode object')
ValueError: Expected a bytes object, not a unicode object
ERROR 2014-04-14T15:20:13 supybot Exception id: 0x81b2e
INFO 2014-04-14T15:20:32 supybot Flushers flushed and garbage collected.

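
The traceback suggests that on Python 3 `content` reaches `chardet.detect` as an already-decoded `str`, while the detector wants raw `bytes`. Below is a minimal stdlib-only sketch of why the distinction matters; it is not the plugin's code. `as_utf8_only_client` and `require_bytes` are hypothetical helper names, and the sample sentence is only a guess at what was typed in HexChat.

```python
REPLACEMENT = "\ufffd"  # U+FFFD, rendered as the "�" seen in the log above


def as_utf8_only_client(raw: bytes) -> str:
    """Decode the way a UTF-8-only client would, replacing invalid bytes."""
    return raw.decode("utf-8", errors="replace")


def require_bytes(content):
    """Reject text that was already decoded; detection must see raw bytes."""
    if isinstance(content, str):
        raise TypeError("expected bytes, got str: detect before decoding")
    return bytes(content)


# Assumed original text, sent as Latin-1 bytes.
latin1_line = "I äm nöw lätin spämmer.".encode("latin-1")

# What a UTF-8-only client displays: each stray Latin-1 byte becomes U+FFFD.
shown = as_utf8_only_client(latin1_line)
print(shown)

# Once decoded this way, the original bytes are gone, so a detector fed
# the resulting str has nothing left to inspect.
try:
    require_bytes(shown)
except TypeError as exc:
    print("rejected:", exc)
```

If the plugin only ever sees the decoded `str`, re-encoding it as UTF-8 would always yield valid UTF-8, so any detection would presumably have to happen before the message is decoded.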
zekirdek commented 10 years ago

I have the same problem.

http://lakka.kapsi.fi:62291/weblogs/html/%23limnoria-bots/ http://i.hizliresim.com/p42dWN.png (here the Latin (Turkish) characters display correctly)

http://i.hizliresim.com/GnGdy7.png (here the Turkish characters are not shown); this is my bot.

Mikaela commented 10 years ago

I believe you have a different issue.