wichert / lingua

Translation toolkit for Python

lingua skips translation strings if they contain an ampersand #8

Closed AnneGilles closed 12 years ago

AnneGilles commented 12 years ago

I just noticed that lingua's XML extractor will skip translation strings if they contain a '&' (ampersand).

lingua will also skip all following translation strings in the same file.

wichert commented 12 years ago

I suspect that it trips over an invalid XML entity. lingua has tests that check that entity handling works:

    def test_translate_HtmlEntity(self):
        snippet = """\
                <html xmlns:i18n="http://xml.zope.org/namespaces/i18n"
                      i18n:domain="lingua">
                  <button i18n:translate="">lock &amp; load&nbsp;</button>
                </html>
                """
        self.assertEqual(self.extract(snippet),
                [(3, None, u"lock &amp; load&nbsp;", [])])

AnneGilles commented 12 years ago

Well, my & was not part of an HTML entity, but just in some text, e.g. "copy & paste", "Standard & Poor's", etc.

    def test_translate_NonHtmlEntity(self):
        snippet = """\
                <html xmlns:i18n="http://xml.zope.org/namespaces/i18n"
                      i18n:domain="lingua">
                  <p i18n:translate="">Standard & Poor's</p>
                </html>
                """
        self.assertEqual(self.extract(snippet),
                [(3, None, u"Standard & Poor's", [])])

wichert commented 12 years ago

Your example is not valid XML, so the XML parser raises an error at the unescaped & and lingua aborts there. Keep in mind that lingua does not support HTML notation, only XML notation.
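
For anyone landing here later, the difference can be reproduced without lingua. This is only a rough sketch using the standard library's `xml.etree.ElementTree`, which may not be the parser lingua uses internally; the helper name `parses_as_xml` is made up for illustration. It shows that a bare `&` makes the document ill-formed, while `&amp;` parses and yields the intended text.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def parses_as_xml(snippet):
    """Hypothetical helper: return (True, text) if snippet is well-formed XML,
    otherwise (False, parser error message)."""
    try:
        root = ET.fromstring(snippet)
    except ET.ParseError as exc:
        return False, str(exc)
    return True, "".join(root.itertext())


# A bare ampersand is not allowed in XML character data.
ok_bare, detail_bare = parses_as_xml("<p>Standard & Poor's</p>")

# Written as &amp;, the same text is well-formed.
ok_escaped, detail_escaped = parses_as_xml("<p>Standard &amp; Poor's</p>")

# escape() converts &, < and > so plain text can be embedded in XML.
escaped_text = escape("Standard & Poor's")

print(ok_bare, detail_bare)
print(ok_escaped, detail_escaped)
print(escaped_text)
```

So the practical fix on the template side is to write `&amp;` wherever a literal ampersand appears in translatable text.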