wichert / lingua

Translation toolkit for Python

Issue with expat breaking lines not only on newline #18

Closed: graffic closed this issue 10 years ago

graffic commented 11 years ago

Hello,

I'm trying to extract some strings from expressions in a Chameleon template. Some strings were missing from the output, so I did some debugging in the XML parser.

I found that CharacterDataHandler (in parsers/xml.py) is sometimes called with only part of a line, so the regular expression never sees the complete expression and the string is skipped.

I also found that setting parser.buffer_text to True helps, because all the content is then passed to CharacterDataHandler in one call. However, I suspect the problem can come back when the buffer fills up: the Python documentation for this buffer says nothing about it growing when needed.
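The effect of buffer_text can be reproduced with the standard library alone. This is a minimal sketch, not lingua's parser code; the helper name character_data_calls is hypothetical. Without buffering, expat reports the newline as a separate call; with buffering, pyexpat joins the fragments before calling the handler. The buffer has a fixed size (parser.buffer_size), which is why the concern above is reasonable for long runs of text.

```python
import xml.parsers.expat


def character_data_calls(xml_text, buffer_text=False):
    """Parse xml_text in one call and record every CharacterDataHandler call."""
    calls = []
    parser = xml.parsers.expat.ParserCreate()
    # With buffering on, pyexpat joins consecutive fragments, but only up to
    # parser.buffer_size; longer runs of text can still arrive in pieces.
    parser.buffer_text = buffer_text
    parser.CharacterDataHandler = calls.append
    parser.Parse(xml_text, True)
    return calls


doc = "<p>line one\nline two</p>"
print(character_data_calls(doc))                    # ['line one', '\n', 'line two']
print(character_data_calls(doc, buffer_text=True))  # ['line one\nline two']
```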

I can share the problematic file privately on request, but I cannot make it public.

The workaround we found is to wrap every expression in a dummy tag:

<tal:s>${expression containing _()}</tal:s>
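A more general alternative to the dummy-tag workaround, assuming you can change the extractor itself, is to stop treating each CharacterDataHandler call as a complete line: collect the fragments and run the regular expression only at an element boundary. The sketch below uses the standard library's xml.parsers.expat and is not lingua's actual code; the class name TextCollector and the MSGID pattern are hypothetical simplifications of a real extraction regex.

```python
import re
import xml.parsers.expat

# Hypothetical, simplified pattern: captures the string inside _("...") calls.
MSGID = re.compile(r'_\("([^"]*)"\)')


class TextCollector:
    """Join CharacterDataHandler fragments and scan them at element boundaries.

    Expat may deliver one run of text in several calls (at newlines and at
    input-chunk boundaries), so nothing is matched until the run is complete.
    """

    def __init__(self):
        self._pieces = []
        self.messages = []
        self.parser = xml.parsers.expat.ParserCreate()
        self.parser.CharacterDataHandler = self._on_text
        # Any start or end tag ends the current run of text.
        self.parser.StartElementHandler = self._flush
        self.parser.EndElementHandler = self._flush

    def _on_text(self, data):
        # Only store the fragment; it may be half of a line.
        self._pieces.append(data)

    def _flush(self, *_args):
        # The text run is complete: join it and run the regex once.
        text = "".join(self._pieces)
        self._pieces.clear()
        self.messages.extend(MSGID.findall(text))

    def feed(self, chunk, final=False):
        self.parser.Parse(chunk, final)


# Feed the document in two chunks, split in the middle of the message,
# to mimic the split seen in the debug output.
doc = '<div>\n  ${f(_("Hello world"))}\n</div>'
split = doc.index("world")
collector = TextCollector()
collector.feed(doc[:split])
collector.feed(doc[split:], final=True)
print(collector.messages)  # ['Hello world']
```

Because the fragments are joined before the regex runs, the result no longer depends on where expat splits the text, whether at a newline, a chunk boundary, or the buffer_size limit.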

graffic commented 11 years ago

Here is the output from the debugging session. With a print added to CharacterDataHandler, this is what it receives for the problematic area:

CharacterDataHandler called with:           ${form.boot_radio('why', 'too_expensive', request.translate(_("I can't afford it/too expensive")))}
CharacterDataHandler called with: 

CharacterDataHandler called with:           ${form.boot
CharacterDataHandler called with: _radio('why', 'not_much_use', request.translate(_("I'm not using the service as much as I expected")))}
wichert commented 10 years ago

The current version of lingua uses Chameleon's HTML parser, which handles this correctly.