hannesm / jackline

minimalistic secure XMPP client in OCaml
BSD 2-Clause "Simplified" License

Only strip annoying tags from Pidgin etc #147

Open cfcs opened 7 years ago

cfcs commented 7 years ago

Right now all tags are stripped from messages: https://github.com/hannesm/jackline/blob/master/cli/cli_commands.ml#L295

This means that pasted XML stanzas and other useful things get stripped too. To get around that special case we could check if the data starts with ?, which would be nice, but perhaps we should also compile a better list of the crappy tags that Pidgin, Gajim, etc. insert, and remove those specifically rather than all HTML tags.

It would also be nice to have a [HTML tags removed from message] label to indicate to the user that something was removed.
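
A rough sketch of what "remove only the known-annoying tags" could look like, assuming a plain string scan. This is not jackline's current code: `noisy_tags`, `strip_noisy_tags` and `display` are hypothetical names, and the denylist is a guess (only `<font>` comes up in this thread so far).

```ocaml
(* Hypothetical denylist. Only <font> is mentioned in this thread; "span" is
   a placeholder and needs checking against what the clients really emit. *)
let noisy_tags = [ "font"; "span" ]

(* [inner] is the text between '<' and '>', e.g. "/font" or "font color=x".
   Returns the lowercased tag name, or "" if there is none (e.g. "?xml"). *)
let tag_name inner =
  let len = String.length inner in
  let start = if len > 0 && inner.[0] = '/' then 1 else 0 in
  let rec stop i =
    if i < len then
      match inner.[i] with
      | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' -> stop (i + 1)
      | _ -> i
    else i
  in
  String.lowercase_ascii (String.sub inner start (stop start - start))

(* Drops only tags whose name is in [noisy_tags]; everything else, including
   other XML, is copied through verbatim. Also reports whether anything was
   removed, so the UI can tell the user. *)
let strip_noisy_tags msg =
  let len = String.length msg in
  let buf = Buffer.create len in
  let removed = ref false in
  let rec go i =
    if i >= len then ()
    else if msg.[i] <> '<' then begin
      Buffer.add_char buf msg.[i];
      go (i + 1)
    end
    else
      match String.index_from_opt msg i '>' with
      | Some j ->
        let inner = String.sub msg (i + 1) (j - i - 1) in
        if List.mem (tag_name inner) noisy_tags then removed := true
        else Buffer.add_string buf (String.sub msg i (j - i + 1));
        go (j + 1)
      | None ->
        (* no closing '>': not a tag, keep the rest as-is *)
        Buffer.add_string buf (String.sub msg i (len - i))
  in
  go 0;
  (Buffer.contents buf, !removed)

(* Appends the label suggested above when something was stripped. *)
let display msg =
  match strip_noisy_tags msg with
  | text, true -> text ^ " [HTML tags removed from message]"
  | text, false -> text
```

With this, a pasted stanza such as `<message to='a@b'><body>x</body></message>` passes through untouched, while `<FONT COLOR="#000">hi</FONT>` is shown as `hi [HTML tags removed from message]`.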

cfcs commented 7 years ago

BTW, it looks like tags are not stripped from group messages? https://github.com/hannesm/jackline/blob/master/cli/cli_commands.ml#L319

hannesm commented 7 years ago

hmm, isn't it the case that XMPP specifies the body to be PCDATA? and thus, there should be no unescaped xml markup characters in there (such as <, >, &)? if you want to transmit them over the wire, you need to escape them (&lt; etc.) -- which should be done in group messages and unencrypted direct messages.

the only case where we manually need to strip the tags and unescape are otr-encrypted messages (the raw xml stream should be handled by the xmpp library!).

also, if you have a proper client and copy/paste XML stanzas, there should be an escaping step done by the client, and jackline will unescape (and not strip) upon receiving them... it might be that some clients violate escaping, and will then be stripped... is <font> the only offending thing inserted by Adium/pidgin/libpurple? does anyone have code of other clients at hand (if they have a whitelist of what to strip)!?
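
to make the escape/unescape round trip concrete, here is a minimal sketch using only the stdlib. it is not the xmpp library's API and not what jackline does today; `escape`, `unescape` and `entities` are made-up names, and only the five predefined XML entities are handled (no numeric character references).

```ocaml
(* Sending side: escape the characters that may not appear raw in PCDATA. *)
let escape s =
  let buf = Buffer.create (String.length s) in
  String.iter
    (function
      | '<' -> Buffer.add_string buf "&lt;"
      | '>' -> Buffer.add_string buf "&gt;"
      | '&' -> Buffer.add_string buf "&amp;"
      | c -> Buffer.add_char buf c)
    s;
  Buffer.contents buf

(* The five predefined XML entities and the characters they stand for. *)
let entities =
  [ "&lt;", '<'; "&gt;", '>'; "&amp;", '&'; "&quot;", '"'; "&apos;", '\'' ]

(* true if [prefix] occurs in [s] at offset [i] *)
let starts_with_at s i prefix =
  let n = String.length prefix in
  i + n <= String.length s && String.sub s i n = prefix

(* Receiving side: a single left-to-right pass, so "&amp;lt;" becomes "&lt;"
   and is not unescaped a second time. *)
let unescape s =
  let len = String.length s in
  let buf = Buffer.create len in
  let rec go i =
    if i < len then
      match List.find_opt (fun (e, _) -> starts_with_at s i e) entities with
      | Some (e, c) ->
        Buffer.add_char buf c;
        go (i + String.length e)
      | None ->
        Buffer.add_char buf s.[i];
        go (i + 1)
  in
  go 0;
  Buffer.contents buf
```

a stanza pasted by a well-behaved client survives this unchanged: `unescape (escape "<iq type='get'/>")` gives back the original text, so stripping would only be needed for clients that skip the escaping step.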

cfcs commented 7 years ago

Checking the other clients is a good idea. I noticed it when talking to a Pidgin user.