sylae / ligrev

XMPP MUC Utility bot
GNU General Public License v3.0

Things that become HTML entities ruin everything #57

Open cburschka opened 8 years ago

cburschka commented 8 years ago
(2016-05-11 13:13:39) Arancaytar: Everything is awful.
(2016-05-11 14:04:59) Gabriel: except spiders
(2016-05-11 14:05:10) Gabriel: spiders are great
(2016-05-11 14:05:23) Gabriel:  /╲/\╭( ͡° ͡° ͜ʖ ͡° ͡°)╮/\╱\
(2016-05-11 14:08:11) Gabriel: :tell Arancaytar /╲/\╭( ͡° ͡° ͜ʖ ͡° ͡°)╮/\╱\
(2016-05-11 14:08:12) Ligrev: Message for Arancaytar@calref.net processed.
(2016-05-11 15:12:13) Arancaytar: /kick Gab
(2016-05-11 15:12:14) Ligrev: Ligrev has left the room.
(2016-05-11 15:12:27) Arancaytar: Gab, not Ligrev
(2016-05-11 15:12:38) Ligrev: Ligrev has joined the room.
(2016-05-11 15:12:38) Ligrev: Ligrev version d6025eba747c8bd50a36076cdbcc84a5675886e4 now online.
sylae commented 8 years ago

Okay, so there seems to be a problem with Ligrev\command::_split(). The output for the above command is as follows:

array(7) {
  [0] =>
  string(5) ":tell"
  [1] =>
  string(10) "Arancaytar"
  [2] =>
  string(10) "/╲/\╭("
  [3] =>
  string(7) "͡°"
  [4] =>
  string(7) "͡°"
  [5] =>
  string(4) "͜ʖ"
  [6] =>
  string(7) "͡°"
}
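A side note on the byte counts in that dump: PHP's var_dump reports string length in bytes, and string(7) is more than a raw "͡°" should need. The Python sketch below (illustrative only, not Ligrev code) checks the arithmetic. The code points are an assumption based on how the characters render here: U+0361 for the combining breve and U+00B0 for the degree sign.

```python
# Assumed code points (my reading of the characters in the dump).
BREVE = "\u0361"   # combining double inverted breve, 2 bytes in UTF-8
DEGREE = "\u00b0"  # degree sign, 2 bytes in UTF-8


def utf8_len(text: str) -> int:
    """Byte length of text in UTF-8, which is what var_dump's string(N) counts."""
    return len(text.encode("utf-8"))


raw_len = utf8_len(BREVE + DEGREE)      # the pair as raw characters
entity_len = utf8_len(BREVE + "&deg;")  # the pair with the degree sign as an entity

print(raw_len)     # 4
print(entity_len)  # 7, matching string(7) in the dump
```

If those code points are right, the dump suggests the degree sign was already the five-byte entity &deg; by the time _split saw it.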

An SQL dump of the tell table shows it's being stored properly (at least going by what _split gives it)...

INSERT INTO `tell` (`id`,`sender`,`recipient`,`sent`,`private`,`message`) VALUES (1,'sylae@calref.net','Arancaytar@calref.net',1462975037,0,'/╲/\\╭( ͡° ͡° ͜ʖ ͡°');

It appears to be an error with the regex in _split, see the following regexr: http://regexr.com/3dd99
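For comparison, here is a minimal Python sketch of one way to split a :tell line without tokenizing the message body. It is not the regex behind the regexr link and not Ligrev's PHP code. The function name split_tell is made up for illustration, and it assumes the command has the shape ":tell <recipient> <message>".

```python
def split_tell(line: str):
    """Split into (command, recipient, message), keeping the message verbatim.

    Hypothetical helper: splits on whitespace at most twice, so everything
    after the recipient stays in one piece, inner spaces included.
    """
    parts = line.split(None, 2)
    if len(parts) < 3:
        return None  # no message body present
    command, recipient, message = parts
    return command, recipient, message


line = ":tell Arancaytar /╲/\\╭( ͡° ͡° ͜ʖ ͡° ͡°)╮/\\╱\\"
result = split_tell(line)
print(result[0], result[1])
print(result[2])
```

Capping the number of splits means the message is never reassembled from tokens, so nothing after the recipient can be dropped along the way.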

But none of this explains why Ligrev is shitting itself upon attempted delivery of the message...

sylae commented 8 years ago

It appears to fire off the following error (caught by JAXL, so who knows where it actually comes from...):

jaxl:782 - 2016-05-11 14:13:44 - event 'stanza_cb' catched in handle_other with stanza name error
sylae commented 8 years ago

Okay, so following the trail upwards, the error is coming from Ligrev\ligrevGlobals::sendMessage() ... well, not quite:

Here's the XML stanza we pass off to JAXL to send upstream to the server:

<message xmlns="jabber:client" type="groupchat" to="lounge@conference.calref.net" from="ligrev@calref.net/jaxl#aec5aec1ca93d342ef1235883f5296c8">
  <body>Message from sylae for username with spaces@calref.net at 3:26:58 PM GMT+1:
/╲/\╭( ͡&deg; ͡&deg; ͜ʖ ͡&deg;</body>
  <html xmlns="http://jabber.org/protocol/xhtml-im">
    <body xmlns="http://www.w3.org/1999/xhtml">
      <p>Message from <span class="user jid-node-sylae jid-domain-calref.net jid-resource-" data-jid="sylae@calref.net" data-nick="sylae">sylae</span> for <span class="user jid-node-username\9220with\9220spaces jid-domain-calref.net jid-resource-cadence/v1.10.0-calref-22-g269cb6f/1462975919478" data-jid="username\20with\20spaces@calref.net/cadence/v1.10.0-calref-22-g269cb6f/1462975919478">username with spaces@calref.net</span> at <span data-timestamp="2016-05-11T14:26:58+00:00">3:26:58 PM GMT+1</span>:
/╲/\╭( ͡&deg; ͡&deg; ͜ʖ ͡&deg;</p>
    </body>
  </html>
</message>

This should work just fine. But for some reason JAXL is shitting itself when we send this.

sylae commented 8 years ago

Okay, so it's...not a Unicode error at all? Removing all the shit and just sending a ° throws the same error.

<message xmlns="jabber:client" type="groupchat" to="lounge@conference.calref.net" from="ligrev@calref.net/jaxl#a3c5c7d244ab4108c02fd42fa045a4f5">
  <body>Message from sylae for username with spaces@calref.net at 3:43:00 PM GMT+1:
&deg;</body>
  <html xmlns="http://jabber.org/protocol/xhtml-im">
    <body xmlns="http://www.w3.org/1999/xhtml">
      <p>Message from <span class="user jid-node-sylae jid-domain-calref.net jid-resource-" data-jid="sylae@calref.net" data-nick="sylae">sylae</span> for <span class="user jid-node-username\9220with\9220spaces jid-domain-calref.net jid-resource-cadence/v1.10.0-calref-22-g269cb6f/1462977601220" data-jid="username\20with\20spaces@calref.net/cadence/v1.10.0-calref-22-g269cb6f/1462977601220">username with spaces@calref.net</span> at <span data-timestamp="2016-05-11T14:43:00+00:00">3:43:00 PM GMT+1</span>:
&deg;</p>
    </body>
  </html>
</message>
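One detail about the stanza above: XML predefines only five named entities (&amp;, &lt;, &gt;, &quot;, &apos;), so &deg; is an undefined entity unless a DTD declares it, and a conforming XML parser will reject it. That would be consistent with getting an error stanza back, though nothing in this thread confirms that this is what the server is doing. A small Python check using the standard library parser (illustrative only, not JAXL or Ligrev code):

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML, False on a parse error."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<body>&deg;</body>"))   # False: undefined entity
print(is_well_formed("<body>\u00b0</body>"))  # True: raw degree sign
print(is_well_formed("<body>&#176;</body>"))  # True: numeric character reference
```

The raw character and the numeric reference both parse, so the degree sign itself is not the issue. Only the HTML-style named entity is.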

The question is when that turns into an HTML entity. Does cadence send it that way? Is JAXL processing it when it receives the stanza? By the time _split gets ahold of it, it's already been entitied...

sylae commented 8 years ago

jaxl_xml_stream.php:181 is what is causing things to become HTML entities. Removing it makes Ligrev take ° correctly. The question is what the consequences of removing it are...
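To make the difference concrete, here is a Python sketch contrasting two kinds of escaping. It assumes the line in question does an HTML-entity conversion of the kind described in this thread; I have not checked what the line actually calls. Both function names are made up for illustration, and none of this is JAXL's PHP code.

```python
from html.entities import codepoint2name
from xml.sax.saxutils import escape


def html_entity_encode(text: str) -> str:
    """Replace every character that has an HTML 4 named entity with that entity."""
    out = []
    for ch in text:
        name = codepoint2name.get(ord(ch))
        out.append(f"&{name};" if name else ch)
    return "".join(out)


def xml_escape(text: str) -> str:
    """Escape only &, < and >, leaving all other characters untouched."""
    return escape(text)


sample = "5\u00b0 < 6"
print(html_entity_encode(sample))  # 5&deg; &lt; 6
print(xml_escape(sample))          # 5° &lt; 6
```

The second form still protects against loose markup characters, but never produces an entity that XML does not know about.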

sylae commented 8 years ago

https://github.com/jaxl/JAXL/commit/3493186e1f7cca502a208a03cb0308eb75dec55c is the commit that adds the line in question. I've removed it locally and can't seem to make it break (I've been throwing loose <, >, &, unclosed XML, etc. at it, but it's taking it fine). Is this just vestigial code or am I overlooking something?

sylae commented 8 years ago
[17:46:09] <Tyran> :Tell Sylae Files I saved to the CDN on my webspace don’t seem to load anymore. Could you please check that
[17:46:09] <Ligrev> Message for Sylae@calref.net processed.
[17:46:57] <Tyran> :tell Sylae Nevermind. They work with nethergate.net, just not calref.net
[17:46:57] <Ligrev> Message for Sylae@calref.net processed.

One of those :tell commands is causing Ligrev to crash.

sylae commented 7 years ago

I'm going to make this a megathread to hold all the FUCKING html entity errors, because as we've discovered it's actually anything that becomes html-entitied (double quotes, degree sign, for some reason fancy quotes).
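For triage, a rough Python sketch that lists the characters in a message that have an HTML 4 named entity, i.e. candidates for being entitied. The helper name entity_candidates is made up, and this is not Ligrev code. The two sample strings are the :tell bodies from the log above, with the fancy apostrophe assumed to be U+2019.

```python
from html.entities import codepoint2name


def entity_candidates(text: str):
    """Return (character, entity name) pairs for characters with a named entity."""
    found = []
    for ch in text:
        name = codepoint2name.get(ord(ch))
        if name:
            found.append((ch, name))
    return found


first = ("Files I saved to the CDN on my webspace don\u2019t seem to load "
         "anymore. Could you please check that")
second = "Nevermind. They work with nethergate.net, just not calref.net"

print(entity_candidates(first))   # [('’', 'rsquo')]
print(entity_candidates(second))  # []
```

By this check only the first message contains a candidate, the fancy apostrophe in "don’t". That points at it as the likelier culprit but doesn't prove it.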