processone / ejabberd

Robust, Ubiquitous and Massively Scalable Messaging Platform (XMPP, MQTT, SIP Server)
https://www.process-one.net/ejabberd/
Other
6.12k stars 1.51k forks source link

mod_mam_sql stores invalid xml #2213

Open marcphilipp opened 6 years ago

marcphilipp commented 6 years ago

What version of ejabberd are you using?

17.09

What operating system (version) are you using?

CentOS

How did you install ejabberd (source, package, distribution)?

source

What did not work as expected? Are there error messages in the log? What was the unexpected behavior? What was the expected result?

When receiving messages with long bodies, mod_mam_sql inserts them into the database archive (MySQL in our case). Both the xml and the txt columns are cut off after 65535 characters. Thus, when loading them in a subsequent MAM query, the xml column contains invalid XML. The max_stanza_size option is set to 65535 on ejabberd_c2s but I'm not sure it's used for the websocket transport.

zinid commented 6 years ago

https://stackoverflow.com/questions/6766781/maximum-length-for-mysql-type-text

marcphilipp commented 6 years ago

I know that it's the maximum length for a TEXT column. That doesn't solve the problem that such messages are stored in a corrupted way in the database. ejabberd_http_ws does not check max_stanza_size. Thus, I have no way of preventing them from being stored in the first place. Once they are stored the MAM response is wrong. It only sends the messages that were not corrupted, but the count of the fin element in the IQ result includes the corrupted messages.

So, this might actually be two issues:

  1. mod_mam_sql should not store messages that will corrupt the archive table.
  2. ejabberd_http_ws needs a way to limit the maximum stanza size

I'd appreciate any pointer how to workaround or fix the issue.

zinid commented 6 years ago

Well, how do you suggest to resolve the first point? We have a ton of databases with their limitations, it's really a burden to check all these stupid restrictions. I think you just need to change the type of the field, or use "right" database: https://www.postgresql.org/docs/current/static/datatype-character.html which doesn't have these silly limitations. Regarding the second point, it's on my todo list to fix it. I will leave the issue opened until I fix it.

marcphilipp commented 6 years ago

how do you suggest to resolve the first point?

How about a config option for mod_mam? max_message_size or something along those lines?

I think you just need to change the type of the field, or use "right" database

I think allowing huge messages might cause a security vulnerability so I'd rather not go that route.

Regarding the second point, it's on my todo to fix it. I will leave the issue opened until I fix it.

Sounds good, thanks! 🙂

zinid commented 6 years ago

How about a config option for mod_mam? max_message_size or something along those lines?

How about asking MySQL developers to return an error on field overflow instead of truncating? Just like this is done in Postgres (for integer overflow).

marcphilipp commented 6 years ago

Fair enough. I'll leave you to it. 😉

dos1 commented 6 years ago

ejabberd_http_ws not checking max_stanza_size sounds like a worth reporting bug on its own.

Neustradamus commented 3 years ago

@prefiks: What do you think?