processone / ejabberd

Robust, Ubiquitous and Massively Scalable Messaging Platform (XMPP, MQTT, SIP Server)
https://www.process-one.net/en/ejabberd/

mod_mam with external authentication, unable to retrieve all chat messages #3999

Closed dkliss closed 1 year ago

dkliss commented 1 year ago

Hi,

I use external authentication and have the following configuration in ejabberd.yml. The default database is SQL.

Ejabberd Version: 23.01

  mod_mam:
    ## Mnesia is limited to 2GB, better to use an SQL backend
    ## For small servers SQLite is a good fit and is very easy
    ## to configure. Uncomment this when you have SQL configured:
    ## db_type: sql
    assume_mam_usage: true
    default: always
  mod_offline:
    access_max_user_messages: max_user_offline_messages

With the above config, when I send a message, the received message stanza shows the "archived" flag. But when I query the server archive, I do not see any historical or recent chat messages at all.

I came across a similar issue at https://github.com/processone/ejabberd/issues/3377 , which was raised 3 years ago. It describes JWT auth as a reason a client cannot retrieve server messages, because the user tables are missing from SQL due to external auth. My offline message delivery, however, works with no issues. Based on that link, does it mean that with external auth we cannot use mod_mam to store chat history? Or do I need to configure something else to set this up?

Thanks for your help!

Environment

Configuration (only if needed): grep -Ev '^$|^\s*#' ejabberd.yml

loglevel: 4
...

Errors from error.log/crash.log

No errors

prefiks commented 1 year ago

If your external script properly implements the isuser query, then both MAM and offline should work for a user.

The problem exists only with methods that can't tell whether a given user exists (like various cert authentication methods or JWT, where tokens are created on the fly), but external auth is not one of those. Your script must, however, respond correctly to isuser requests.
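For reference, here is a minimal Python sketch of an external auth script that answers isuser. It assumes the usual extauth framing (a 2-byte big-endian length followed by `op:user:server[:password]`, answered with a 2-byte length of 2 and a 2-byte result). `KNOWN_USERS` is a hypothetical in-memory store used only for illustration; it is not part of ejabberd or of the script discussed in this issue.

```python
import struct

# Hypothetical user store for illustration; a real script would query its own backend.
KNOWN_USERS = {("alice", "example.org"): "secret"}


def handle_request(payload: bytes) -> bool:
    """Decide one request of the form b'op:user:server[:password]'."""
    # Split at most 3 times so a password containing ':' stays in one piece.
    parts = payload.decode("utf-8").split(":", 3)
    op = parts[0]
    if op == "isuser" and len(parts) >= 3:
        return (parts[1], parts[2]) in KNOWN_USERS
    if op == "auth" and len(parts) == 4:
        return KNOWN_USERS.get((parts[1], parts[2])) == parts[3]
    return False  # every other operation is refused in this sketch


def encode_reply(ok: bool) -> bytes:
    # 2-byte length (always 2) followed by a 2-byte result: 1 = yes, 0 = no.
    return struct.pack(">hh", 2, 1 if ok else 0)


def serve(stdin, stdout):
    """Read length-prefixed requests until EOF and answer each one."""
    while True:
        header = stdin.read(2)
        if len(header) < 2:
            break
        (size,) = struct.unpack(">h", header)
        stdout.write(encode_reply(handle_request(stdin.read(size))))
        stdout.flush()
```

To run it as a script, `serve(sys.stdin.buffer, sys.stdout.buffer)` would be the entry point. The relevant part for this issue is that isuser returns a yes/no answer for an existing account without needing a password.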

dkliss commented 1 year ago

The issue I have is a bit confusing. I did not receive the archived messages that I sent recently when I queried MAM. But that query did return 50 or so messages from a few weeks ago. My guess is that those messages came from the server handling "unacknowledged messages when the connection is lost". I am not sure whether receiving those messages confirms that my isuser setup is correct.

assume_mam_usage: true | false This option determines how ejabberd’s stream management code (see mod_stream_mgmt) handles unacknowledged messages when the connection is lost. Usually, such messages are either bounced or resent. However, neither is done for messages that were stored in the user’s MAM archive if this option is set to true. In this case, ejabberd assumes those messages will be retrieved from the archive. The default value is false.

dkliss commented 1 year ago

I did some more tests. The script I use for auth is from the link below.

https://www.npmjs.com/package/ejabberd-auth

It does respond to isuser requests, but it only confirms whether the user exists and does not return anything else (i.e. the username).

The issue is still the same. Every time I query MAM, I keep receiving the same set of 50 or so old messages, and none of my latest sent or received messages are retrieved, even though the stanzas of my received messages are tagged as "archived". Offline messages work as expected. Only chat history retrieval on request does not work.

Test 1: Same Result.

  mod_mam:
    ## Mnesia is limited to 2GB, better to use an SQL backend
    ## For small servers SQLite is a good fit and is very easy
    ## to configure. Uncomment this when you have SQL configured:
    ## db_type: sql
    access_preferences: all
    assume_mam_usage: true
    compress_xml: true
    use_cache: true
    cache_size: 20000
    default: always

Test 2: Same Result.

  mod_mam:
    ## Mnesia is limited to 2GB, better to use an SQL backend
    ## For small servers SQLite is a good fit and is very easy
    ## to configure. Uncomment this when you have SQL configured:
    ## db_type: sql
    assume_mam_usage: true
    compress_xml: true #Only for MySQL
    default: always 
dkliss commented 1 year ago

After some more tests, interestingly, mod_mam works if I change the db for mod_mam to mnesia. It does not work, however, if the database is MySQL. I also tried reverting from 23.01 to 22.10, and the MySQL archive still does not work.

I can also confirm that the archive tables are moved to MySQL, as I no longer see the lines below in the installation logs created after changing the default db to MySQL.

I am not sure whether the issue is in MySQL, TLS, the tables, or somewhere else. My MySQL config is also copied below.

TEST SETUP: TEST PASSED

  1. Keep default db as MySQL.
  2. Change default db for mod_mam module to mnesia.
  3. Rest is all same.

RESULTS: mod_mam works as expected if default db is mnesia i.e. I can see recent chat history when retrieved. TEST PASS.


2022-10-29 13:49:01.760767+00:00 [warning] Mnesia backend for mod_mam is not recommended: it's limited to 2GB and often gets corrupted when reaching this limit. SQL backend is recommended. Namely, for small servers SQLite is a preferred choice because it's very easy to configure.
2022-10-29 13:49:01.761057+00:00 [info] Creating Mnesia disc_only table 'archive_msg'
2022-10-29 13:49:01.769442+00:00 [info] Creating Mnesia disc_only table 'archive_prefs'

mod_mam configuration.

  mod_mam:
    ## Mnesia is limited to 2GB, better to use an SQL backend
    ## For small servers SQLite is a good fit and is very easy
    ## to configure. Uncomment this when you have SQL configured:
    #db_type: sql
    db_type: mnesia
    request_activates_archiving: false
    assume_mam_usage: true
    compress_xml: true #Only for MySQL
    default: always

My SQL Setup

  default_db: sql
  sql_type: mysql
  sql_server: "10.250.191.81"
  sql_database: "ejabberd"
  sql_username: "ejabberd"
  sql_password: "Password123"
  sql_port: 3306
  sql_ssl: true
  sql_ssl_verify: false
  sql_ssl_cafile: "/opt/ejabberd/conf/mysql-ca.pem"
  sql_ssl_certfile: "/opt/ejabberd/conf/mysql-client-cert.pem"
  sql_connect_timeout: 10
  sql_keepalive_interval: 1

Current SQL Tables

mysql> SHOW TABLES;
+-------------------------+
| Tables_in_ejabberd      |
+-------------------------+
| archive                 |
| archive_prefs           |
| bosh                    |
| caps_features           |
| last                    |
| mix_channel             |
| mix_pam                 |
| mix_participant         |
| mix_subscription        |
| motd                    |
| mqtt_pub                |
| muc_online_room         |
| muc_online_users        |
| muc_registered          |
| muc_room                |
| muc_room_subscribers    |
| oauth_client            |
| oauth_token             |
| privacy_default_list    |
| privacy_list            |
| privacy_list_data       |
| private_storage         |
| proxy65                 |
| pubsub_item             |
| pubsub_node             |
| pubsub_node_option      |
| pubsub_node_owner       |
| pubsub_state            |
| pubsub_subscription_opt |
| push_session            |
| roster_version          |
| rostergroups            |
| rosterusers             |
| route                   |
| sm                      |
| spool                   |
| sr_group                |
| sr_user                 |
| users                   |
| vcard                   |
| vcard_search            |
+-------------------------+
41 rows in set (0.00 sec)
dkliss commented 1 year ago

ADDITIONAL INFORMATION ON MYSQL: When I set up MySQL as the db for mod_mam, I logged into my MySQL database, sent a new message, and, using the command below, could see that the message is stored for the two users (sender and receiver) in the MySQL archive table. So my guess is that I can write to the database, but reading back seems to be the issue.

SELECT * FROM archive;

However, I do not see a transaction against Archive engine even though data is written in MySQL.

| Engine             | Support | Comment                                                        | Transactions | XA   | Savepoints |
+--------------------+---------+----------------------------------------------------------------+--------------+------+------------+
| ARCHIVE            | YES     | Archive storage engine                                         | NO           | NO   | NO         |
| InnoDB             | DEFAULT | Supports transactions, row-level locking, and foreign keys     | YES          | YES  | YES        |
prefiks commented 1 year ago

Maybe you have an entry for your user in the archive_prefs table that prevents storage?

dkliss commented 1 year ago

After some investigation, I found that the root cause seems to be related to paging with http://jabber.org/protocol/rsm, as described in https://xmpp.org/extensions/xep-0313.html#sect-idm45750742367824. So there seem to be no issues with auth etc.

The issue, or perhaps the designed behaviour, is that with both Mnesia and MySQL I receive only the first few (50 or so) messages from the database, and once I cross that limit I do not receive any more (or maybe I need to send another request). The reason is in the stanza received below: the server sends me the first 50 or so messages by default (for both MySQL and Mnesia) even if I send a query for retrieving all messages.

`270816760195391389211676275079810212`

I have been trying to override this default behaviour by sending the stanza below, but no luck. Is there a way to override the default page or count settings for those databases? Not sure if I am missing something important.

`"set">urn:xmpp:mam:210`

If the server is by default only expected to respond with the first few messages (the first page?) when I query for all, then the issue may be in the way I am sending the request. Is there a way I can set up the server to send me all messages in one go?

Background: I have been using MySQL for a while, so when I mentioned that I received old messages, those were the very first set of messages I sent when I set up MySQL. Similarly, when I built up Mnesia today it was empty, so I could receive recent messages. But once I had sent more than 50 or so messages, I could still receive only the first 50 or so, even with Mnesia.
prefiks commented 1 year ago

You can increase the page size, but we limit it to 250 anyway (and by default we use pages of 50 messages), so you can't get all messages in one go. Are you always asking for all messages, or do you store a copy locally and ask only for messages newer than those you already have? The latter is the method this module prefers.
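To illustrate the paging described here, below is a rough Python sketch that walks an archive oldest-first, one page at a time, using the RSM `<max>` and `<after>` elements from XEP-0059. `SERVER_PAGE_LIMIT` just mirrors the 250 figure quoted above, and `send` is a hypothetical callable (not part of any library) that sends a stanza and returns `(messages, last_id, complete)` taken from the server's `<fin/>` response.

```python
from xml.sax.saxutils import escape, quoteattr

MAM_NS = "urn:xmpp:mam:2"
RSM_NS = "http://jabber.org/protocol/rsm"
SERVER_PAGE_LIMIT = 250  # figure quoted in this thread; treated as an assumption


def build_page_query(query_id, page_size=50, after_id=None):
    """Build a MAM query for one page, optionally starting after an archive ID."""
    size = max(1, min(int(page_size), SERVER_PAGE_LIMIT))
    after = "" if after_id is None else f"<after>{escape(after_id)}</after>"
    return (
        f"<iq type='set' id={quoteattr(query_id)}>"
        f"<query xmlns='{MAM_NS}'>"
        f"<set xmlns='{RSM_NS}'><max>{size}</max>{after}</set>"
        "</query></iq>"
    )


def fetch_all(send, page_size=50):
    """Collect every archived message by requesting page after page.

    `send` is a hypothetical transport callable returning
    (messages, last_id, complete) for the stanza it was given.
    """
    collected, after_id = [], None
    while True:
        messages, last_id, complete = send(
            build_page_query("page", page_size, after_id)
        )
        collected.extend(messages)
        if complete or last_id is None:
            return collected
        after_id = last_id  # continue from the last ID of the previous page
```

A client that keeps a local copy would instead store the last archive ID it has seen and pass it as `after_id` on the next sync, so only newer messages are fetched.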

dkliss commented 1 year ago

I am always asking for all messages for now. No local copy as of now. So I guess I have to request 50 at a time?

prefiks commented 1 year ago

You can also ask for the newest messages first; that way you may not need to request all 2k. To do that, add <flip-page/> to the request.

dkliss commented 1 year ago

I have yet to test flip-page. I have been checking the server against XEP-0313.

I looked at XEP-0313 for the fields and cross-checked the server's form fields by sending the request below:

Example 14. Client requests supported query fields (https://xmpp.org/extensions/xep-0313.html#example-14)
<iq type='get' id='form1'>
  <query xmlns='urn:xmpp:mam:2'/>
</iq>

I received the response below. It seems the before-id, after-id and ids fields are not supported (they are listed in XEP-0313, shown further below). I was testing these fields and getting a bad-request error (shown below). Is this correct, or do I need to set these up somewhere?

Support seems to be limited to querying by JID, by start and end date, and by search text (unless I need to configure this somewhere).

<iq xml:lang='en' to='test@testserver.com/61cabb00-b3fc-11ed-9f45-37a4d953211b' from='test@testserver.com' type='result' id='YGQTWDFKW'><query xmlns='urn:xmpp:mam:2'><x type='form' xmlns='jabber:x:data'><field var='FORM_TYPE' type='hidden'><value>urn:xmpp:mam:2</value></field><field var='with' type='jid-single' label='User JID'/><field var='start' type='text-single' label='Search from the date'/><field var='end' type='text-single' label='Search until the date'/><field var='withtext' type='text-single' label='Search the text'/></x></query></iq>

Bad-request error with before-id, after-id, ids:

<bad-request xmlns='urn:ietf:params:xml:ns:xmpp-stanzas'/><text xml:lang='en' xmlns='urn:ietf:params:xml:ns:xmpp-stanzas'>Unknown field &apos;ids&apos; of type &apos;urn:xmpp:mam:1&apos;</text></error></iq>

XEP-0313 shows before-id, after-id, ids.

Example 15. Server returns supported fields (https://xmpp.org/extensions/xep-0313.html#example-15)
<iq type='result' id='form1'>
  <query xmlns='urn:xmpp:mam:2'>
    <x xmlns='jabber:x:data' type='form'>
      <field type='hidden' var='FORM_TYPE'>
        <value>urn:xmpp:mam:2</value>
      </field>
      <field type='jid-single' var='with'/>
      <field type='text-single' var='start'/>
      <field type='text-single' var='end'/>
      <field type='text-single' var='before-id'/>
      <field type='text-single' var='after-id'/>
      <field type='list-multi' var='ids'>
        <validate xmlns="http://jabber.org/protocol/xdata-validate" datatype="xs:string">
          <open/>
        </validate>
      </field>
      <field type='boolean' var='include-groupchat'/>
      <field type='text-single' var='{http://example.com/}free-text-search'/>
      <field type='text-single' var='{http://example.com/}stanza-content'/>
    </x>
  </query>
</iq>
weiss commented 1 year ago

Check XEP-0059 for how to "page" through the result set based on MAM IDs. (And note that you can use <before/> to get the most recent page. <flip-page/> just affects the ordering within a page.)
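As a rough sketch of that backward paging in Python: an empty `<before/>` requests the most recent page, and passing the `<first>` ID from the returned `<fin/>` as the next `<before>` value steps to the page before it. The function names and the example IDs are made up for illustration, and the response is assumed to carry its RSM `<set/>` directly inside `<fin/>` as in XEP-0313.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

MAM_NS = "urn:xmpp:mam:2"
RSM_NS = "http://jabber.org/protocol/rsm"


def build_backward_query(query_id, page_size, before_id=None):
    """Build a MAM query for the last page, or for the page before `before_id`."""
    # Empty <before/> means "the last page"; <before>ID</before> means
    # "the page that ends just before this archive ID".
    if before_id is None:
        before = "<before/>"
    else:
        before = f"<before>{escape(before_id)}</before>"
    return (
        f"<iq type='set' id={quoteattr(query_id)}>"
        f"<query xmlns='{MAM_NS}'>"
        f"<set xmlns='{RSM_NS}'><max>{int(page_size)}</max>{before}</set>"
        "</query></iq>"
    )


def first_id_of_page(fin_xml):
    """Return the archive ID in <first/> of a <fin/> response, or None."""
    fin = ET.fromstring(fin_xml)
    first = fin.find(f"{{{RSM_NS}}}set/{{{RSM_NS}}}first")
    return first.text if first is not None else None


# Usage with a made-up <fin/> response:
fin = (
    "<fin xmlns='urn:xmpp:mam:2'>"
    "<set xmlns='http://jabber.org/protocol/rsm'>"
    "<first>id-11</first><last>id-20</last>"
    "</set></fin>"
)
older_page_query = build_backward_query("q2", 10, first_id_of_page(fin))
```

The IDs used here are the archive IDs the server reports in its results and in `<first/>`/`<last/>`, which is what "MAM IDs" refers to above.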

dkliss commented 1 year ago

Yes. I added flip-page and received the same set of 50 result items, but reversed. I will have a look at XEP-0059. Thanks!

</x><set xmlns="http://jabber.org/protocol/rsm"><max><value>10</value></max></set><flip-page/></query></iq>

dkliss commented 1 year ago

Thanks! Finally, I am able to receive recent messages when I add an empty "before" tag.

<set xmlns="http://jabber.org/protocol/rsm"><max><value>10</value></max><before/></set></query></iq>

The server, however, is not respecting the "max" value in the stanza. The link below mentions that the max value can be set. Has this changed?

https://stackoverflow.com/questions/31828955/xmpp-query-archive-by-latest-messages

Another question: if I need to use an ID for "before", which ID is that? Is it the id from the MySQL table, which starts from 1 and increments per message, or is it the unique message ID added by a client (i.e. not the server's UID)?

What I want to understand is whether there is a way to retrieve a message from MySQL using the client-assigned ID that is part of the message stanza, or whether this is something I need to solve myself.

dkliss commented 1 year ago

FYI: the "max" issue is resolved. It was my mistake. The <value> element should not be in the stanza below. After removing it, I receive a number of items equal to max (if I set it to 10, I receive 10 items).

Before Fix

<set xmlns="http://jabber.org/protocol/rsm"><max><value>10</value></max><before/></set></query></iq>

After Fix

<set xmlns="http://jabber.org/protocol/rsm"><max>10</max><before/></set></query></iq>
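A small Python check makes the difference between the two stanzas visible. XEP-0059 puts the page size directly in `<max>` as text, so a reader that follows the spec finds no number in the "before fix" form. This is only an illustration of the format; `read_rsm_max` is a made-up helper, and how ejabberd itself handled the malformed element is not shown in this thread.

```python
import xml.etree.ElementTree as ET

RSM_NS = "http://jabber.org/protocol/rsm"


def read_rsm_max(set_xml):
    """Return the page size from an RSM <set/>, or None if it is not plain text."""
    rsm = ET.fromstring(set_xml)
    max_el = rsm.find(f"{{{RSM_NS}}}max")
    # A <max> that is missing or wraps child elements carries no usable number.
    if max_el is None or len(max_el) > 0:
        return None
    text = (max_el.text or "").strip()
    return int(text) if text.isdigit() else None


before_fix = (
    '<set xmlns="http://jabber.org/protocol/rsm">'
    "<max><value>10</value></max><before/></set>"
)
after_fix = (
    '<set xmlns="http://jabber.org/protocol/rsm">'
    "<max>10</max><before/></set>"
)

print(read_rsm_max(before_fix))  # None
print(read_rsm_max(after_fix))   # 10
```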

dkliss commented 1 year ago

@prefiks @weiss Closing this one as no further issues. Thanks a lot for your help!