Authentication and authorization: architecture

Proposal:

The task is to design an optimal architecture for adding authentication features to Manticore.

Checklist:

To be completed by the assignee. Check off tasks that have been completed or are not applicable.

- [ ] Implementation completed
- [ ] Tests developed
- [ ] Documentation updated
- [ ] Documentation reviewed
- [ ] [Changelog](https://docs.google.com/spreadsheets/d/1mz_3dRWKs86FjRF7EIZUziUDK_2Hvhd97G0pLpxo05s/edit?pli=1&gid=1102439133#gid=1102439133) updated
- [x] OpenAPI YAML updated and issue created to rebuild clients

Basic authorization:

The HTTP client sends the request header:

Authorization: Basic dXNlcm5hbWU6cGFzc3dvcmQ=

where dXNlcm5hbWU6cGFzc3dvcmQ= is the base64-encoded string username:password
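As an illustration of the header format above, here is a minimal Python sketch that builds and parses a Basic authorization value. The function names `encode_basic` and `decode_basic` are hypothetical and are not part of Manticore.

```python
import base64
from typing import Tuple


def encode_basic(username: str, password: str) -> str:
    """Build the value of the Authorization header for Basic auth."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def decode_basic(header_value: str) -> Tuple[str, str]:
    """Parse an Authorization header value back into (username, password)."""
    scheme, _, payload = header_value.partition(" ")
    if scheme.lower() != "basic" or not payload:
        raise ValueError("not a Basic authorization header")
    decoded = base64.b64decode(payload, validate=True).decode("utf-8")
    # Only the first colon separates the fields; the password may contain colons.
    username, sep, password = decoded.partition(":")
    if not sep:
        raise ValueError("missing ':' separator")
    return username, password


print(encode_basic("username", "password"))
print(decode_basic("Basic dXNlcm5hbWU6cGFzc3dvcmQ="))
```

Base64 is an encoding, not encryption, which is why the rest of the proposal ties this scheme to the SSL interfaces.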

The MySQL client supports basic authorization prior to version 8. The MySQL interface code already has some basic auth code and only needs a password check added. The HTTP interface has no auth code at all, so it needs to be implemented.

The auth_passed flag, along with the user, will be:

stored in a session variable for the SphinxQL SSL interface
passed to the daemon with each HTTPS request

Authorization with a token:

The user sends a login request and the daemon replies with an access token. The access token (or session token) lets the daemon identify the user. After the user logs out, all of that user's tokens are invalidated. Tokens are also invalidated after a period of time.

The HTTP client sends this request header after login:

Authorization: Bearer <token>

The token is stored:

in a session variable for the SphinxQL SSL interface
on the client side, which passes it back with each HTTPS request
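The token lifecycle described above (issue on login, expire after a period, drop everything on logout) could look roughly like the following Python sketch. The name `active_users_map` comes from the flow described later in this proposal. `TOKEN_TTL_SEC`, the function names, and the one-hour value are assumptions made for illustration only.

```python
import secrets
import time
from typing import Dict, Optional, Tuple

TOKEN_TTL_SEC = 3600  # hypothetical invalidation period

# token -> (user, issued_at)
active_users_map: Dict[str, Tuple[str, float]] = {}


def login(user: str, now: Optional[float] = None) -> str:
    """Issue a fresh random token for an already authenticated user."""
    token = secrets.token_urlsafe(32)
    active_users_map[token] = (user, time.time() if now is None else now)
    return token


def user_for_token(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the user behind a token, or None if it is unknown or expired."""
    entry = active_users_map.get(token)
    if entry is None:
        return None
    user, issued_at = entry
    current = time.time() if now is None else now
    if current - issued_at >= TOKEN_TTL_SEC:
        del active_users_map[token]  # expired tokens are dropped lazily
        return None
    return user


def logout(user: str) -> None:
    """Invalidate every token that belongs to the user."""
    for token in [t for t, (u, _) in active_users_map.items() if u == user]:
        del active_users_map[token]
```

The optional `now` argument exists only so the expiry logic can be exercised without waiting for real time to pass.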

Requests

All requests from every interface, SphinxQL (SSL) or HTTPS, should map to a pair:

request type, such as read (select, meta, call), write (insert \ replace \ update \ delete \ bulk), or management (create table, drop table, set var)
table name

Each user has an allowed list of (request type, table name) pairs that is checked for every request via direct matching or RE2 rules.
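A minimal Python sketch of this mapping, assuming the statement name and the table name have already been extracted by the parser. The `STATEMENT_TO_TYPE` table, the `*` wildcard, and the function names are illustrative assumptions. RE2 rules are left out and only direct matching is shown.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (request type, table name)

# Hypothetical statement-to-type table based on the examples in the proposal.
STATEMENT_TO_TYPE = {
    "select": "read", "show meta": "read", "call": "read",
    "insert": "write", "replace": "write", "update": "write",
    "delete": "write", "bulk": "write",
    "create table": "management", "drop table": "management", "set": "management",
}


def to_pair(statement: str, table: str) -> Pair:
    """Map a parsed statement name and table to a (request type, table) pair."""
    return (STATEMENT_TO_TYPE[statement.lower()], table)


def is_allowed(allowed: List[Pair], pair: Pair) -> bool:
    """Direct matching; '*' in a rule matches any request type or any table."""
    req_type, table = pair
    return any(
        a_type in ("*", req_type) and a_table in ("*", table)
        for a_type, a_table in allowed
    )


allowed = [("read", "*"), ("write", "logs")]
print(is_allowed(allowed, to_pair("select", "products")))
print(is_allowed(allowed, to_pair("insert", "products")))
```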

The flow

The flow after the daemon gets a client request with basic authorization:

1) req > the daemon checks for user:password in the request > users[user] gets the user and checks that user.password matches req.password
2) req > the daemon gets the (request type, table name) pair
3) user_rules_map[user] gets the allowed list, which is checked against the (request type, table name) pair by direct comparison or RE2 rule matching:
3.1) any match allows the request to be processed further as usual
3.2) if no entry matches, the request is rejected with a proper error message and error code
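The basic-authorization flow can be sketched in Python as follows. The names `users` and `user_rules_map` follow the flow description. The numeric codes 401 and 403, the messages, and the plain-text password storage are assumptions chosen to keep the sketch short, not a statement of how the daemon does or should behave.

```python
import hmac
from typing import Dict, List, Tuple

# Plain-text passwords only to keep the sketch short.
users: Dict[str, str] = {"alice": "s3cret"}
user_rules_map: Dict[str, List[Tuple[str, str]]] = {
    "alice": [("read", "*"), ("write", "logs")],
}


def handle(user: str, password: str, req_type: str, table: str) -> Tuple[int, str]:
    # 1) authenticate: look the user up and compare passwords in constant time
    stored = users.get(user)
    if stored is None or not hmac.compare_digest(stored.encode(), password.encode()):
        return 401, "authentication failed"
    # 2) the (request type, table name) pair is supplied by the caller here
    # 3) authorize: 3.1) any matching rule lets the request through
    for a_type, a_table in user_rules_map.get(user, []):
        if a_type in ("*", req_type) and a_table in ("*", table):
            return 200, "ok"
    # 3.2) nothing matched: reject with an error message and code
    return 403, f"user '{user}' is not allowed to {req_type} table '{table}'"


print(handle("alice", "s3cret", "read", "products"))
print(handle("alice", "s3cret", "write", "products"))
```

The token flow differs only in step 1, where the user is resolved from the token instead of from user:password.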

The flow after the daemon gets a client request with token authorization:

1) req > the daemon checks for the token in the request > active_users_map[token] gets the user and checks that the (user, token) time window is less than the invalidation time
2) req > the daemon gets the (request type, table name) pair
3) user_rules_map[user] gets the allowed list, which is checked against the (request type, table name) pair by direct comparison or RE2 rule matching:
3.1) any match allows the request to be processed further as usual
3.2) if no entry matches, the request is rejected with a proper error message and error code

Buddy and API communication

I suggest rejecting all requests that lack SSL support (HTTP \ SphinxQL \ API) if user management is enabled in the daemon, or asking the user to keep these interfaces behind a firewall \ NAT along with the Galera interface. We could let requests through without any checks via the _vip interfaces.

The daemon could let Buddy requests through:

by adding a special buddy user {"request type":"*", "table name":"*"} with an SSL generated token and checking that the token passed to Buddy on Buddy start matches that user
by checking for the user-agent: Manticore Buddy header; however, any client sending such a header would bypass all checks
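To illustrate the difference between the two options, here is a hedged Python sketch of the token variant. The daemon would generate a random token, hand it to Buddy on start, and accept a request as a Buddy request only if the same token comes back. The function names and the use of a Bearer header are assumptions.

```python
import hmac
import secrets


def generate_buddy_token() -> str:
    """Random token the daemon would create once and pass to Buddy on start."""
    return secrets.token_hex(32)


def is_buddy_request(headers: dict, expected_token: str) -> bool:
    """Accept only requests that present the exact token; ignore User-Agent."""
    auth = headers.get("Authorization", "")
    scheme, _, presented = auth.partition(" ")
    if scheme != "Bearer":
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(presented.encode(), expected_token.encode())


token = generate_buddy_token()
print(is_buddy_request({"Authorization": "Bearer " + token}, token))
print(is_buddy_request({"User-Agent": "Manticore Buddy"}, token))
```

A request that only carries the User-Agent header is rejected here, which is the weakness of the second option in the list above.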

Concerns:

The API (client and master - agent) does not support SSL, user \ password, or tokens
Galera supports only SSL, not user \ password or tokens

It is not clear how to authorize Buddy or master - agent API requests:

we could add default SSL key and cert files along with searchd.int_ssl_* options to make sure that master - agent and Buddy requests are encrypted with SSL and replies can be decrypted with the same key, and allow the user to change them.
we could set a custom user:password via the searchd.int_user and searchd.int_password options, pass it to Buddy on Buddy start so that Buddy uses it for all requests to the daemon, use the same for master - agent communication, and allow the user to change it.
we could generate a custom token and pass it to Buddy on Buddy start so that Buddy uses it for all requests to the daemon. We need to inform users to change all these options to make sure these channels are secured, as the default options can be read from the installation or the GitHub source code.

It is not clear how to allow Buddy to perform all requests while keeping the user away from certain tables, i.e.

user: test
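has the RE2 table access rule (?!system_table$).* - can query all tables except system_table (note that RE2 itself does not support lookahead such as (?!...), so in practice the exclusion would have to be expressed differently)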

The query select * from system_table sent to the daemon will fail because of the table access rule. However, if the user sends select * from system_table option fuzzy=1, query parsing fails at the daemon and the raw query text is routed to Buddy. Buddy could then fix the query, issue it, and return the result to the daemon, which returns it to the client.

Storage

I am thinking of storing the hash of users, passwords, and allowed request types in manticore.conf for static config, or in manticore.json for RT mode. Every change to that hash (adding or deleting a user, role, or rule) should be flushed to manticore.json.
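A rough Python sketch of "flush on every change", writing through a temporary file and a rename so that a crash cannot leave a half-written file. The JSON layout, the file name, and the function names are purely hypothetical. The real manticore.json holds other data and its format is not defined by this proposal.

```python
import json
import os
import tempfile


def flush_auth(path: str, auth: dict) -> None:
    """Write the auth hash atomically: temp file in the same dir, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(auth, f, indent=2, sort_keys=True)
    os.replace(tmp, path)


def load_auth(path: str) -> dict:
    """Read the auth hash back at daemon start."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Hypothetical layout: user -> password hash plus allowed (type, table) rules.
auth = {"users": {"test": {"password_hash": "<hash>", "rules": [["read", "*"]]}}}
path = os.path.join(tempfile.mkdtemp(), "auth.json")
flush_auth(path, auth)
print(load_auth(path) == auth)
```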

I pushed the branch req_regex, which adds matching of 100 regex patterns for every search query once it is enabled via

mysql -h 127.0.0.1 -P 9306 -e "set global regex=1"

The loop that matches every pattern against the SphinxQL statement text in this mode adds about 1 ms on the first invocation and about 0.1 ms on all subsequent invocations.

I tested short queries up to 128 bytes:

mysql -h0 -P 9306 -e "SELECT id, uuid_short() as i101, uuid_short() as i102, uuid_short() as i103, uuid_short() as i104 from name order by id    asc;"

along with large queries up to 1 KB:

mysql -h0 -P 9306 -e "SELECT id, uuid_short() as i201, uuid_short() as i202, uuid_short() as i203, uuid_short() as i204
, uuid_short() as i101, uuid_short() as i102, uuid_short() as i103, uuid_short() as i104
, uuid_short() as i111, uuid_short() as i112, uuid_short() as i113, uuid_short() as i114
, uuid_short() as i121, uuid_short() as i122, uuid_short() as i123, uuid_short() as i124
, uuid_short() as i131, uuid_short() as i132, uuid_short() as i133, uuid_short() as i134
, uuid_short() as i141, uuid_short() as i142, uuid_short() as i143, uuid_short() as i144
, uuid_short() as i151, uuid_short() as i152, uuid_short() as i153, uuid_short() as i154
, uuid_short() as i161, uuid_short() as i162, uuid_short() as i163, uuid_short() as i164
, uuid_short() as i171, uuid_short() as i172, uuid_short() as i173, uuid_short() as i174
, uuid_short() as i181, uuid_short() as i182, uuid_short() as i183, uuid_short() as i184

from name order by id asc; "

The timing is logged to searchd.log after the search finishes, for example:

[Mon Nov 18 16:35:54.245 2024] [2552564] regex patterns check: 100, took: 1.054 ms
[Mon Nov 18 16:35:54.245 2024] [2552564] regex matched
[Mon Nov 18 16:38:33.945 2024] [2552579] regex patterns check: 100, took: 0.153 ms
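The measurement can be approximated outside the daemon with a small Python sketch. It uses Python's `re` module rather than RE2, and the 100 patterns are made up, so the absolute numbers are not comparable with the searchd log above. It only shows the shape of the test: 100 patterns checked against one statement, timed on the first and on a later pass.

```python
import re
import time
from typing import Tuple

# 100 made-up patterns; the patterns in the req_regex branch may differ.
patterns = [rf"\bfrom\s+table_{i}\b" for i in range(100)]
statement = "SELECT id, uuid_short() as i101 from name order by id asc;"


def check_all(stmt: str) -> int:
    """Return how many of the patterns match the statement text."""
    return sum(1 for p in patterns if re.search(p, stmt))


def timed(stmt: str) -> Tuple[int, float]:
    """Run the check once and return (matches, elapsed milliseconds)."""
    start = time.perf_counter()
    matched = check_all(stmt)
    return matched, (time.perf_counter() - start) * 1000.0


first = timed(statement)
second = timed(statement)
print(f"first pass:  matched={first[0]} took={first[1]:.3f} ms")
print(f"second pass: matched={second[0]} took={second[1]:.3f} ms")
```

In this sketch the first pass pays for pattern compilation and later passes reuse the compiled patterns from the `re` cache. Whether the same effect explains the 1 ms versus 0.1 ms difference in searchd is an assumption.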

For replication of user auth: we could keep all rules in the system.users table and replicate it to a new node on user request, or from the donor node when a new node first joins any cluster. However, this means that all nodes will have the same system.users and the admin cannot set per-node rules for users.

Another concern is that a new node with no user roles could join the cluster through a node that has user roles set. A client request to the new node arrives via the SphinxQL interface, and the new node then communicates with the donor via the API interface. Since I don't plan to add user auth to the API interface, the new node could bypass the auth at the donor node.

It may be worth thinking about how to prevent this.

As discussed, let’s carefully consider these items:

Privileges and Groups: We decided to avoid regex for performance reasons. Let’s define which specific privileges (e.g., insert, replace, delete, update) and privilege groups (e.g., write = insert + replace + delete + update) make sense for Manticore.
Privileges Table Propagation: The privileges table will be propagated to a node when it joins a cluster. We need to decide whether to include the cluster name, node name/ID/IP, along with the username, to make permissions configuration more flexible. It might be worth reviewing how MySQL handles this.
Authentication in the Binary API: As discussed, allowing authentication without encryption might be acceptable, but it's worth checking how this is done e.g. in MySQL - https://web.archive.org/web/20220412015801/https://dev.mysql.com/doc/internals/en/secure-password-authentication.html
Authentication for Queries from Buddy: Think how to ensure secure communication with Buddy, preventing anyone from impersonating Buddy to bypass authentication.
SQL Parsing Issue: Regarding the possible issue with "SELECT * FROM t OPTION blabla=1," evaluate if we can identify it as a SELECT query before routing it to Buddy.
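For the Binary API item, the scheme described in the linked MySQL document is a challenge-response exchange: the server stores SHA1(SHA1(password)), sends a random scramble, and the client answers with SHA1(password) XOR SHA1(scramble + SHA1(SHA1(password))). A Python sketch of that exchange follows. The function names are hypothetical, and the sketch only illustrates how a password can be verified over an unencrypted channel without being sent. It is not a proposal to adopt SHA-1.

```python
import hashlib
import hmac
import os


def sha1(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def stored_hash(password: str) -> bytes:
    """What the server keeps: SHA1(SHA1(password)), never the password."""
    return sha1(sha1(password.encode()))


def client_response(password: str, scramble: bytes) -> bytes:
    """SHA1(password) XOR SHA1(scramble + SHA1(SHA1(password)))."""
    stage1 = sha1(password.encode())
    return xor(stage1, sha1(scramble + sha1(stage1)))


def server_verify(response: bytes, scramble: bytes, stored: bytes) -> bool:
    """Recover the candidate SHA1(password) and check it against the store."""
    candidate_stage1 = xor(response, sha1(scramble + stored))
    return hmac.compare_digest(sha1(candidate_stage1), stored)


scramble = os.urandom(20)  # the server sends a fresh one per connection
stored = stored_hash("secret")
print(server_verify(client_response("secret", scramble), scramble, stored))
print(server_verify(client_response("wrong", scramble), scramble, stored))
```

Because the scramble changes per connection, a captured response cannot simply be replayed, although anyone who obtains the stored hash and observes one exchange can recover SHA1(password).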

Privileges and Groups: We decided to avoid regex for performance reasons. Let’s define which specific privileges (e.g., insert, replace, delete, update) and privilege groups (e.g., write = insert + replace + delete + update) make sense for Manticore.

We could allow groups \ aliases to be defined in the config from SphinxQL statement names:

searchd.users_group_read = select,show_meta,call,desc,show_profile
searchd.users_group_write_tnx = insert,replace,delete,set

The read and write_tnx group names are then used when checking which statements are allowed for a user.
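A small Python sketch of how such groups could be expanded and checked with plain set lookups instead of regex. The group values are taken from the two config lines above. The function names and the rule that a privilege not naming a group is treated as a single statement are assumptions.

```python
from typing import Dict, FrozenSet, Iterable


def parse_group(value: str) -> FrozenSet[str]:
    """Parse a comma-separated statement list from the config value."""
    return frozenset(s.strip().lower() for s in value.split(",") if s.strip())


groups: Dict[str, FrozenSet[str]] = {
    "read": parse_group("select,show_meta,call,desc,show_profile"),
    "write_tnx": parse_group("insert,replace,delete,set"),
}


def allowed_statements(privileges: Iterable[str]) -> FrozenSet[str]:
    """Expand group names; anything that is not a group is a single statement."""
    out = set()
    for p in privileges:
        out |= groups.get(p, frozenset({p}))
    return frozenset(out)


def may_run(privileges: Iterable[str], statement: str) -> bool:
    return statement.lower() in allowed_statements(privileges)


print(may_run(["read"], "SELECT"))
print(may_run(["read"], "insert"))
```

In a real implementation the expanded set would presumably be computed once per user rather than on every request.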

Authentication for Queries from Buddy: Think how to ensure secure communication with Buddy, preventing anyone from impersonating Buddy to bypass authentication.

We could pass a special Buddy user name and a generated password to Buddy on Buddy start via the Buddy CLI, and Buddy would send HTTP requests to the daemon with Authorization: Basic .... Alternatively, we could pass a generated auth token the same way, and Buddy would send HTTP requests to the daemon with Authorization: Bearer <token>.

SQL Parsing Issue: Regarding the possible issue with "SELECT * FROM t OPTION blabla=1," evaluate if we can identify it as a SELECT query before routing it to Buddy.

If a SphinxQL query fails to parse, it still has the statement type set most of the time. However, the parser cannot extract the index name if the select list of the query has an error and parsing fails there.

Another approach is to pass the user to Buddy and have Buddy route that user back to the daemon with every fixed request, so that the daemon authorizes all Buddy requests as that user's requests. Maybe there is some standard proxy request that uses its own auth but also routes the user auth along with the original request.

Tasks (estimate)

(size_M) login and rejection via the MySQL client with a hardcoded user in the daemon
(size_M) load users from the plain config searchd.users and authenticate these users from the config instead of the hardcoded one
(size_S) login via HTTP(S) with basic auth
(size_M) generate a password \ token for the Buddy CLI and authenticate Buddy's own requests. Need input from someone on the Buddy team on how to check the original user's permissions for requests coming from Buddy
(size_M) add groups \ aliases as user privileges in the plain config searchd.groups and authorize MySQL and HTTP user requests against the user \ privilege pair via raw matching \ comparison (without regex)
(size_L) add RT mode SphinxQL statements for auth management and store \ load this data in \ from manticore.json
(size_L) add user \ password and checks to all binary API communications between master and agent
(size_L) add user \ password and checks to all binary API communications between the daemon and various API clients
(size_L) replicate auth table(s) across nodes. Fix user cluster join requests \ code to join the auth cluster first and use auth to validate user requests
