manticoresoftware / manticoresearch

Easy to use open source fast database for search | Good alternative to Elasticsearch now | Drop-in replacement for E in the ELK soon
https://manticoresearch.com
GNU General Public License v3.0
8.93k stars 495 forks source link

JSON JOIN returns non-null when it should be null #2560

Open sanikolaev opened 2 weeks ago

sanikolaev commented 2 weeks ago

Bug Description:

With this data:

DROP TABLE IF EXISTS customers;
CREATE TABLE customers (name text, email string attribute, address text );

INSERT INTO customers (id, name, email, address) VALUES
(1, 'Alice Johnson', 'alice@example.com', '123 Maple St'),
(2, 'Bob Smith', 'bob@example.com', '456 Oak St'),
(3, 'Carol White', 'carol@example.com', '789 Pine St'),
(4, 'John Smith', 'john@example.com', '15 Barclays St');

DROP TABLE IF EXISTS orders;
CREATE TABLE orders (customer_id bigint, product text, quantity integer, order_date string, tags multi, details json );

INSERT INTO orders (id, customer_id, product, quantity, order_date, tags, details) VALUES 
(1, 1, 'Laptop', 1, '2023-01-01', (101, 102), '{"price":1200, "warranty":"2 years"}'),
(2, 2, 'Phone', 2, '2023-01-02', (103), '{"price":800, "warranty":"1 year"}'),
(3, 1, 'Tablet', 1, '2023-01-03', (101, 104), '{"price":450, "warranty":"1 year"}'),
(4, 3, 'Monitor', 1, '2023-01-04', (105), '{"price":300, "warranty":"1 year"}');

This SQL query returns all nulls:

SELECT * FROM customers
LEFT JOIN orders
ON orders.customer_id = customers.id
WHERE match('john');

+------+------------+----------------+------------------+-----------+--------------------+-----------------+-------------------+-------------+----------------+----------------+
| id   | name       | address        | email            | orders.id | orders.customer_id | orders.quantity | orders.order_date | orders.tags | orders.details | orders.product |
+------+------------+----------------+------------------+-----------+--------------------+-----------------+-------------------+-------------+----------------+----------------+
|    4 | John Smith | 15 Barclays St | john@example.com |      NULL |               NULL |            NULL | NULL              | NULL        | NULL           | NULL           |
+------+------------+----------------+------------------+-----------+--------------------+-----------------+-------------------+-------------+----------------+----------------+
1 row in set (0.00 sec)

However this JSON query returns only one null and the others are empty or zeroes.

curl -sX POST http://localhost:9308/search  -d '
{
  "index": "customers",
  "query": {
    "query_string": "john"
  },
  "join": [
    {
      "type": "left",
      "table": "orders",
      "on": [
        {
          "left": {
            "table": "orders",
            "field": "customer_id"
          },
          "operator": "eq",
          "right": {
            "table": "customers",
            "field": "id"
          }
        }
      ]
    }
  ]
}
' | jq . 
{
  "took": 0,
  "timed_out": false,
  "hits": {
    "total": 1,
    "total_relation": "eq",
    "hits": [
      {
        "_id": 4,
        "_score": 1695,
        "_source": {
          "name": "John Smith",
          "address": "15 Barclays St",
          "email": "john@example.com",
          "orders.id": 0,
          "orders.customer_id": 0,
          "orders.quantity": 0,
          "orders.order_date": "",
          "orders.tags": [],
          "orders.details": null,
          "orders.product": ""
        }
      }
    ]
  }
}

This looks inconsistent. Once fixed. The corresponding example here https://github.com/manticoresoftware/manticoresearch/blob/master/manual/Searching/Joining.md should be updated.

Manticore Search Version:

Manticore 6.3.7 53ae190b8@24091306 dev (columnar 2.3.1 f9ef8b9@24090411) (secondary 2.3.1 f9ef8b9@24090411) (knn 2.3.1 f9ef8b9@24090411)

Operating System Version:

Ubuntu Jammy (dev2)

Have you tried the latest development version?

Yes

Internal Checklist:

To be completed by the assignee. Check off tasks that have been completed or are not applicable.

- [ ] Implementation completed - [ ] Tests developed - [ ] Documentation updated - [ ] Documentation reviewed - [ ] [Changelog](https://docs.google.com/spreadsheets/d/1mz_3dRWKs86FjRF7EIZUziUDK_2Hvhd97G0pLpxo05s/edit?pli=1&gid=1102439133) updated