typesense / typesense

Open Source alternative to Algolia + Pinecone and an Easier-to-Use alternative to ElasticSearch ⚡ 🔍 ✨ Fast, typo tolerant, in-memory fuzzy Search Engine for building delightful search experiences
https://typesense.org
GNU General Public License v3.0

Nested Left join not working as expected #1939

Status: Open. focafull opened this issue 2 months ago

focafull commented 2 months ago

Description

When I use a left join inside a nested join together with a filter, referenced documents that don't match the filter are included as well.

Steps to reproduce

Take these collections:

{
    'name': 'a',
    'fields': [
      {
        'name': 'title',
        'type': 'string',
        'facet': true,
      },
      {
        'name': 'vendor',
        'type': 'string',
        'facet': true,
      },
    ],
}

{
    'name': 'b',
    'fields': [
      {
        'name': 'name',
        'type': 'string',
        'facet': true,
      },
      {
        'name': 'a_ids',
        'type': 'string[]',
        'facet': true,
        'reference': 'a.id',
      },
    ],
}

{
    'name': 'c',
    'fields': [
      {
        'name': 'title',
        'type': 'string',
        'facet': true,
      },
      {
        'name': 'b_ids',
        'type': 'string[]',
        'facet': true,
        'reference': 'b.id',
      },
    ],
}
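The three schemas form a reference chain: c.b_ids references b.id, and b.a_ids references a.id, which is what makes the join in the query nested. A minimal TypeScript sketch that derives that chain from the schemas; the FieldDef and CollectionDef types and the referenceChain helper are hypothetical simplifications for illustration, not typesense-js types:

```typescript
// Hypothetical, simplified shapes of the schemas above (not the typesense-js types).
interface FieldDef {
  name: string;
  type: string;
  facet?: boolean;
  reference?: string; // "<collection>.<field>", e.g. "a.id"
}

interface CollectionDef {
  name: string;
  fields: FieldDef[];
}

const schemas: CollectionDef[] = [
  { name: 'a', fields: [{ name: 'title', type: 'string' }, { name: 'vendor', type: 'string' }] },
  { name: 'b', fields: [{ name: 'name', type: 'string' }, { name: 'a_ids', type: 'string[]', reference: 'a.id' }] },
  { name: 'c', fields: [{ name: 'title', type: 'string' }, { name: 'b_ids', type: 'string[]', reference: 'b.id' }] },
];

// Starting at `root`, follow the first reference field of each collection and
// return the collection names in visiting order. Stops at a collection without
// references, or when a collection would be visited twice (a cycle).
function referenceChain(root: string, defs: CollectionDef[]): string[] {
  const chain: string[] = [];
  let current: string | undefined = root;
  while (current !== undefined && chain.indexOf(current) === -1) {
    chain.push(current);
    const here: string = current;
    const def: CollectionDef | undefined = defs.filter((d) => d.name === here)[0];
    const refField: FieldDef | undefined = def
      ? def.fields.filter((f) => f.reference !== undefined)[0]
      : undefined;
    current = refField && refField.reference ? refField.reference.split('.')[0] : undefined;
  }
  return chain;
}

console.log(referenceChain('c', schemas).join(' -> ')); // c -> b -> a
```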

With these example documents:


  const aDocs = [{
    id: '0',
    vendor: 'a',
    title: 'a1',
  }, {
    id: '1',
    vendor: 'a',
    title: 'a2',
  }, {
    id: '2',
    vendor: 'b',
    title: 'a3',
  }, {
    id: '3',
    vendor: 'b',
    title: 'a4',
  }, {
    id: '4',
    vendor: 'a',
    title: 'a5',
  }, {
    id: '5',
    vendor: 'c',
    title: 'a6',
  }, {
    id: '6',
    vendor: 'a',
    title: 'a7',
  }, {
    id: '7',
    vendor: 'b',
    title: 'a8',
  }, {
    id: '8',
    vendor: 'c',
    title: 'a9',
  }, {
    id: '9',
    vendor: 'a',
    title: 'a10',
  }];

  const bDocs = [{
    id: '0',
    name: 'b1',
    a_ids: ['8', '9'],
  }, {
    id: '1',
    name: 'b2',
    a_ids: ['5', '6', '7'],
  }, {
    id: '2',
    name: 'b3',
    a_ids: ['1', '2'],
  }, {
    id: '3',
    name: 'b4',
    a_ids: ['3', '4'],
  }, {
    id: '4',
    name: 'b5',
    a_ids: ['0'],
  }];

  const cDocs = [{
    id: '0',
    title: 'c1',
    b_ids: ['0', '2', '3', '4'],
  }, {
    id: '1',
    title: 'c2',
    b_ids: ['0', '1', '4'],
  }];
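Without any filter, each c document reaches a set of a documents through its b references. The following in-memory sketch walks those two hops over the ids above; it does not call Typesense, and expand and label are hypothetical helpers (label assumes the naming pattern in the data, where id '0' is a1/b1/c1, id '1' is a2/b2/c2, and so on):

```typescript
// Reference arrays copied from the example documents above, keyed by document id.
const bToA: Record<string, string[]> = {
  '0': ['8', '9'], '1': ['5', '6', '7'], '2': ['1', '2'], '3': ['3', '4'], '4': ['0'],
};
const cToB: Record<string, string[]> = {
  '0': ['0', '2', '3', '4'],
  '1': ['0', '1', '4'],
};

// Hypothetical helper relying on the naming pattern in the data:
// id '0' is a1/b1/c1, id '1' is a2/b2/c2, and so on.
const label = (prefix: string, id: string): string => `${prefix}${Number(id) + 1}`;

// For one c document, map each referenced b document to the a documents it references.
// No filter is applied here; this only walks c.b_ids and then b.a_ids.
function expand(cId: string): Record<string, string[]> {
  const out: Record<string, string[]> = {};
  for (const bId of cToB[cId] ?? []) {
    out[label('b', bId)] = (bToA[bId] ?? []).map((aId) => label('a', aId));
  }
  return out;
}

console.log(JSON.stringify(expand('1')));
// {"b1":["a9","a10"],"b2":["a6","a7","a8"],"b5":["a1"]}
```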

And run a query like this:

const searchRequests = {
  'searches': [
    {
      'q': '*',
      'collection': 'c',
      'query_by': 'title',
      'filter_by': '$b(id:* || $a(vendor:[c]))',
      'include_fields': '$b(*, $a(*, strategy: nest_array), strategy: nest_array,)',
      'exclude_fields': 'b_ids, id, $b(a_ids, id, $a(id))',
    },
  ],
};

const result = await client.multiSearch.perform<any>(searchRequests);
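The filter_by in this query nests one join inside another, and the id:* || clause on b is how the left join is requested here. A sketch with two hypothetical string helpers (join and leftJoin are not part of typesense-js and do no validation) that build this filter, and the plain inner-join variant, from the same pieces to make the nesting explicit:

```typescript
// Hypothetical helpers that build Typesense join filters as plain strings.
// They only concatenate text and do not validate the filter syntax.

// Join on `collection`, filtering its documents with `inner`: `$collection(inner)`.
function join(collection: string, inner: string): string {
  return `$${collection}(${inner})`;
}

// Left join in the form used in this issue: `$collection(id:* || inner)`.
function leftJoin(collection: string, inner: string): string {
  return join(collection, `id:* || ${inner}`);
}

// Innermost condition: a documents whose vendor is c.
const vendorFilter = join('a', 'vendor:[c]');

console.log(leftJoin('b', vendorFilter)); // $b(id:* || $a(vendor:[c]))
console.log(join('b', vendorFilter));     // $b($a(vendor:[c]))
```

The only difference between the two strings is the id:* || clause on the outer b join.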

Expected Behavior

The documentation states: "So the result will include the referenced documents if a reference exists otherwise the document will be returned as is." In this case I would expect the a documents that don't match the filter to be left out, and b documents with no matching a document to be returned with an empty a array. Something like this:

[
  {
    "b": [
      {
        "a": [{"title": "a9", "vendor": "c"}],
        "name": "b1"
      },
      {
        "a": [{"title": "a6", "vendor": "c"}],
        "name": "b2"
      },
      {
        "a": [],
        "name": "b5"
      }
    ],
    "title": "c2"
  },
  {
    "b": [
      {
        "a": [{"title": "a9", "vendor": "c"}],
        "name": "b1"
      },
      {
        "a": [],
        "name": "b3"
      },
      {
        "a": [],
        "name": "b4"
      },
      {
        "a": [],
        "name": "b5"
      }
    ],
    "title": "c1"
  }
]
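Read as a rule, the expected result keeps every b document referenced by a c document and applies vendor:[c] to each b document's own a references, leaving an empty a array when nothing matches. A minimal in-memory model of that rule, assuming this is the intended left-join semantics; it reproduces the expected c2 entry above, but it is a sketch, not Typesense's implementation, and expectedLeftJoin and label are hypothetical names:

```typescript
// Example data from the issue, reduced to what the join needs (keyed by document id).
const aVendor: Record<string, string> = {
  '0': 'a', '1': 'a', '2': 'b', '3': 'b', '4': 'a',
  '5': 'c', '6': 'a', '7': 'b', '8': 'c', '9': 'a',
};
const bToA: Record<string, string[]> = {
  '0': ['8', '9'], '1': ['5', '6', '7'], '2': ['1', '2'], '3': ['3', '4'], '4': ['0'],
};
const cToB: Record<string, string[]> = { '0': ['0', '2', '3', '4'], '1': ['0', '1', '4'] };

// Same naming pattern as the data: id '0' is a1/b1/c1, id '1' is a2/b2/c2, and so on.
const label = (prefix: string, id: string): string => `${prefix}${Number(id) + 1}`;

interface JoinedB {
  a: { title: string; vendor: string }[];
  name: string;
}

// Assumed left-join semantics: keep every b referenced by the c document, and keep
// only the a references whose vendor matches (an empty array when none match).
// This models the expected output; it is not Typesense's implementation.
function expectedLeftJoin(cId: string, vendor: string): JoinedB[] {
  return (cToB[cId] ?? []).map((bId) => ({
    a: (bToA[bId] ?? [])
      .filter((aId) => aVendor[aId] === vendor)
      .map((aId) => ({ title: label('a', aId), vendor: aVendor[aId] })),
    name: label('b', bId),
  }));
}

console.log(JSON.stringify(expectedLeftJoin('1', 'c')));
// [{"a":[{"title":"a9","vendor":"c"}],"name":"b1"},{"a":[{"title":"a6","vendor":"c"}],"name":"b2"},{"a":[],"name":"b5"}]
```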

Actual Behavior

But actually I get these results:

[
  {
    "b": [
      {
        "a": [{"title": "a9", "vendor": "c"}],
        "name": "b1"
      },
      {
        "a": [{"title": "a6", "vendor": "c"}],
        "name": "b2"
      },
      {
        "a": [{"title": "a1", "vendor": "a"}],
        "name": "b5"
      }
    ],
    "title": "c2"
  },
  {
    "b": [
      {
        "a": [{"title": "a9", "vendor": "c"}],
        "name": "b1"
      },
      {
        "a": [
          {"title": "a2", "vendor": "a"},
          {"title": "a3", "vendor": "b"}
        ],
        "name": "b3"
      },
      {
        "a": [
          {"title": "a4", "vendor": "b"},
          {"title": "a5", "vendor": "a"}
        ],
        "name": "b4"
      },
      {
        "a": [{"title": "a1", "vendor": "a"}],
        "name": "b5"
      }
    ],
    "title": "c1"
  }
]

When the filter vendor:[c] matches none of a b document's a references, all of that b document's references are included instead (b3, b4 and b5 above), which is not really helpful.
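The actual output fits a fallback rule inferred from it: if at least one a reference of a b document matches vendor:[c], only the matches are returned; if none match, every reference is returned unfiltered. The sketch below encodes that inferred rule next to the expected one and lists the b documents where the two disagree. The rule is an assumption drawn from the output above, not taken from Typesense's source, and expectedRefs and observedRefs are hypothetical names:

```typescript
// Example data from the issue, reduced to what the join needs (keyed by document id).
const aVendor: Record<string, string> = {
  '0': 'a', '1': 'a', '2': 'b', '3': 'b', '4': 'a',
  '5': 'c', '6': 'a', '7': 'b', '8': 'c', '9': 'a',
};
const bToA: Record<string, string[]> = {
  '0': ['8', '9'], '1': ['5', '6', '7'], '2': ['1', '2'], '3': ['3', '4'], '4': ['0'],
};

// Same naming pattern as the data: id '0' is a1/b1/c1, id '1' is a2/b2/c2, and so on.
const label = (prefix: string, id: string): string => `${prefix}${Number(id) + 1}`;

// Expected: only the matching a references, possibly none.
function expectedRefs(bId: string, vendor: string): string[] {
  return (bToA[bId] ?? []).filter((aId) => aVendor[aId] === vendor);
}

// Inferred from the reported output (assumption): fall back to all references when none match.
function observedRefs(bId: string, vendor: string): string[] {
  const matches = expectedRefs(bId, vendor);
  return matches.length > 0 ? matches : (bToA[bId] ?? []);
}

// b documents whose nested `a` array differs between the two rules.
const diverging = Object.keys(bToA).filter(
  (bId) => expectedRefs(bId, 'c').join() !== observedRefs(bId, 'c').join(),
);

console.log(JSON.stringify(diverging.map((bId) => label('b', bId)))); // ["b3","b4","b5"]
```

It flags b3, b4 and b5, the same entries that differ between the expected and actual results.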

Metadata

Typesense Version: v27.0

OS: Linux

happy-san commented 3 weeks ago

I'm assuming that this query, which drops the id:* || clause from your filter,

{
  "searches": [
    {
      "q": "*",
      "collection": "c",
      "query_by": "title",
      "filter_by": "$b($a(vendor:[c]))",
      "include_fields": "$b(*, $a(*, strategy: nest_array), strategy: nest_array,)",
      "exclude_fields": "b_ids, id, $b(a_ids, id, $a(id))"
    }
  ]
}

doesn't fulfill your purpose? It returns:

[
  {
    "document": {
      "b": [
        {
          "a": [
            {
              "title": "a9",
              "vendor": "c"
            }
          ],
          "name": "b1"
        },
        {
          "a": [
            {
              "title": "a6",
              "vendor": "c"
            }
          ],
          "name": "b2"
        }
      ],
      "title": "c2"
    },
    "highlight": {},
    "highlights": []
  },
  {
    "document": {
      "b": {
        "a": [
          {
            "title": "a9",
            "vendor": "c"
          }
        ],
        "name": "b1"
      },
      "title": "c1"
    },
    "highlight": {},
    "highlights": []
  }
]
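Dropping the id:* || clause turns the left join on b into an inner join, so b documents without a matching a reference disappear from the nested results. The response also shows a shape difference: c1's b is a single object, while c2's b is an array. A minimal sketch that models the inner-join result in memory and normalizes the object-or-array shape on the client. innerJoin and asArray are hypothetical helpers; dropping a c document that is left with no b is an assumption this data set does not exercise, and whether the single-object shape is intended is not stated in this thread:

```typescript
// Example data from the issue, reduced to what the join needs (keyed by document id).
const aVendor: Record<string, string> = {
  '0': 'a', '1': 'a', '2': 'b', '3': 'b', '4': 'a',
  '5': 'c', '6': 'a', '7': 'b', '8': 'c', '9': 'a',
};
const bToA: Record<string, string[]> = {
  '0': ['8', '9'], '1': ['5', '6', '7'], '2': ['1', '2'], '3': ['3', '4'], '4': ['0'],
};
const cToB: Record<string, string[]> = { '0': ['0', '2', '3', '4'], '1': ['0', '1', '4'] };

// Same naming pattern as the data: id '0' is a1/b1/c1, id '1' is a2/b2/c2, and so on.
const label = (prefix: string, id: string): string => `${prefix}${Number(id) + 1}`;

interface NestedB {
  a: { title: string; vendor: string }[];
  name: string;
}

// Hypothetical inner-join model: a b document survives only if at least one of its
// a references matches; a c document with no surviving b is dropped (assumed, since
// both c documents here reference b1).
function innerJoin(vendor: string): Record<string, NestedB[]> {
  const out: Record<string, NestedB[]> = {};
  for (const cId of Object.keys(cToB)) {
    const bs = cToB[cId]
      .map((bId) => ({
        a: (bToA[bId] ?? [])
          .filter((aId) => aVendor[aId] === vendor)
          .map((aId) => ({ title: label('a', aId), vendor: aVendor[aId] })),
        name: label('b', bId),
      }))
      .filter((b) => b.a.length > 0);
    if (bs.length > 0) out[label('c', cId)] = bs;
  }
  return out;
}

// Hypothetical client-side helper: a joined field may come back as one object
// (c1 above) or as an array (c2 above); always work with an array.
function asArray<T>(value: T | T[] | undefined): T[] {
  if (value === undefined) return [];
  return Array.isArray(value) ? value : [value];
}

const joined = innerJoin('c');
console.log(JSON.stringify(Object.keys(joined).map((c) => c + ': ' + joined[c].map((b) => b.name).join(', '))));
// ["c1: b1","c2: b1, b2"]

// Shape copied from the c1 hit in the response above.
const c1Hit = { b: { a: [{ title: 'a9', vendor: 'c' }], name: 'b1' }, title: 'c1' };
console.log(asArray(c1Hit.b).map((b) => b.name).join(', ')); // b1
```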