weaviate / typescript-client

Official Weaviate TypeScript Client
https://www.npmjs.com/package/weaviate-client
BSD 3-Clause "New" or "Revised" License

v3.0.8: queries with numeric filters always return empty sets #153

Closed: tifroz closed this issue 5 days ago

tifroz commented 6 days ago

I'm using the JS client v3.0.8 with a weaviate.io-hosted database in a multi-tenancy environment.

Queries that include a numeric filter (greaterThan, lessThan, lessOrEqual, etc.) always return an empty set, no matter what.

In the sample code below, calls to the query() function always produce an empty array, which does not reflect the underlying data in the collection.

// getCollection() and logger are application helpers, not part of weaviate-client
async function query(className: string, tenantId: string) {
    const collection = await getCollection(className, tenantId)
    const filter = collection.filter.byProperty('response_timestamp').greaterThan(0)
    logger.debug(`OK filter -> ${JSON.stringify(filter, null, 2)}`)
    try {
        const results = await collection.query.fetchObjects({
            filters: filter,
            limit: 1
        })
        logger.debug(`OK query() -> ${JSON.stringify(results, null, 2)}`)
        return results
    } catch (err) {
        // `err` is `unknown` under strict TypeScript, so narrow it before reading .message
        if (err instanceof Error && err.message.includes('tenant not found')) {
            logger.warn(`WARN Did not find tenant for class '${className}', query() will return undefined (TenantId '${tenantId}')`)
            return undefined
        }
        throw err
    }
}

Output from executing query():

OK filter -> {
  "operator": "GreaterThan",
  "target": {
    "property": "response_timestamp"
  },
  "value": 0
}

OK query -> {
  "objects": []
}
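To make the expected semantics of the logged filter concrete, here is a hypothetical in-memory evaluator for the same `{ operator, target, value }` shape. It is a sketch for reasoning only, assuming Weaviate-style operator names (`GreaterThan`, `LessThanEqual`, ...); it is not Weaviate's server-side filter logic and not a weaviate-client type. The point it illustrates: an object only matches a numeric comparison if the filtered property actually holds a number.

```typescript
// Hypothetical in-memory model of the logged filter shape; this is NOT
// Weaviate's server-side filter implementation or a weaviate-client type.
type Operator = 'Equal' | 'GreaterThan' | 'GreaterThanEqual' | 'LessThan' | 'LessThanEqual';

interface LoggedFilter {
  operator: Operator;
  target: { property: string };
  value: number;
}

type Props = Record<string, unknown>;

const compare: Record<Operator, (a: number, b: number) => boolean> = {
  Equal: (a, b) => a === b,
  GreaterThan: (a, b) => a > b,
  GreaterThanEqual: (a, b) => a >= b,
  LessThan: (a, b) => a < b,
  LessThanEqual: (a, b) => a <= b,
};

// An object matches only if the target property holds a number that satisfies
// the comparison; a missing (never populated) property never matches.
function matches(filter: LoggedFilter, props: Props): boolean {
  const v = props[filter.target.property];
  return typeof v === 'number' && compare[filter.operator](v, filter.value);
}

function applyFilter(filter: LoggedFilter, objects: Props[]): Props[] {
  return objects.filter((o) => matches(filter, o));
}

// Same filter as in the log output above.
const filter: LoggedFilter = {
  operator: 'GreaterThan',
  target: { property: 'response_timestamp' },
  value: 0,
};

const sample: Props[] = [
  { response_timestamp: 1630000000000 },
  { response_timestamp: 1630000000001 },
  { tId: '190223a74bad92f5' }, // response_timestamp never set
];

console.log(applyFilter(filter, sample).length); // 2
```

Under this model, a numeric filter returns an empty set whenever the filtered property is absent from every stored object, even though a string equality filter on a populated property still matches.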
tsmith023 commented 6 days ago

Hi @tifroz, thanks for raising this one! If it isn't confidential, could you share your collection config (class schema) here so we can turn this report into a Minimal Reproducible Example (MRE)? Then I'll be able to test it locally and identify the fix. (It could also be a server-side issue, depending on your setup.)

If your config is confidential, could you edit it to obfuscate the data so that it only contains the configuration that triggers the error?

tifroz commented 5 days ago

I renamed some fields and made some light changes to the original code for readability, but in essence this is the code that creates our collection schema:

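import weaviate from 'weaviate-client'

// Schema helpers from the v3 client
const { dataType, vectorizer } = weaviate.configure

const properties = [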
    {
        "name": "ownerDomain",
        "dataType": dataType.TEXT,
        "indexFilterable": false,
        "indexSearchable": true,
    },
    {
        "name": "topicId",
        "dataType": dataType.TEXT,
        "indexFilterable": true,
        "indexSearchable": true,
    },
    {
        "name": "tId",
        "dataType": dataType.TEXT,
        "indexFilterable": false,
        "indexSearchable": true,
    },
    {
        "name": "text",
        "dataType": dataType.TEXT,
        "indexFilterable": false,
        "indexSearchable": false,
    },
    {
        "name": "inquiry_id",
        "dataType": dataType.TEXT,
        "indexFilterable": false,
        "indexSearchable": true,
    },
    {
        "name": "inquiry_s",
        "dataType": dataType.TEXT,
        "indexFilterable": false,
        "indexSearchable": false,
    },
    {
        "name": "inquiry_t",
        "dataType": dataType.TEXT,
        "indexFilterable": false,
        "indexSearchable": false,
    },
    {
        "name": "inquiry_cl",
        "dataType": dataType.INT,
        "indexFilterable": false,
        "indexSearchable": false,
    },
    {
        "name": "inquiry_timestamp",
        "dataType": dataType.NUMBER,
        "indexFilterable": true,
        "indexSearchable": false,
    },
    {
        "name": "response_id",
        "dataType": dataType.TEXT,
        "indexFilterable": false,
        "indexSearchable": true,
    },
    {
        "name": "response_s",
        "dataType": dataType.TEXT,
        "indexFilterable": false,
        "indexSearchable": false,
    },
    {
        "name": "response_timestamp",
        "dataType": dataType.NUMBER,
        "indexFilterable": true,
        "indexSearchable": false,
    },
    {
        "name": "response_c",
        "dataType": dataType.TEXT,
        "indexFilterable": false,
        "indexSearchable": false,
    },
    {
        "name": "response_t",
        "dataType": dataType.TEXT,
        "indexFilterable": false,
        "indexSearchable": true,
    },
    {
        "name": "response_thash",
        "dataType": dataType.TEXT,
        "indexFilterable": false,
        "indexSearchable": true,
    },
    {
        "name": "inquiry_c",
        "dataType": dataType.TEXT,
        "indexFilterable": false,
        "indexSearchable": false,
    },
    {
        "name": "inquiry_from_encryptedName",
        "dataType": dataType.TEXT,
        "indexFilterable": false,
        "indexSearchable": false,
    },
    {
        "name": "inquiry_from_hashedSrc",
        "dataType": dataType.TEXT,
        "indexFilterable": true,
        "indexSearchable": true,
    },
];
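Since the failing filter targets `response_timestamp`, a quick pre-flight check on its definition can rule the schema in or out. This is a hypothetical helper over plain objects; the lowercase `dataType` strings (`'int'`, `'number'`, `'text'`) are an assumption for the sketch rather than values read from the client, and a non-filterable property is flagged only as a possible problem.

```typescript
// Plain-object stand-in for the property definitions above. The dataType
// strings ('int', 'number', 'text') are assumed here, not read from the client.
interface PropertyDef {
  name: string;
  dataType: string;
  indexFilterable: boolean;
  indexSearchable: boolean;
}

const NUMERIC_TYPES = new Set(['int', 'number']);

// Hypothetical pre-flight check: returns the reasons `name` may be unsuitable
// for greaterThan/lessThan filters. An empty array means no problem was found.
function numericFilterProblems(props: PropertyDef[], name: string): string[] {
  const def = props.find((p) => p.name === name);
  if (!def) return [`'${name}' is not defined in the schema`];
  const problems: string[] = [];
  if (!NUMERIC_TYPES.has(def.dataType)) {
    problems.push(`'${name}' has dataType '${def.dataType}', expected int or number`);
  }
  if (!def.indexFilterable) {
    problems.push(`'${name}' has indexFilterable: false`);
  }
  return problems;
}

// Excerpt of the schema above, rewritten with plain strings.
const schema: PropertyDef[] = [
  { name: 'tId', dataType: 'text', indexFilterable: false, indexSearchable: true },
  { name: 'inquiry_cl', dataType: 'int', indexFilterable: false, indexSearchable: false },
  { name: 'response_timestamp', dataType: 'number', indexFilterable: true, indexSearchable: false },
];

console.log(numericFilterProblems(schema, 'response_timestamp')); // []
console.log(numericFilterProblems(schema, 'inquiry_cl'));
```

For the schema as posted, `response_timestamp` passes both checks, so its definition alone does not explain the empty results.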

const classDef = {
    name: classType,
    vectorizers: [
        vectorizer.none({
            name: 'm'
        }),
        vectorizer.none({
            name: 'i'
        })
    ],
    multiTenancy: weaviate.configure.multiTenancy({
        enabled: true,
        autoTenantCreation: true,
        autoTenantActivation: true
    }),
    properties,
}

// client() is an application helper that returns a connected WeaviateClient
const cli = await client()
const result = await cli.collections.create(classDef)
tsmith023 commented 5 days ago

Thanks for the added context! I've turned your code into a test to check this, but I cannot reproduce the issue with Weaviate >1.25.1 (the versions in which autoTenantActivation and autoTenantCreation both take effect).

Which version of the Weaviate database are you using? Could you also share more detail on your getCollection method? That is where the tenant must be provided, so there could be an issue there! Cheers 😁
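The version condition above can be made explicit with a small comparison. This sketch takes the threshold literally from the comment (strictly newer than 1.25.1); the helper names are hypothetical, and the server version string would come from the database's `/v1/meta` endpoint, whose response includes a `version` field.

```typescript
// Parses "major.minor.patch", tolerating a leading "v" and ignoring any
// pre-release suffix such as "-rc.0".
function parseVersion(v: string): [number, number, number] {
  const [core] = v.trim().replace(/^v/, '').split('-');
  const parts = core.split('.').map((n) => Number.parseInt(n, 10));
  if (parts.length !== 3 || parts.some(Number.isNaN)) {
    throw new Error(`unrecognized version: ${v}`);
  }
  return [parts[0], parts[1], parts[2]];
}

// Compares [major, minor, patch] left to right; returns -1, 0 or 1.
function compareVersions(a: string, b: string): number {
  const pa = parseVersion(a);
  const pb = parseVersion(b);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] < pb[i] ? -1 : 1;
  }
  return 0;
}

// Assumption taken from the comment above: both auto-tenant flags only
// take effect on servers strictly newer than 1.25.1.
const AUTO_TENANT_AFTER = '1.25.1';

function autoTenantFlagsApply(serverVersion: string): boolean {
  return compareVersions(serverVersion, AUTO_TENANT_AFTER) > 0;
}

console.log(autoTenantFlagsApply('1.25.2')); // true
```

Comparing numerically per segment (rather than as strings) matters here: a plain string comparison would wrongly rank `1.9.9` above `1.10.0`.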

This is the test, which passes for me with >1.25.1:

import weaviate from 'weaviate-client';

describe('Testing of issue #153', () => {
  const { dataType, vectorizer } = weaviate.configure;
  const collectionName = 'TestingIssue153';

  const classDef = {
    name: collectionName,
    vectorizers: [
      vectorizer.none({
        name: 'm',
      }),
      vectorizer.none({
        name: 'i',
      }),
    ],
    multiTenancy: weaviate.configure.multiTenancy({
      enabled: true,
      autoTenantCreation: true,
      autoTenantActivation: true,
    }),
    properties: [
      {
        name: 'response_timestamp',
        dataType: dataType.NUMBER,
        indexFilterable: true,
        indexSearchable: false,
      },
    ],
  };
  beforeAll(() => weaviate.connectToLocal().then((client) => client.collections.delete(collectionName)));

  it('returns objects that match numeric filters', async () => {
    const client = await weaviate.connectToLocal();
    const collection = await client.collections.create(classDef);
    const tenant = collection.withTenant('tenant1');

    const { uuids } = await tenant.data.insertMany([
      {
        response_timestamp: 1630000000000,
      },
      {
        response_timestamp: 1630000000001,
      },
    ]);

    const { objects: objs1 } = await tenant.query.fetchObjects({
      filters: collection.filter.byProperty('response_timestamp').greaterThan(1630000000000),
    });
    expect(objs1.length).toEqual(1);
    expect(objs1[0].uuid).toEqual(uuids[1]);

    const { objects: objs2 } = await tenant.query.fetchObjects({
      filters: collection.filter.byProperty('response_timestamp').greaterThan(0),
    });
    expect(objs2.length).toEqual(2);
  });
});
tifroz commented 5 days ago

We are using Weaviate 1.25.2 from Weaviate.io, and our getCollection() is as follows:

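import type { Collection } from 'weaviate-client'

// client() is an application helper that returns a connected WeaviateClient
async function getCollection(name: string, tenantId?: string): Promise<Collection> {
    const collection = (await client()).collections.get(name)
    // withTenant() returns a collection handle scoped to the given tenant
    return tenantId ? collection.withTenant(tenantId) : collection
}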

Updating the query() function to use a string-matching filter instead does return results (the string-matching filter is below). We only get unexpected empty result sets with the numeric filter, which should rule out a tenant-related coding error.

const filter = collection.filter.byProperty('tId').equal('190223a74bad92f5')

We're happy to give you private access to our Weaviate instance, if that's an option.

tifroz commented 5 days ago

Apologies, this was our issue. We started with a flat data structure and later moved to a nested object structure in the schema once it became supported. We ended up in a confusing situation with duplicate attributes (in the nested structure and in the flattened schema).
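One way to spot this kind of duplication is to compare the schema's flat property names with the leaf paths of its nested object properties. The sketch assumes the flat names were built by joining nested path segments with underscores (e.g. `response.timestamp` and `response_timestamp`), which is only a guess based on the property names in the schema above. The `PropDef` type and sample schema are hypothetical, although the `object` data type and `nestedProperties` field mirror how Weaviate describes nested object properties.

```typescript
// Minimal stand-in for a schema property, including nested object properties.
interface PropDef {
  name: string;
  dataType: string;
  nestedProperties?: PropDef[];
}

// Collects the leaf paths inside nested object properties,
// e.g. ['inquiry', 'from', 'hashedSrc']. Top-level scalar properties are skipped.
function nestedLeafPaths(props: PropDef[], prefix: string[] = []): string[][] {
  const out: string[][] = [];
  for (const p of props) {
    const path = [...prefix, p.name];
    if (p.nestedProperties && p.nestedProperties.length > 0) {
      out.push(...nestedLeafPaths(p.nestedProperties, path));
    } else if (prefix.length > 0) {
      out.push(path);
    }
  }
  return out;
}

// Reports flat top-level properties whose name equals a nested path joined
// with "_" (the assumed naming convention), i.e. likely duplicates.
function findFlatDuplicates(props: PropDef[]): { flat: string; nested: string }[] {
  const flatNames = new Set(
    props.filter((p) => !p.nestedProperties?.length).map((p) => p.name),
  );
  return nestedLeafPaths(props)
    .map((path) => ({ flat: path.join('_'), nested: path.join('.') }))
    .filter((d) => flatNames.has(d.flat));
}

// Hypothetical mixed schema: flat fields left over next to nested objects.
const schema: PropDef[] = [
  { name: 'tId', dataType: 'text' },
  { name: 'response_timestamp', dataType: 'number' },
  { name: 'inquiry_from_hashedSrc', dataType: 'text' },
  {
    name: 'response', dataType: 'object',
    nestedProperties: [
      { name: 'timestamp', dataType: 'number' },
      { name: 't', dataType: 'text' },
    ],
  },
  {
    name: 'inquiry', dataType: 'object',
    nestedProperties: [
      { name: 'from', dataType: 'object', nestedProperties: [{ name: 'hashedSrc', dataType: 'text' }] },
    ],
  },
];

console.log(findFlatDuplicates(schema));
```

For this sample, the helper flags `response_timestamp` and `inquiry_from_hashedSrc`; `response.t` is not flagged because no flat `response_t` exists in the sample.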

Long story short, some of the duplicate attributes were never populated, and that was the cause of the issue.
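A quick way to confirm this kind of cause is to count how often each property is actually populated in a sample of stored objects. The helper below is hypothetical and works on plain property records; with the v3 client, the input could be the `properties` of the objects returned by `fetchObjects`, which this sketch assumes are plain key/value records. Both `undefined` and `null` count as "not populated".

```typescript
type Props = Record<string, unknown>;

// For each property name, counts how many objects carry a non-null value.
function populationCounts(objects: Props[], names: string[]): Map<string, number> {
  const counts = new Map<string, number>(names.map((n): [string, number] => [n, 0]));
  for (const obj of objects) {
    for (const name of names) {
      const v = obj[name];
      if (v !== undefined && v !== null) {
        counts.set(name, (counts.get(name) ?? 0) + 1);
      }
    }
  }
  return counts;
}

// Names that no object in the sample ever populated.
function neverPopulated(objects: Props[], names: string[]): string[] {
  return Array.from(populationCounts(objects, names))
    .filter(([, count]) => count === 0)
    .map(([name]) => name);
}

// Hypothetical sample: the nested timestamp is set, the flat copy is not.
const fetched: Props[] = [
  { tId: 'a', response: { timestamp: 1630000000000 } },
  { tId: 'b', response: { timestamp: 1630000000001 }, response_timestamp: null },
];

console.log(neverPopulated(fetched, ['tId', 'response_timestamp'])); // [ 'response_timestamp' ]
```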

Side question: we are considering going back to a flat structure because filtering and searching on nested attributes is not supported. The nested structure better reflects our data model, but we may want to search or filter on some of these attributes in the future. Is there any chance that support for filtering and searching nested attributes will be added in the near future?
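Until nested filtering is available, one possible workaround (not an official recommendation) is to keep the nested objects and derive flat, filterable copies from them at write time, failing loudly if a value is missing so the flat copy can never be left unpopulated. The record shape, path list and helper names below are hypothetical; the resulting plain object is the kind of property record passed to `data.insertMany(...)` in the maintainer's test above.

```typescript
// Hypothetical nested record; field names are illustrative only.
interface Exchange {
  tId: string;
  response: { timestamp: number; t: string };
  inquiry: { timestamp: number; from: { hashedSrc: string } };
}

type FlatCopies = Record<string, string | number>;

// Copies the value at each nested path into an underscore-joined top-level
// key, e.g. ['response', 'timestamp'] -> 'response_timestamp'.
function flattenPaths(obj: unknown, paths: string[][]): FlatCopies {
  const out: FlatCopies = {};
  for (const path of paths) {
    let cur: unknown = obj;
    for (const key of path) {
      cur = typeof cur === 'object' && cur !== null ? (cur as Record<string, unknown>)[key] : undefined;
    }
    // Refuse to write an object whose flat copy would be left empty.
    if (typeof cur !== 'string' && typeof cur !== 'number') {
      throw new Error(`no value at ${path.join('.')}`);
    }
    out[path.join('_')] = cur;
  }
  return out;
}

// Nested attributes that need to stay filterable (hypothetical choice).
const FILTERABLE_PATHS: string[][] = [
  ['response', 'timestamp'],
  ['inquiry', 'timestamp'],
  ['inquiry', 'from', 'hashedSrc'],
];

// Nested structure plus flat copies derived from it, so the two cannot drift apart.
function toInsertable(e: Exchange): Record<string, unknown> {
  return { ...e, ...flattenPaths(e, FILTERABLE_PATHS) };
}

const row = toInsertable({
  tId: '190223a74bad92f5',
  response: { timestamp: 1630000000001, t: 'reply' },
  inquiry: { timestamp: 1630000000000, from: { hashedSrc: 'abc123' } },
});
console.log(row.response_timestamp); // 1630000000001
```

Deriving the flat fields in one place, instead of setting them by hand alongside the nested ones, is what prevents the "duplicate attribute never populated" situation described above.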

tsmith023 commented 5 days ago

Glad to hear you found a solution! Filtering on nested properties is currently a highly upvoted feature request, as seen here, so it's very likely to be included in either the v1.27 or v1.28 release cycle.

I will close this issue as complete given you found a solution, cheers!