laminlabs / lamindb

A data framework for biology.
https://docs.lamin.ai
Apache License 2.0

🚸 Passing anything other than a Q expression as an arg to filter should error, not return a random queryset #2206

Open falexwolf opened 2 days ago

falexwolf commented 2 days ago

Why does this happen?

>>> ln.Transform.filter("1y7UO5uJgJCx0000")
<QuerySet [Transform(uid='13VINnFk89PE0000', is_latest=False, name='McFarland 2020 dataset preprocessing for perturbation use case', key='mcfarland_2020_preparation.ipynb', type='notebook', hash='6hD6CcxCFSURI9Ce312jWA', created_by_id=6, created_at=2024-09-25 11:34:06 UTC), Transform(uid='ManDYgmftZ8C0003', is_latest=True, name='Standardize and append a dataset', key='scrna2.ipynb', type='notebook', hash='1ap7gyyu30aYf8BmF9V7yw', created_by_id=9, created_at=2024-10-18 13:09:50 UTC), Transform(uid='PPfgq7yIVWzX0000', is_latest=False, name='Artifact.R', key='Artifact.R', type='script', created_by_id=30, created_at=2024-11-18 14:18:21 UTC), Transform(uid='jgTrkoeuxAfs0000', is_latest=False, name='Passing large JSONs as run parameters', key='2024-10-analyze-json-params.ipynb', type='notebook', hash='A5rr8rLQu0eDYgzjuyQueQ', created_by_id=9, created_at=2024-10-02 15:28:22 UTC), Transform(uid='jgTrkoeuxAfs0001', is_latest=False, name='Understanding dictionary-like run paramaters', key='2024-10-analyze-json-params.ipynb', type='notebook', hash='pWmZNHGtVhMA0-QpMiPgCA', created_by_id=9, created_at=2024-10-02 22:28:01 UTC), Transform(uid='cpMwOcY2YJ5G0000', version='2.7.1', is_latest=True, name='scrna-seq', type='pipeline', reference='https://github.com/nf-core/scrnaseq', created_by_id=6, created_at=2024-10-03 07:31:25 UTC), Transform(uid='13VINnFk89PE0004', is_latest=False, name='McFarland 2020 dataset preprocessing for perturbation use case', key='mcfarland_2020_preparation.ipynb', type='notebook', hash='3isej0kIz3jVLWV9InEJ3w', created_by_id=6, created_at=2024-09-25 19:46:22 UTC), Transform(uid='13VINnFk89PE0006', is_latest=True, name='McFarland 2020 dataset preprocessing for perturbation use case', key='mcfarland_2020_preparation.ipynb', type='notebook', hash='KZbkr_rMgvYuejoqf2gU5A', created_by_id=6, created_at=2024-10-03 14:52:20 UTC), Transform(uid='4p2CNy60f3CR0001', is_latest=False, name='cellxgene_basic.Rmd', key='cellxgene_basic.Rmd', type='script', 
hash='UN-meCOpWuYO7nZHgJGjRg', created_by_id=28, created_at=2024-11-20 10:44:54 UTC), Transform(uid='I8BlHXFXqZOG0002', is_latest=True, name='example_workflow.Rmd', key='example_workflow.Rmd', type='script', hash='lmvDwY2SbMnjRUhNAXQG7g', created_by_id=28, created_at=2024-11-20 13:11:12 UTC), Transform(uid='Nv48yAceNSh85zKv', version='1', is_latest=False, name='scRNA-seq', key='scrna.ipynb', type='notebook', _source_code_artifact_id=450, created_by_id=9, created_at=2024-01-03 00:20:15 UTC), Transform(uid='PtTXoc0RbOIqFn', version='1', is_latest=False, name='Hit identification - genome-wide CRIPSRa IFNG screen of T cells', key='2023-08-25-analyze-assay', type='notebook', created_by_id=9, created_at=2023-08-25 20:15:41 UTC), Transform(uid='ManDYgmftZ8Cz8', version='0', is_latest=False, name='Append a new batch of data', key='scrna2.ipynb', type='notebook', reference='https://lamin.ai/docs/scrna2', reference_type='lamin-usecases', _source_code_artifact_id=420, created_by_id=2, created_at=2023-10-04 13:01:45 UTC), Transform(uid='LTpoJdAFjxnT0000', is_latest=True, name='The number of genes measured for each artifact', key='n-genes-per-artifact.ipynb', type='notebook', hash='_Tw4B8zEebj_l2dtczGGdg', created_by_id=9, created_at=2024-11-22 15:37:32 UTC), Transform(uid='4p2CNy60f3CR0002', is_latest=False, name='cellxgene_basic.Rmd', key='cellxgene_basic.Rmd', type='script', hash='-7IA4g43zIQ3PeiyGrxvsA', created_by_id=28, created_at=2024-11-20 10:48:48 UTC), Transform(uid='kzsEMao0vnuo0000', is_latest=False, name='demo_report.Rmd', key='demo_report.Rmd', type='script', hash='tKVz6RXNJOFuLejDho8lNg', created_by_id=28, created_at=2024-11-20 13:34:13 UTC), Transform(uid='3qIZOmDUVBwc0000', is_latest=True, name='Demo', key='demo.ipynb', type='notebook', hash='KeC8iNnQjhe2p9IirA_dHQ', created_by_id=9, created_at=2024-09-24 13:08:57 UTC), Transform(uid='emQth4BTxSiE0000', is_latest=False, name='Benchmark `connect_instance_hub`', key='check_edge.ipynb', type='notebook', 
hash='HuXSgpORTEX_RoEGQhpZ7A', created_by_id=9, created_at=2024-09-30 13:10:41 UTC), Transform(uid='qJnKZxV8e0l20000', is_latest=True, name='Clean up notebook labels', key='cleaning.ipynb', type='notebook', hash='UDUvNqlcZOvTaOL8GSXvyA', created_by_id=9, created_at=2024-10-02 21:20:38 UTC), Transform(uid='jgTrkoeuxAfs0002', is_latest=False, name='Passing large JSONs as run parameters', key='2024-10-analyze-json-params.ipynb', type='notebook', hash='F_LOQuQvX_6m-Z7binh9Zw', created_by_id=9, created_at=2024-10-02 23:36:41 UTC), '...(remaining elements truncated)...']>

This doesn't happen at the Django level, where the same call raises an error:

>>> ln.Transform.objects.filter("1y7UO5uJgJCx0000")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/falexwolf/miniconda3/envs/py310/lib/python3.10/site-packages/django/db/models/manager.py", line 87, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/Users/falexwolf/miniconda3/envs/py310/lib/python3.10/site-packages/django/db/models/query.py", line 1476, in filter
    return self._filter_or_exclude(False, args, kwargs)
  File "/Users/falexwolf/miniconda3/envs/py310/lib/python3.10/site-packages/django/db/models/query.py", line 1494, in _filter_or_exclude
    clone._filter_or_exclude_inplace(negate, args, kwargs)
  File "/Users/falexwolf/miniconda3/envs/py310/lib/python3.10/site-packages/django/db/models/query.py", line 1501, in _filter_or_exclude_inplace
    self._query.add_q(Q(*args, **kwargs))
  File "/Users/falexwolf/miniconda3/envs/py310/lib/python3.10/site-packages/django/db/models/sql/query.py", line 1609, in add_q
    clause, _ = self._add_q(q_object, self.used_aliases)
  File "/Users/falexwolf/miniconda3/envs/py310/lib/python3.10/site-packages/django/db/models/sql/query.py", line 1641, in _add_q
    child_clause, needed_inner = self.build_filter(
  File "/Users/falexwolf/miniconda3/envs/py310/lib/python3.10/site-packages/django/db/models/sql/query.py", line 1488, in build_filter
    arg, value = filter_expr
ValueError: too many values to unpack (expected 2)

Koncopd commented 2 days ago

The expressions have length zero here: https://github.com/laminlabs/lamindb/blob/ade8f08367a082969b9b1b35d871039bf5285240/lamindb/_query_set.py#L325, so filter returns self. But self here is the query set constructed at https://github.com/laminlabs/lamindb/blob/ade8f08367a082969b9b1b35d871039bf5285240/lamindb/_record.py#L246, namely QuerySet(model=cls, using=_using_key). I think it is equivalent to Record.objects.filter().
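
To illustrate the mechanism described above, here is a minimal sketch. ToyQuerySet and its rows are hypothetical stand-ins, not lamindb's actual classes; the only assumption carried over from the comment is that filter inspects the keyword expressions alone and returns self when there are none.

```python
class ToyQuerySet:
    """Hypothetical stand-in for a query set; not lamindb's real class."""

    def __init__(self, rows):
        self.rows = list(rows)

    def filter(self, *queries, **expressions):
        # Assumed behavior: only the keyword expressions are checked.
        if len(expressions) == 0:
            # No keyword expressions, so the unfiltered query set is returned
            # and whatever was passed positionally is silently ignored.
            return self
        matching = [
            row
            for row in self.rows
            if all(row.get(key) == value for key, value in expressions.items())
        ]
        return ToyQuerySet(matching)


qs = ToyQuerySet([{"uid": "a"}, {"uid": "b"}])
unfiltered = qs.filter("1y7UO5uJgJCx0000")  # the string is dropped
filtered = qs.filter(uid="a")  # keyword expressions are applied
```

In this toy version the positional string never reaches any validation, which would explain why the call returns every record instead of failing.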

falexwolf commented 2 days ago

Thanks for clarifying! The call should raise an error that asks the user to pass an expression via Q or kwargs.
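
One possible shape for such a check, as a sketch only: the Q class below is a hypothetical stand-in for django.db.models.Q so that the example runs without Django, and validate_filter_args is an invented helper name, not an existing lamindb function. The actual fix may look different.

```python
class Q:
    """Hypothetical stand-in for django.db.models.Q (keeps the sketch self-contained)."""

    def __init__(self, **conditions):
        self.conditions = conditions


def validate_filter_args(*queries, **expressions):
    """Reject positional arguments that are not Q expressions.

    Hypothetical helper: a filter method could call this before doing
    anything else with its arguments.
    """
    for query in queries:
        if not isinstance(query, Q):
            raise TypeError(
                f"filter() got a positional argument of type "
                f"{type(query).__name__}; pass an expression via Q or kwargs, "
                f"e.g. filter(uid='...')"
            )
    return queries, expressions


# A Q expression and keyword expressions pass through unchanged.
accepted_queries, accepted_expressions = validate_filter_args(Q(uid="abc"), name="x")

# A bare string is rejected with an actionable message.
try:
    validate_filter_args("1y7UO5uJgJCx0000")
    error_message = None
except TypeError as error:
    error_message = str(error)
```

Raising early keeps the failure close to the caller's mistake, instead of relying on a lower layer to fail with a less readable message such as the unpacking ValueError in the Django traceback.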

Koncopd commented 2 days ago

Should I fix this?

falexwolf commented 2 days ago

You're very welcome to fix it, of course! :D