pachterlab / gget

🧬 gget enables efficient querying of genomic reference databases
https://gget.bio
BSD 2-Clause "Simplified" License

gget.cellxgene TileDBError error when trying to return anndata #110

Closed rpeys closed 12 months ago

rpeys commented 12 months ago

What happened?

When running gget.cellxgene(species="homo_sapiens", meta_only=False, census_version="2023-05-15", column_names=COLUMN_NAMES, assay=PROTOCOLS, is_primary_data=True), I get the following error:

RuntimeError: TileDBError: `in` operator syntax must be written as `attr in ['l', 'i', 's', 't']`

If I instead specify meta_only=True, it runs fine.

Here is the full output/error message:

Tue Nov  7 11:12:20 2023 INFO Fetching AnnData object from CZ CELLxGENE Discover. This might take a few minutes...
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[18], line 1
----> 1 gget.cellxgene(
      2     species="homo_sapiens", meta_only=False, census_version="2023-05-15", column_names=COLUMN_NAMES,
      3     assay=PROTOCOLS, is_primary_data=True
      4 )

File /opt/conda/rpeyser/envs/scset/lib/python3.8/site-packages/gget/gget_cellxgene.py:188, in cellxgene(species, gene, ensembl, column_names, meta_only, tissue, cell_type, development_stage, disease, sex, is_primary_data, dataset_id, tissue_general_ontology_term_id, tissue_general, assay_ontology_term_id, assay, cell_type_ontology_term_id, development_stage_ontology_term_id, disease_ontology_term_id, donor_id, self_reported_ethnicity_ontology_term_id, self_reported_ethnicity, sex_ontology_term_id, suspension_type, tissue_ontology_term_id, census_version, verbose, out)
    184     logging.info(
    185         "Fetching AnnData object from CZ CELLxGENE Discover. This might take a few minutes..."
    186     )
    187 with cellxgene_census.open_soma(census_version=census_version) as census:
--> 188     adata = cellxgene_census.get_anndata(
    189         census=census,
    190         organism=species,
    191         var_value_filter=f"{'feature_id' if ensembl else 'feature_name'} in {gene}",
    192         obs_value_filter=obs_value_filter,
    193         column_names={"obs": column_names},
    194     )
    196     if out:
    197         adata.write(out)

File /opt/conda/rpeyser/envs/scset/lib/python3.8/site-packages/cellxgene_census/_get_anndata.py:81, in get_anndata(census, organism, measurement_name, X_name, X_layers, obs_value_filter, obs_coords, var_value_filter, var_coords, column_names)
     75 var_coords = (slice(None),) if var_coords is None else (var_coords,)
     76 with exp.axis_query(
     77     measurement_name,
     78     obs_query=soma.AxisQuery(value_filter=obs_value_filter, coords=obs_coords),
     79     var_query=soma.AxisQuery(value_filter=var_value_filter, coords=var_coords),
     80 ) as query:
---> 81     return query.to_anndata(X_name=X_name, column_names=column_names, X_layers=X_layers)

File /opt/conda/rpeyser/envs/scset/lib/python3.8/site-packages/somacore/query/query.py:263, in ExperimentAxisQuery.to_anndata(self, X_name, column_names, X_layers)
    244 def to_anndata(
    245     self,
    246     X_name: str,
   (...)
    249     X_layers: Sequence[str] = (),
    250 ) -> anndata.AnnData:
    251     """
    252     Executes the query and return result as an ``AnnData`` in-memory object.
    253 
   (...)
    261     Lifecycle: maturing
    262     """
--> 263     return self._read(
    264         X_name,
    265         column_names=column_names or AxisColumnNames(obs=None, var=None),
    266         X_layers=X_layers,
    267     ).to_anndata()

File /opt/conda/rpeyser/envs/scset/lib/python3.8/site-packages/somacore/query/query.py:335, in ExperimentAxisQuery._read(self, X_name, column_names, X_layers)
    332         raise NotImplementedError("Dense array unsupported")
    333     all_x_arrays[_xname] = x_array
--> 335 obs_table, var_table = self._read_both_axes(column_names)
    337 x_matrices = {
    338     _xname: _fast_csr.read_scipy_csr(
    339         all_x_arrays[_xname], self.obs_joinids(), self.var_joinids()
    340     )
    341     for _xname in all_x_arrays
    342 }
    344 x = x_matrices.pop(X_name)

File /opt/conda/rpeyser/envs/scset/lib/python3.8/site-packages/somacore/query/query.py:362, in ExperimentAxisQuery._read_both_axes(self, column_names)
    352 obs_ft = self._threadpool.submit(
    353     self._read_axis_dataframe,
    354     _Axis.OBS,
    355     column_names,
    356 )
    357 var_ft = self._threadpool.submit(
    358     self._read_axis_dataframe,
    359     _Axis.VAR,
    360     column_names,
    361 )
--> 362 return obs_ft.result(), var_ft.result()

File /opt/conda/rpeyser/envs/scset/lib/python3.8/concurrent/futures/_base.py:437, in Future.result(self, timeout)
    435     raise CancelledError()
    436 elif self._state == FINISHED:
--> 437     return self.__get_result()
    439 self._condition.wait(timeout)
    441 if self._state in [CANCELLED, CANCELLED_AND_NOTIFIED]:

File /opt/conda/rpeyser/envs/scset/lib/python3.8/concurrent/futures/_base.py:389, in Future.__get_result(self)
    387 if self._exception:
    388     try:
--> 389         raise self._exception
    390     finally:
    391         # Break a reference cycle with the exception in self._exception
    392         self = None

File /opt/conda/rpeyser/envs/scset/lib/python3.8/concurrent/futures/thread.py:57, in _WorkItem.run(self)
     54     return
     56 try:
---> 57     result = self.fn(*self.args, **self.kwargs)
     58 except BaseException as exc:
     59     self.future.set_exception(exc)

File /opt/conda/rpeyser/envs/scset/lib/python3.8/site-packages/somacore/query/query.py:396, in ExperimentAxisQuery._read_axis_dataframe(self, axis, axis_column_names)
    393     added_soma_joinid_to_columns = True
    395 # Do the actual query.
--> 396 arrow_table = axis_df.read(
    397     axis_query.coords,
    398     value_filter=axis_query.value_filter,
    399     column_names=query_columns,
    400 ).concat()
    402 # Update the cache if needed. We can do this because no matter what
    403 # other columns are queried for, the contents of the ``soma_joinid``
    404 # column will be the same and can be safely stored.
    405 if not joinids_cached:

File /opt/conda/rpeyser/envs/scset/lib/python3.8/site-packages/tiledbsoma/_dataframe.py:341, in DataFrame.read(***failed resolving arguments***)
    338 if value_filter is not None:
    339     query_condition = QueryCondition(value_filter)
--> 341 sr = self._soma_reader(
    342     schema=schema,  # query_condition needs this
    343     column_names=column_names,
    344     query_condition=query_condition,
    345     result_order=result_order,
    346 )
    348 self._set_reader_coords(sr, coords)
    350 # TODO: platform_config
    351 # TODO: batch_size

File /opt/conda/rpeyser/envs/scset/lib/python3.8/site-packages/tiledbsoma/_tiledb_array.py:118, in TileDBArray._soma_reader(self, schema, column_names, query_condition, result_order)
    116     result_order_enum = result_order_map[ResultOrder(result_order).value]
    117     kwargs["result_order"] = result_order_enum
--> 118 return clib.SOMAArray(
    119     self.uri,
    120     name=f"{self} reader",
    121     platform_config=self._ctx.config().dict(),
    122     timestamp=(0, self.tiledb_timestamp_ms),
    123     **kwargs,
    124 )

RuntimeError: TileDBError: `in` operator syntax must be written as `attr in ['l', 'i', 's', 't']`

At:
  /opt/conda/rpeyser/envs/scset/lib/python3.8/site-packages/tiledbsoma/_query_condition.py(213): visit_Compare
  /opt/conda/rpeyser/envs/scset/lib/python3.8/ast.py(371): visit
  /opt/conda/rpeyser/envs/scset/lib/python3.8/site-packages/tiledbsoma/_query_condition.py(133): init_query_condition
  /opt/conda/rpeyser/envs/scset/lib/python3.8/site-packages/tiledbsoma/_tiledb_array.py(118): _soma_reader
  /opt/conda/rpeyser/envs/scset/lib/python3.8/site-packages/tiledbsoma/_dataframe.py(341): read
  /opt/conda/rpeyser/envs/scset/lib/python3.8/site-packages/somacore/query/query.py(396): _read_axis_dataframe
  /opt/conda/rpeyser/envs/scset/lib/python3.8/concurrent/futures/thread.py(57): run
  /opt/conda/rpeyser/envs/scset/lib/python3.8/concurrent/futures/thread.py(80): _worker
  /opt/conda/rpeyser/envs/scset/lib/python3.8/threading.py(870): run
  /opt/conda/rpeyser/envs/scset/lib/python3.8/threading.py(932): _bootstrap_inner
  /opt/conda/rpeyser/envs/scset/lib/python3.8/threading.py(890): _bootstrap

gget version

0.27.9

Operating System (OS)

Linux

User interface

Python

Are you using a computer with an Apple M1 chip?

Not M1

What is the exact command that was run?

gget.cellxgene(
    species="homo_sapiens", meta_only=False, census_version="2023-05-15", column_names=COLUMN_NAMES,
    assay=PROTOCOLS, is_primary_data=True
)

Which output/error did you get?

RuntimeError: TileDBError: `in` operator syntax must be written as `attr in ['l', 'i', 's', 't']`

lauraluebbert commented 12 months ago

Hi Rebecca, thank you for reaching out. Could you please provide your COLUMN_NAMES and PROTOCOLS variables? I would like to reproduce this error so I can hopefully figure out what's going on.

rpeys commented 12 months ago

Thanks for looking into it! Let me know if you can't reproduce the error, and I can send more details about my environment setup.

PROTOCOLS = [ "10x 5' v2", "10x 3' v3", "10x 3' v2", "10x 5' v1", "10x 3' v1", "10x 3' transcription profiling", "10x 5' transcription profiling" ]

COLUMN_NAMES = [ "soma_joinid", "is_primary_data", "dataset_id", "donor_id", "assay", "cell_type", "development_stage", "sex", "disease", "tissue", "tissue_general" ]

lauraluebbert commented 12 months ago

Hi, I just pushed a possible fix to the main branch. If you install gget from source you should be able to test it out. Please let me know if you try it! If this works, it will be part of the next release (v0.28.0).

To install gget from source run the following commands from the terminal:

  1. Clone the gget repo: git clone https://github.com/pachterlab/gget.git
  2. Navigate into the gget folder and use pip to install the local version: cd gget && pip install .

rpeys commented 12 months ago

Yes, that fixed it! Thanks for the quick reply.
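
Note for readers who hit the same error: the thread does not say what the fix changed, but the traceback offers one plausible reading. At line 191 of gget_cellxgene.py, the `gene` argument is formatted directly into `var_value_filter`, and the failing call does not pass `gene`. Assuming `gene` defaults to `None` (an assumption, not confirmed in this thread), the filter would render as `feature_name in None`, which is not the `attr in [...]` form the error message asks for. The sketch below reproduces only the string formatting; `render_var_filter` and `safe_var_filter` are hypothetical helper names for illustration, not part of gget.

```python
from typing import Optional, Sequence


def render_var_filter(gene, ensembl: bool = False) -> str:
    """Mirror the f-string shown at line 191 of the traceback."""
    return f"{'feature_id' if ensembl else 'feature_name'} in {gene}"


def safe_var_filter(
    gene: Optional[Sequence[str]], ensembl: bool = False
) -> Optional[str]:
    """Hypothetical guard: return None (no filter) when no genes are given."""
    if not gene:
        return None
    column = "feature_id" if ensembl else "feature_name"
    return f"{column} in {list(gene)}"


# With no genes, the unguarded f-string yields an invalid `in` expression.
print(render_var_filter(None))               # feature_name in None
# With a list of genes, the expression has the expected `attr in [...]` form.
print(render_var_filter(["ACE2", "ABCA1"]))  # feature_name in ['ACE2', 'ABCA1']
# The guarded version skips the filter entirely when no genes are given.
print(safe_var_filter(None))                 # None
```

Returning `None` is a reasonable choice here because, per the traceback, tiledbsoma only builds a QueryCondition when `value_filter is not None`. Until you can upgrade to a release that contains the fix, passing an explicit list of genes may also avoid the error, although this was not tested in the thread.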