pachterlab / kb_python

A wrapper for the kallisto | bustools workflow for single-cell RNA-seq pre-processing
https://www.kallistobus.tools/
BSD 2-Clause "Simplified" License
141 stars 24 forks source link

Unable to read kb count output loom file to Velocyto.R #182

Closed stanaka6 closed 1 year ago

stanaka6 commented 1 year ago

Hi,

This is the similar issue to #112 and a SeuratWrapper issue, but there is no solution. My kb count output loom file seems cell names (col_attrs/CellID) are missing. I need to load this loom file for running Velocyto.R or SeuratWrapper. Could you please help me to solve this issue?

When I load kb count output loom file using ReadVelocity (SeuratWrapper) or read.loom.matrices (Velocyte.R), I get the following error messages:

> ldat <- read.loom.matrices("/My_KB_output/counts_unfiltered/adata.loom")
reading loom file via hdf5r...
Error in `[[.H5File`(f, "col_attrs/CellID") : 
  An object with name col_attrs/CellID does not exist in this group

> ldat <- ReadVelocity("/My_KB_output/counts_unfiltered/adata.loom")
reading loom file via hdf5r...
Error in `[[.H5File`(f, "col_attrs/CellID") : 
  An object with name col_attrs/CellID does not exist in this group

I generated loom file by using the kb count code below:

kb count --loom -i index.idx -g t2g.txt -x 10xv3 -c1 cdna_t2c.txt -c2 intron_t2c.txt --lamanno \
XXXXXXXXX_S1_L001_R1_001.fastq.gz XXXXXXXXX_S1_L001_R2_001.fastq.gz

I inspected my counts_unfiltered/adata.loom by loomR R package, and confirmed missing col_attrs/cell_names values.

> ldat = connect(filename = "/My_KB_output/counts_unfiltered/adata.loom", mode='r+', skip.validate = TRUE) 
Warning message:E) 
In initialize(...) :
  Skipping validation step, some fields are not populated

> ldat
Class: loom
Filename: /My_KB_output/counts_unfiltered/adata.loomAccess type: H5F_ACC_RDWR
Attributes: last_modified, version
Listing:
       name    obj_type   dataset.dims dataset.type_class
      attrs   H5I_GROUP           <NA>               <NA>
  col_attrs   H5I_GROUP           <NA>               <NA>
 col_graphs   H5I_GROUP           <NA>               <NA>
     layers   H5I_GROUP           <NA>               <NA>
     matrix H5I_DATASET 283212 x 55421          H5T_FLOAT
  row_attrs   H5I_GROUP           <NA>               <NA>
 row_graphs   H5I_GROUP           <NA>               <NA>

> ldat[["col_attrs/cell_names"]]
Error in `[[.H5File`(ldat, "col_attrs/cell_names") : 
  An object with name col_attrs/cell_names does not exist in this group

> ldat[["col_attrs"]]
Class: H5Group
Filename: /My_KB_output/counts_unfiltered/adata.loom
Group: /col_attrs
Attributes: last_modified
Listing:
      name    obj_type dataset.dims dataset.type_class
 obs_names H5I_DATASET       283212         H5T_STRING

> ldat[["col_attrs/obs_names"]]
Class: H5D
Dataset: /col_attrs/obs_names
Filename: /My_KB_output/counts_unfiltered/adata.loom
Access type: H5F_ACC_RDWR
Attributes: last_modified
Datatype: H5T_STRING {
      STRSIZE H5T_VARIABLE;
      STRPAD H5T_STR_NULLTERM;
      CSET H5T_CSET_UTF8;
      CTYPE H5T_C_S1;
   }
Space: Type=Simple     Dims=283212     Maxdims=283212
Not chunked

> ldat$row.attrs$gene_names
NULL

> ldat[["matrix"]]
Class: H5D
Dataset: /matrix
Filename: /My_KB_output/counts_unfiltered/adata.loom
Access type: H5F_ACC_RDWR
Datatype: H5T_IEEE_F32LE
Space: Type=Simple     Dims=283212 x 55421     Maxdims=Inf x 55421
Chunk: 64 x 64

kb info kb_python 0.27.3 kallisto: 0.48.0 bustools: 0.41.0

Should I modify kb count code to get correct loom output? Any suggestions would be appreciated.

Thank you very much!

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days

Yenaled commented 1 year ago

I'm not sure why this is. For now, I guess the best solution is to manually read in the spliced+unspliced matrices. I'll keep this on my radar for trying to fix this loom file issue. Tagging @Lioscro (who first developed and is now helping me maintain this package).

Yenaled commented 1 year ago

OK, I understand: utils.py (import_matrix_as_anndata) assigns the names to be f'{name}_name}' (aka gene_name) and barcode whereas software like Velocyto are expecting the names to be CellID and Gene.

You can modify the kb code (utils.py) to assign those names, or you can modify the loom data to rename those names.

In a future release, we'll make a --loom-names option available so you can manually input whatever names required by downstream tools.

BenjaminDEMAILLE commented 1 year ago

I can't also open the H5ad. a solutionnassions for R is : https://bustools.github.io/BUS_notebooks_R/velocity.html#qc