pachterlab / kb_python

A wrapper for the kallisto | bustools workflow for single-cell RNA-seq pre-processing
https://www.kallistobus.tools/
BSD 2-Clause "Simplified" License

Issues with running RNA velocity (La Manno) analysis in kb_python 0.28.0 #228

Closed NikTuzov closed 6 months ago

NikTuzov commented 6 months ago

Hello:

Could you please help me? Initially I used kb ref and kb count in kb-python 0.27.3, as described here:

https://www.kallistobus.tools/tutorials/kb_velocity/python/kb_velocity/

and everything worked as expected. Then I switched to kb-python 0.28.0, and kb ref began to fail:

kb ref -d linnarsson -i index.idx -g t2g.txt -c1 spliced_t2c.txt -c2 unspliced_t2c.txt

File "/usr/lib/python3.10/urllib/request.py", line 496, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.10/urllib/request.py", line 643, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

I found that in 0.28.0 there were some changes regarding the indexes:

https://github.com/pachterlab/kallisto-transcriptome-indices

So I downloaded the four files for the human index as follows:

kb ref --workflow=nac -d human -i index.idx -g t2g.txt -c1 cdna.txt -c2 nascent.txt

and then tried to run the La Manno workflow, but it failed:

kb count --h5ad -i index.idx -g t2g.txt -x 10xv2 -o SRR6470906 \
-c1 cdna.txt -c2 nascent.txt --workflow lamanno -t 8 \
SRR6470906_S1_L001_R1_001.fastq.gz \
SRR6470906_S1_L001_R2_001.fastq.gz \
SRR6470906_S1_L002_R1_001.fastq.gz \
SRR6470906_S1_L002_R2_001.fastq.gz 

kb: error: --sum incompatible with lamanno/nucleus

Does it mean that RNA velocity analysis is essentially disabled in kb-python 0.28.0? If not, how do I run it?

Regards, Nik

Yenaled commented 6 months ago

--workflow lamanno is deprecated in kb-python 0.28.0. Its replacement is the nac index type (--workflow nac).

See the recent documentation preprint: https://www.biorxiv.org/content/10.1101/2023.11.21.568164v1

Yenaled commented 6 months ago

See also https://github.com/pachterlab/kallisto/issues/419
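For readers with existing scripts, the change described above can be sketched as a small rewrite of the command line. This is only an illustration: the helper name to_nac_workflow is hypothetical, and it assumes kb count accepts the same nac value that kb ref takes earlier in this thread. It does not address the --sum error, for which the linked issue has the maintainer's own example.

```python
import shlex


def to_nac_workflow(command: str) -> list[str]:
    """Return the argv of `command` with a deprecated lamanno workflow
    switched to nac. Hypothetical helper, not part of kb-python."""
    argv = shlex.split(command)
    out = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg == "--workflow" and i + 1 < len(argv) and argv[i + 1] == "lamanno":
            # Two-token form: --workflow lamanno
            out += ["--workflow", "nac"]
            i += 2
        elif arg == "--workflow=lamanno":
            # Single-token form: --workflow=lamanno
            out.append("--workflow=nac")
            i += 1
        else:
            out.append(arg)
            i += 1
    return out


# The failing command from the original post, shortened to one lane.
old = (
    "kb count --h5ad -i index.idx -g t2g.txt -x 10xv2 -o SRR6470906 "
    "-c1 cdna.txt -c2 nascent.txt --workflow lamanno -t 8 "
    "SRR6470906_S1_L001_R1_001.fastq.gz SRR6470906_S1_L001_R2_001.fastq.gz"
)
new_argv = to_nac_workflow(old)
print(shlex.join(new_argv))
```

All other arguments are passed through unchanged, so the index, t2g, and -c1/-c2 files built with kb ref --workflow=nac stay in place.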

NikTuzov commented 6 months ago

Hello Yenaled:

Thanks for replying. That other ticket did provide an example of the new workflow, but I have an extra question. I use the h5ad files generated by kb count as input to the scvelo workflow in Python, based on this tutorial:

https://scvelo.readthedocs.io/en/latest/VelocityBasics.html#

In the previous kb version, after importing the h5ad file into Python as an AnnData object, that object had two layers: "spliced" and "unspliced". With the new version, I expect it to have three layers: nascent, mature, and ambiguous. Should I assume that spliced is the same as mature and unspliced is the same as nascent? Or should "spliced" be set to ambiguous + mature and "unspliced" to nascent? Please advise.

Regards, Nik

Yenaled commented 6 months ago

Yes, the 3 layers (nascent, mature, and ambiguous) correspond to (unspliced, spliced, and ambiguous), respectively. Ideally I should have updated the nomenclature in the anndata. In any case, if you're ever unsure, you can manually inspect/use the .mtx files directly (which is what I recommend creating h5ad and loom files from).

Yenaled commented 6 months ago

@NikTuzov I updated the nomenclature to nascent, mature, and ambiguous for the layers in kb-python version 0.28.1, which I just released.
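For anyone feeding the newer h5ad output into a tool that still looks for "spliced" and "unspliced" layers, the correspondence given above (mature to spliced, nascent to unspliced) can be sketched as follows. This is a minimal sketch, not kb-python or scvelo code: the helper add_scvelo_layer_aliases is hypothetical, and it assumes only that the object exposes a dict-like .layers, as an AnnData object does. A plain stand-in object is used so the snippet runs without anndata installed. The ambiguous layer is left untouched, since the thread does not say it should be added to either count.

```python
from types import SimpleNamespace

# Mapping stated in the thread: spliced <- mature, unspliced <- nascent.
LAYER_ALIASES = {"spliced": "mature", "unspliced": "nascent"}


def add_scvelo_layer_aliases(adata):
    """Add 'spliced'/'unspliced' layers as copies of 'mature'/'nascent'.

    Hypothetical helper. `adata` is anything with a dict-like `.layers`,
    e.g. an AnnData object loaded from the h5ad file.
    """
    missing = [src for src in LAYER_ALIASES.values() if src not in adata.layers]
    if missing:
        raise KeyError(f"expected layers not found: {missing}")
    for alias, src in LAYER_ALIASES.items():
        if alias not in adata.layers:
            # Copy so later edits to one layer do not affect the other.
            adata.layers[alias] = adata.layers[src].copy()
    return adata


# Stand-in for an AnnData object; with real data this would be the
# object read from the h5ad file written by kb count.
demo = SimpleNamespace(
    layers={
        "mature": [[1, 0], [2, 3]],
        "nascent": [[0, 1], [1, 0]],
        "ambiguous": [[0, 0], [1, 1]],
    }
)
add_scvelo_layer_aliases(demo)
print(sorted(demo.layers))
```

Files written by the older version already carry "spliced" and "unspliced" and lack "mature"/"nascent", so the helper raises a KeyError on them rather than guessing.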