pachterlab / kb_python

A wrapper for the kallisto | bustools workflow for single-cell RNA-seq pre-processing
https://www.kallistobus.tools/
BSD 2-Clause "Simplified" License
147 stars, 23 forks

Kallisto did not generate layers using rna velocity tutorial #145

Closed. 230101-valentina closed this issue 2 years ago.

230101-valentina commented 3 years ago

Dear all, I am running kallisto through kb-python to get spliced and unspliced counts for RNA velocity. I followed the tutorial "Constructing a velocity index with kb" (https://www.kallistobus.tools/tutorials/kb_velocity_index/python/kb_velocity_index.html) for the human genome, as explained there:

!wget ftp://ftp.ensembl.org/pub/release-104/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
!wget ftp://ftp.ensembl.org/pub/release-104/gtf/homo_sapiens/Homo_sapiens.GRCh38.104.gtf.gz

!pip install kb-python==0.25.0

!kb ref -i index.idx -g t2g.txt -f1 cdna.fa -f2 intron.fa -c1 cdna_t2c.txt -c2 intron_t2c.txt --workflow lamanno -n 8 \
Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz \
Homo_sapiens.GRCh38.104.gtf.gz

It works properly, and then I use !kb count to build the H5AD count matrix as follows:

!kb count --h5ad -i index.idx_cdna -g t2g.txt -x 10xv2 -o IPS_RNAvelocity_matrix \
-c1 cdna_t2c.txt -c2 intron_t2c.txt --filter bustools -t 2 \
2_IPS_S2_L001_R1_001.fastq.gz \
2_IPS_S2_L001_R2_001.fastq.gz \
2_IPS_S2_L002_R1_001.fastq.gz \
2_IPS_S2_L002_R2_001.fastq.gz

It runs, but when I open the resulting AnnData object it has no layers:

adata
AnnData object with n_obs × n_vars = 54999 × 60664
    var: 'gene_name'

Can you help me with this issue? Thanks.
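A quick way to confirm the symptom described above is to look at the layer names of the loaded object. The sketch below is a hypothetical helper, not part of kb-python. It assumes the velocity output stores its matrices in layers named "spliced" and "unspliced" (check this against your own kb-python version), and it uses a SimpleNamespace stand-in so it runs without anndata installed. With real output you would load the H5AD with anndata.read_h5ad and pass the result in.

```python
from types import SimpleNamespace

# Assumed layer names for `kb count --workflow lamanno --h5ad` output;
# verify against the kb-python version you are running.
VELOCITY_LAYERS = ("spliced", "unspliced")


def missing_velocity_layers(adata):
    """Return the expected velocity layers that are absent from `adata`.

    Only the dict-like `.layers` attribute is inspected, so a real
    anndata.AnnData works, and so does any stand-in exposing `layers`.
    """
    present = set(adata.layers.keys())
    return [name for name in VELOCITY_LAYERS if name not in present]


if __name__ == "__main__":
    # Stand-ins for the two outcomes discussed in this issue.
    without_workflow = SimpleNamespace(layers={})
    with_lamanno = SimpleNamespace(layers={"spliced": None, "unspliced": None})
    print(missing_velocity_layers(without_workflow))  # ['spliced', 'unspliced']
    print(missing_velocity_layers(with_lamanno))      # []
```

An empty list means both layers are present; anything else points back at how kb count was invoked.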

Lioscro commented 3 years ago

Hi, @230101-valentina, I think there is a missing --workflow lamanno in your count command. Could you try running this instead?

!kb count --h5ad -i index.idx_cdna -g t2g.txt -x 10xv2 -o IPS_RNAvelocity_matrix \
-c1 cdna_t2c.txt -c2 intron_t2c.txt --filter bustools -t 2 --workflow lamanno \
2_IPS_S2_L001_R1_001.fastq.gz \
2_IPS_S2_L001_R2_001.fastq.gz \
2_IPS_S2_L002_R1_001.fastq.gz \
2_IPS_S2_L002_R2_001.fastq.gz

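The fix hinges on one point: in this thread, passing the -c1/-c2 capture lists alone did not make kb count produce velocity layers; adding --workflow lamanno did. As a rough sketch, a pre-flight check over the argument list can catch that mismatch before a long run. velocity_flag_problems and workflow_of are hypothetical helpers that encode only the rule from this thread. They are not kb-python APIs and know nothing about other workflows your kb version may offer.

```python
import shlex

# Hypothetical pre-flight check; it encodes only the rule discussed in this issue.
VELOCITY_WORKFLOW = "lamanno"


def workflow_of(argv):
    """Return the value given to --workflow (either `--workflow X` or `--workflow=X`), or None."""
    for i, token in enumerate(argv):
        if token == "--workflow" and i + 1 < len(argv):
            return argv[i + 1]
        if token.startswith("--workflow="):
            return token.split("=", 1)[1]
    return None


def velocity_flag_problems(argv):
    """Flag capture lists (-c1/-c2) passed without --workflow lamanno."""
    problems = []
    has_captures = "-c1" in argv or "-c2" in argv
    if has_captures and workflow_of(argv) != VELOCITY_WORKFLOW:
        problems.append("-c1/-c2 given but --workflow lamanno is missing")
    return problems


if __name__ == "__main__":
    # The original command from this issue, shortened to one pair of FASTQs.
    original = shlex.split(
        "kb count --h5ad -i index.idx_cdna -g t2g.txt -x 10xv2 "
        "-o IPS_RNAvelocity_matrix -c1 cdna_t2c.txt -c2 intron_t2c.txt "
        "--filter bustools -t 2 "
        "2_IPS_S2_L001_R1_001.fastq.gz 2_IPS_S2_L001_R2_001.fastq.gz"
    )
    fixed = original[:-2] + ["--workflow", "lamanno"] + original[-2:]
    print(velocity_flag_problems(original))  # ['-c1/-c2 given but --workflow lamanno is missing']
    print(velocity_flag_problems(fixed))     # []
```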
230101-valentina commented 3 years ago

Thank you, I solved the problem as you suggested and now get the correct matrix with layers. I was wondering whether it is possible to get a human t2g.txt that also includes gene symbols. Thank you.

From: Joseph @.> Sent: Wednesday, 15 September 2021 22:35 To: @.> Cc: Murtaj @.>; @.> Subject: Re: [pachterlab/kb_python] Kallisto did not generate layers using rna velocity tutorial (#145)


230101-valentina commented 3 years ago

Hello, I have another question. On the release page (https://github.com/pachterlab/kb_python/releases) I found that the notes for kb version 0.25.0 say: "When the matrix is converted to H5AD or Loom format (using the --h5ad or --loom options), the gene/feature names are included as a column in the var of the anndata." This did not work when building the RNA velocity count matrix, because the t2g.txt file did not contain the third column with gene names, right? How can I get the gene names too? If I build a custom RNA velocity index for human myself with this version, will I also get gene names, and will I be able to visualize them in the analysis with scanpy? Thank you very much.

Lioscro commented 3 years ago

Hi, @230101-valentina, I believe the t2g should contain the gene names. Are you using a version older than 0.25.0?
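Assuming the t2g.txt written by kb ref is tab-separated with the gene name in the third column (transcript ID, gene ID, gene name, ...), a gene ID to gene name lookup can be built from it directly, for example to relabel genes before plotting in scanpy. The functions below are hypothetical helpers and the sample rows are illustrative; genes with no name fall back to their ID. Gene symbols are not unique, so with a real AnnData you would still want adata.var_names_make_unique() after switching to symbols.

```python
import csv
import io


def gene_names_from_t2g(handle):
    """Build {gene_id: gene_name} from a tab-separated t2g file handle.

    Assumed layout: transcript_id, gene_id, gene_name, ...
    Rows with no (or an empty) third column are skipped, and the first
    name seen for each gene ID wins.
    """
    mapping = {}
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) >= 3 and row[2]:
            mapping.setdefault(row[1], row[2])
    return mapping


def to_symbols(gene_ids, mapping):
    """Translate gene IDs to names, keeping the ID when no name is known."""
    return [mapping.get(gene_id, gene_id) for gene_id in gene_ids]


if __name__ == "__main__":
    # Illustrative rows; with real data, read the t2g.txt from kb ref instead,
    # e.g. `with open("t2g.txt", newline="") as handle: ...`
    sample = io.StringIO(
        "ENST00000456328\tENSG00000223972\tDDX11L1\n"
        "ENST00000450305\tENSG00000223972\tDDX11L1\n"
        "ENST00000488147\tENSG00000227232\tWASH7P\n"
    )
    names = gene_names_from_t2g(sample)
    print(to_symbols(["ENSG00000223972", "ENSG00000227232", "ENSG_UNKNOWN"], names))
    # ['DDX11L1', 'WASH7P', 'ENSG_UNKNOWN']
```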

github-actions[bot] commented 2 years ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days