pachterlab / kallisto

Near-optimal RNA-Seq quantification
https://pachterlab.github.io/kallisto
BSD 2-Clause "Simplified" License

--genomebam not showing read alignment in IGV #258

Open jtoker opened 4 years ago

jtoker commented 4 years ago

I'm trying to use kallisto to visualize read alignments in IGV. I'm using the Ensembl FASTA for non-coding RNAs and the Ensembl GTF. Initially the GTF transcript IDs didn't match the FASTA IDs (the FASTA IDs include a version number after a decimal point), but I fixed that and the problem persists. Chromosome names also match. One possible issue is that the reference genome loaded in IGV is not from Ensembl. Could that be the cause? The Ensembl reference genome takes up a prohibitively large amount of memory on my computer, so I haven't been able to check.
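For readers hitting the same ID mismatch, here is a minimal sketch of one way to drop the version suffix from FASTA headers so they match unversioned GTF transcript IDs. It is not necessarily how the fix above was done; the function name `strip_version` and the example header are hypothetical.

```python
import re


def strip_version(line: str) -> str:
    """Remove a trailing '.<digits>' version from the ID of a FASTA header line.

    Sequence lines (anything not starting with '>') are returned unchanged.
    """
    if not line.startswith(">"):
        return line
    # The ID is the first whitespace-delimited token after '>'.
    seq_id, sep, rest = line[1:].partition(" ")
    seq_id = re.sub(r"\.\d+$", "", seq_id)
    return ">" + seq_id + sep + rest


# Hypothetical header in the Ensembl ncRNA style, followed by a sequence line.
example = [">ENST00000000001.2 ncrna chromosome:GRCh38:1:100:200:1", "ACGT"]
for record_line in example:
    print(strip_version(record_line))
```

The alternative is to leave the FASTA alone and add versions to the GTF IDs; either works as long as both files end up using exactly the same strings.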

! kallisto index -i transcripts.idx new_Homo_sapiens.GRCh38.ncrna.fa

! kallisto quant --chromosomes new_hg38.chrom.sizes.txt --genomebam --gtf Homo_sapiens.GRCh38.99.chr.gtf.gz -i transcripts.idx -o output 1.fastq 2.fastq

I successfully get the count data, but it'd be great if I could also see the alignment. It worked with --pseudobam when I aligned to the non-coding RNA FASTA, but --genomebam isn't showing any reads in IGV.

pmelsted commented 4 years ago

This is most likely an inconsistency between the reference genome and the GTF, namely in the chromosome names.

Can you show the bam header and the first few reads via

samtools view -H output/pseudoalignments.bam

and

samtools view output/pseudoalignments.bam | head

You can then check whether this matches the fasta file used in IGV.
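The comparison suggested above can be sketched in a few lines of Python. This assumes you have saved the `samtools view -H` output as text and have a tab-separated table of the genome's sequence names (a chrom.sizes file or a `.fai` index); the function names and the sample data are hypothetical.

```python
def sam_header_names(header_text: str) -> set:
    """Collect the SN: values from the @SQ lines of a SAM header."""
    names = set()
    for line in header_text.splitlines():
        fields = line.split("\t")
        if fields[0] != "@SQ":
            continue
        for field in fields[1:]:
            if field.startswith("SN:"):
                names.add(field[3:])
    return names


def table_names(text: str) -> set:
    """First column of a tab-separated table (chrom.sizes or .fai)."""
    return {line.split("\t")[0] for line in text.splitlines() if line.strip()}


def compare(bam_names: set, genome_names: set) -> dict:
    """Report which sequence names are shared and which appear on one side only."""
    return {
        "shared": sorted(bam_names & genome_names),
        "only_in_bam": sorted(bam_names - genome_names),
        "only_in_genome": sorted(genome_names - bam_names),
    }


# Made-up data: an Ensembl-style BAM header against UCSC-style genome names.
header = "@HD\tVN:1.0\n@SQ\tSN:1\tLN:1000\n@SQ\tSN:MT\tLN:100\n"
sizes = "chr1\t1000\nchrM\t100\n"
print(compare(sam_header_names(header), table_names(sizes)))
```

An empty `shared` list, as in this made-up case, would be consistent with reads not appearing in the viewer, although IGV may apply its own name aliasing for some genomes.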

umasstr commented 3 years ago

Hi @pmelsted,

I'm having a similar issue with a custom GTF. Maybe these aren't compatible with kallisto? Or the wrong GFF version is used? Thanks for the help.

Input: $kallisto quant -b 30 -i ../genome_annotation/HITI.idx -o 8851-Lvr -l 300 -s 1 -t 6 --single --genomebam -g ../genome_annotation/Idua_features.fix.gtf -c ../genome_annotation/chrom.sizes.txt 8851-Lvr.fq

$ head ../../genome_annotation/Idua_features.fix.gtf

HDR . CDS 1 39 . . . Parent="Exon9";
HDR . gene 1 39 . . . ID="Exon9"; name="Exon9";
HDR . CDS 253 376 . . . Parent="ExogenousseqexistintheMPS";
HDR . gene 253 376 . . . ID="ExogenousseqexistintheMPS"; name="ExogenousseqexistintheMPS";
HDR . CDS 377 648 . . . Parent="RegionofIduadonorinAAV";
HDR . gene 377 648 . . . ID="RegionofIduadonorinAAV"; name="RegionofIduadonorinAAV";
HDR . CDS 386 405 . . . Parent="SpyCas9gRNAWangetal2018";
HDR . gene 386 405 . . . ID="SpyCas9gRNAWangetal2018"; name="SpyCas9gRNAWangetal2018";
HDR . CDS 456 486 . . . Parent="Idua_g1";
HDR . gene 456 486 . . . ID="Idua_g1"; name="Idua_g1";

Annotated features as single-exon genes as described here: https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md

$ head ../../genome_annotation/Idua_features.fix.gtf | cut -f9

Parent="Exon9";
ID="Exon9"; name="Exon9";
Parent="ExogenousseqexistintheMPS";
ID="ExogenousseqexistintheMPS"; name="ExogenousseqexistintheMPS";
Parent="RegionofIduadonorinAAV";
ID="RegionofIduadonorinAAV"; name="RegionofIduadonorinAAV";
Parent="SpyCas9gRNAWangetal2018";
ID="SpyCas9gRNAWangetal2018"; name="SpyCas9gRNAWangetal2018";
Parent="Idua_g1";
ID="Idua_g1"; name="Idua_g1";

Format seems to be in order
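One way to double-check the attribute column: in the GTF convention, column 9 holds `key "value";` pairs and each feature carries `gene_id` and `transcript_id`, whereas the lines above use GFF3-style `ID`/`Parent` keys written as `key="value"`. Whether kallisto rejects or silently skips such lines is an assumption here, not something confirmed in this thread. The sketch below only tests for the GTF-style keys; the function names and the second sample line are hypothetical.

```python
import re

# GTF-style attribute: a key, one space, a double-quoted value, a semicolon.
GTF_ATTRIBUTE = re.compile(r'(\S+) "([^"]*)";')


def parse_gtf_attributes(column9: str) -> dict:
    """Return the key/value pairs written in GTF 'key "value";' form."""
    return dict(GTF_ATTRIBUTE.findall(column9))


def missing_gtf_keys(column9: str, required=("gene_id", "transcript_id")) -> list:
    """List the required GTF keys that are absent from an attribute column."""
    attributes = parse_gtf_attributes(column9)
    return [key for key in required if key not in attributes]


# Attribute column copied from the listing above.
print(missing_gtf_keys('Parent="Exon9";'))
# Hypothetical GTF-style rewrite of the same feature.
print(missing_gtf_keys('gene_id "Exon9"; transcript_id "Exon9.t1";'))
```

If the first call reports both keys as missing, rewriting the annotation with GTF-style attributes (and, if needed, exon features) would be a reasonable next thing to try.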

Output $cat abundance.tsv

target_id  length  eff_length  est_counts  tpm
HITI       1676    1377        3184.79     18914.3
Idua       1270    971         116310      979581
HITIas     1676    1377        253.298     1504.32
HDR        1270    971         0           0

$samtools view -H pseudoalignments.bam

 VN:1.0
 ID:kallisto     PN:kallisto     VN:0.46.1
 SN:HITI LN:1676
 SN:Idua LN:1270
 SN:HITIas       LN:1676
 SN:HDR  LN:1270

(removed '@' to avoid github mention)

$ samtools view pseudoalignments.bam | head

MN01027:76:000H352CF:1:11101:20952:1045 0 0 0 300M 0 0 GATACCGTCGANGGACCTAATAACTTCNTATAGCATNCATTATACGAAGTTATATTAAGGGTTATTGAATATGATCGGAATTCCTCGAGCGGCCTACAAATGGTGGGAGCTAGATATTAGGGTAGGAAGCCAGATGCTAGGTATGAGAGAGCCAACAGCCTCAGCCCTCTGCTTGGCTTATAGATGGAGAACAACTCTAGGCAGAGGTCTCAAAGGCTGGGGCTGTGTTGGACAGCAATCATACAGTGGGTGGCCTGGCCACGACCCATCACCCTGCAGGCTCCGCAGCGGCCTGGAGTA FFFFFFFFFFF#FFFFFFFFFFFFFFF#FFFFFFFF#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFAFFFFFFFFFFFFFFFFFFFFFFFAFFFFFFFFFFFFFFFF/////F=/F//F/////FF/F//FFFF=/F/F/A/F/F//FAF/////F///FFF////F//F/FA/6FFFF///FFFF/6=//FF=/FFA//=FF/F/FF//=F ZW:f:0 MN01027:76:000H352CF:1:11101:22935:1054 0 0 0 300M 0 0 GATACCGTCGAGGGACCTAATAACTTCGTATAGCATACATTATACGAAGTTATATTAAGGGTTATTGAATATGATCGGAATTCCTCGAGCGGCCTACAAATGGTGGGAGCTAGATATTAGGGTAGGAAGCCAGATGCTAGGTATGAGAGAGCCAACAGCCTCAGCCCTCTGCTTGGCTTATAGATGGAGAACAACTCTAGGCAGAGGTCTCAAAGGCTGGGGCTGTGTTGGACAGCAATCATACAGTGGGTGTCCTGGCCAGCACCCATCACCCTGAAGGCTCCGCAGCGGCCTGGAGTA AFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFAFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF=FFFFFFFFFFFFFF=FF/FFFFFFF=FFFFFFFFFFFFFF/FF6FFFFF==FFFFFFF66FFFFFF/F6FFFFFFFF/F/AF6FFAFFFA=AFF/F/FFFF/FFFFF/AFFFFAF=FFAFFFFFFFA ZW:f:0 MN01027:76:000H352CF:1:11101:1733:1056 0 0 0 294M 0 0 GGGGGGGTCGAGGGACCTAATAACTTCGTATAGCATACATTATACGAAGTTATATTAAGGGTTATTGAATATGATCGGAATTCCTCGAGCGGCCTACAAATGGTGGGAGCTAGATATTAGGGTAGGAAGCCAGATGCTAGGTATGAGAGAGCCAACAGCCTCAGCCCTCTGCTTGGCTTATAGATGGAGAACAGCTTTGGGCAGAGGTCTCAAAGGCTGGGGCTGTGTTGGACAGCAATCGTACAGTGGGTGTCCTGGCGAGCACCCATCACCCGGCAGGCTCCGCAGCGGCCT F/=/F/F/AFF6FF//FFFFFFFFFF/AFAFFFAFFFFFFFFFFFAFFAFFFFFFFFA/FFFAFFFFAFFFAFFF//FFAFFA=FF/F/FFFFFFFF/FFFFFFFFFFFF=6=FFFFAFFAFFFFFFFFAF/FFF=F/A=FFFFFFFFFF//FFFFAFFFFFFFFF/66FFF=FFFF6//FFFFFFFF/F///F/6/F/FFFFFFFFF6///FFF/F=6FFFF/=FF//FF//FFF//F//F///FFAFF=/FF///FA//F/6////F//F/AA/F/FF///AFF/F6AFFFA ZW:f:0 
MN01027:76:000H352CF:1:11101:13270:1057 0 0 0 300M 0 0 GATACCGTCGAGGGACCTAATAACTTCGTATAGCATACATTATACGAAGTTATATTAAGGGTTATTGAATATGATCGGAATTCCTCGAGCGGCCTACAAATGGTGGGAGCTAGATATTAGGGTAGGAAGCCAGATGCTAGGTATGAGAGAGCCAACAGCCTCAGCCCTCTGCTTGGCTTATAGATGGAGAACAACTCTAGGCAGAGGTCTCAAAGGCTGGGGCTGTGTTGGACAGCAATCATACAGTGGGTGTCCTGGCCAGCACCCATCACCCTGAAGGCTCCGCAGCGGCCTGGAGTA AFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF6FFFFFFFFFFFFFFFFFFFF=FFFFFFFFFFFF/F/FFFFFFFFFFF=F=AFFFFFFFFAFFFFFF=FAFFFAAAAFFFFFFFFF=FFFF/FFFFFFF=F=FAFFAFAA=FFFFFFFFFFFF ZW:f:0 MN01027:76:000H352CF:1:11101:9311:1069 0 0 0 300M 0 0 GATACCGTCGAGGGACCTAATAACTTCGTATAGCATACATTATACGAAGTTATATTAAGGGTTATTGAATATGATCGGAATTCCTCGAGCGGCCTACAAATGGTGGGAGCTAGATATTAGGGTAGGAAGCCAGATGCTAGGTATGAGAGAGCCAACAGCCTCAGCCCTCTGCTTGGCTTATAGATGGAGAACAACTCTAGGCAGAGGTCTCAAAGGCTGGGGCTGTGTTGGACAGCAATCATACAGTGGGTGTCCTGGCCAGCACCCATCACCCTGAAGGCTCCGCAGCGGCCTGGAGTA FAFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF/F/FFF=FFFFFFFFF6FFFAFFFFAFFF=FFFFFFFFFFFFFFFFFFAFFFAFFFFFFAFFFFF/FFFFFAFFFFFFFFFFFFF=FF= ZW:f:0 MN01027:76:000H352CF:1:11101:18442:1077 0 0 0 300M 0 0 GATACCGTCGAGGGACCTAATAACTTCGTATAGCATACATTATACGAAGTTATATTAAGGGTTATTGAATATGATCGGAATTCCTCGAGCGGCCTACAAATGGTGGGAGCTAGATATTAGGGTAGGAAGCCAGATGCTAGGTATGAGAGAGCCAACAGCCTCAGCCCTCTGCTTGGCTTATAGATGGAGAACAACTAGGCAGAGGTCTCAAAGGCTGGGGCTGTGTTGGACAGCAATCCTACAGTGGGTGTCCTGGCCAGCACCCATCCCCCTGAAGGCTCCGCAGCGGCCTGGAGTACC A/FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF6AFFFFFFFFF=FFFFF/6AF/FFFFFAFFFF=FFFAFFFFFFFFAFFFFFFFFFF//FAF=FFF/FFFFFFFFFFFAFF/FF=AFFFFA==FA=FAFAFF=FFAFFFFAFFF/FFF/A///A ZW:f:0 
MN01027:76:000H352CF:1:11101:14705:1077 0 0 0 300M 0 0 GATACCGTCGAGGGACCTAATAACTTCGTATAGCATACATTATACGAAGTTATATTAAGGGTTATTGAATATGATCGGAATTCCTCGAGCGGCCTACAAATGGTGGGAGCTAGATATTAGGGTAGGAAGCCAGATGCTAGGTATGAGAGAGCCAACAGCCTCAGCCCTCTGCTTGGCTTATAGATGGAGAACAACTCTAGGCAGAGGTCTCAAAGGCTGGGGCTGTGTTGGACAGCAATCATACAGTGGGTGTCCTGGCCAGCACCCATCACCCTGAAGGCTCCGCAGCGGCCTGGAGTA FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF/FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFAFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFAFFAFFFF=FFFFFFFFFFFFF=FFFFFFFFFFFFFFF=FFFF=F=FAF===AF=FFFFFFF6F66FFFFFF/AFFF/FFFFFFFF=/FFFFFFFAFF6=FFFFFF/FFF ZW:f:0 MN01027:76:000H352CF:1:11101:13141:1078 0 0 0 300M 0 0 GATACCGTCGAGGGACCTAATAACTTCGTATAGCATACATTATACGAAGTTATATTAAGGGTTATTGAATATGATCGGAATTCCTCGAGCGGCCTACAAATGGTGGGAGCTAGATATTAGGGTAGGAAGCCAGATGCTAGGTATGAGAGAGCCAACAGCCTCAGCCCTCTGCTTGGCTTATAGATGGAGAACAACTCTAGGCAGAGGTCTCAAAGGCTGGGGCTGTGTTGGACAGCAATCATACAGTGGGTGTCCTGGCCAGCACCCATCACCCTGAAGGCTCCGCAGCGGCCTGGAGTA FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFAFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFAFFFFFFFFFFFFFFFFFFFFFFFFFF=FFFFFFFAFFFFFFFFFFFFFFFFFFFFFFAFFFFFFFFFFFFFFFFFFFF/FFFFFFFFFFFFFFFFFFFFFFFFFFFFFAFFF/FFFFFFFF6FFFFFFFFFAFFFFFAFFF=FF= ZW:f:0 MN01027:76:000H352CF:1:11101:6146:1086 0 0 0 299M 0 0 GATACCGTCGAGGGACCTAATAACTTCGTATAGCATACATTATACGAAGTTATATTAAGGGTTATTGAATATGATCGGAATTCCTCGAGCGGCCTACAAATGGTGGGAGCTAGATATTAGGGTAGGAAGCCAGATGCTAGGTATGAGAGAGCCAACAGCCTCAGCCCTCTGCTTGGCTTATAGATGGAGAACAACTCTAGGCAGAGGTCTCAAAGGCTGGGGCTGTGTTGGACAGCAATCATACAGTGGGTGTCCTGGCCAGCACCCATCACCCTGAAGGCTCCGCAGCGGCCTGGAGT AFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFAFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFAFFFFFFFFFFFFFFFFFFFFFFFF/FFFFFFFFFFFFFFFFFFFFFFFFFF/FFFFFAFFFAFFFFFFFFFF/FFFFFFFFFFF=6FFFFFFFFF/FFFFFFFFFAFFFFAFFF/FFFFFFFF/FFFFFF6AFFFFFFFFFFFF=/6=6AFFFFFFFFFAFAFFFFF ZW:f:0 
MN01027:76:000H352CF:1:11101:15230:1087 0 0 0 299M 0 0 GATACCGTCGAGGGACCTAATAACTTCGTATAGCATACATTATACGAAGTTATATTAAGGGTTATTGAATATGATCGGAATTCCTCGAGCGGCCTACAAATGGTGGGAGCTAGATATTAGGGTAGGAAGCCAGATGCTAGGTATGAGAGAGCCAACAGCCTCAGCCCTCTGCTTGGCTTATAGATGGAGAACAACTCTAGGCAGAGGTCTCAAAGGCTGGGGCTGTGTTGGACCGCGATCATACAGTGGGTGTCCTGGCCAGCACCCATCACCCTGAAGGCTCCGCAGCGGCCTGGAGT A/FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF6FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFAFFFFFFFFFFFFFFFAFFFFFFFFFFFFFFFFFFFFF=FFAFFFFFFFFFFFF=/FFFFFFFF/FFFFFF/FFFF/AFFFFFFFFFFFFFFFFFFFFAFFFAF=FFFFF6FFFF=FFAF/FF/AFFF=FA/FFF/FFFFFF=F=AF6FF/FFA/=FFFFFFF==FFFAFFFFFFFFFFFFFF/FF ZW:f:0