Open jtoker opened 4 years ago
This is most likely an inconsistency between the reference genome and the GTF, specifically in the chromosome names.
Can you show the BAM header and the first few reads with
samtools view -H output/pseudoalignments.bam
and
samtools view output/pseudoalignments.bam | head
You can then check whether the sequence names match the FASTA file used in IGV.
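The comparison suggested above can be scripted rather than done by eye. The following is a minimal sketch, not part of kallisto or samtools: `sq_names` and `names_missing_from` are hypothetical helper names, and the file names in the usage comment are placeholders.

```shell
# Hypothetical helper: read SAM header text on stdin and print each @SQ
# sequence name (the SN: field), one per line.
sq_names() {
  awk -F'\t' '$1 == "@SQ" { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4) }'
}

# Hypothetical helper: print names found in column 1 of file $1 that do not
# occur in column 1 of file $2. Lines starting with '#' are skipped.
names_missing_from() {
  awk -F'\t' '
    /^#/ { next }
    FILENAME == ARGV[1] { known[$1] = 1; next }
    !($1 in known) && !seen[$1]++ { print $1 }
  ' "$2" "$1"
}

# Example (file names are placeholders):
#   samtools view -H pseudoalignments.bam | sq_names > bam_names.txt
#   names_missing_from bam_names.txt chrom.sizes.txt
```

Any name this prints is a sequence in the BAM that the other file does not know about, for example `1` in the BAM versus `chr1` in a UCSC-style chromosome sizes file. The same function works with a GTF as the first argument, since the sequence name is also its first column.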
Hi @pmelsted,
I'm having a similar issue with a custom GTF. Maybe these aren't compatible with kallisto? Or the wrong GFF version is used? Thanks for the help.
Input:
$ kallisto quant -b 30 -i ../genome_annotation/HITI.idx -o 8851-Lvr -l 300 -s 1 -t 6 --single --genomebam -g ../genome_annotation/Idua_features.fix.gtf -c ../genome_annotation/chrom.sizes.txt 8851-Lvr.fq
$ head ../../genome_annotation/Idua_features.fix.gtf
HDR . CDS 1 39 . . . Parent="Exon9";
HDR . gene 1 39 . . . ID="Exon9"; name="Exon9";
HDR . CDS 253 376 . . . Parent="ExogenousseqexistintheMPS";
HDR . gene 253 376 . . . ID="ExogenousseqexistintheMPS"; name="ExogenousseqexistintheMPS";
HDR . CDS 377 648 . . . Parent="RegionofIduadonorinAAV";
HDR . gene 377 648 . . . ID="RegionofIduadonorinAAV"; name="RegionofIduadonorinAAV";
HDR . CDS 386 405 . . . Parent="SpyCas9gRNAWangetal2018";
HDR . gene 386 405 . . . ID="SpyCas9gRNAWangetal2018"; name="SpyCas9gRNAWangetal2018";
HDR . CDS 456 486 . . . Parent="Idua_g1";
HDR . gene 456 486 . . . ID="Idua_g1"; name="Idua_g1";
Annotated features as single-exon genes as described here: https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md
$ head ../../genome_annotation/Idua_features.fix.gtf | cut -f9
Parent="Exon9";
ID="Exon9"; name="Exon9";
Parent="ExogenousseqexistintheMPS";
ID="ExogenousseqexistintheMPS"; name="ExogenousseqexistintheMPS";
Parent="RegionofIduadonorinAAV";
ID="RegionofIduadonorinAAV"; name="RegionofIduadonorinAAV";
Parent="SpyCas9gRNAWangetal2018";
ID="SpyCas9gRNAWangetal2018"; name="SpyCas9gRNAWangetal2018";
Parent="Idua_g1";
ID="Idua_g1"; name="Idua_g1";
Format seems to be in order
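One way to test that assumption: the GTF 2.2 format expects `gene_id "..."` and `transcript_id "..."` attributes, whereas `ID=` and `Parent=` are GFF3 conventions. The sketch below summarises the feature types in a file and counts exon records without a GTF-style `transcript_id`. `gtf_summary` is a hypothetical helper name, and the idea that kallisto needs `exon` records carrying these attributes is an assumption that has not been checked against the kallisto source.

```shell
# Hypothetical helper: count records per feature type (column 3) and count
# exon records whose attribute column lacks a GTF-style transcript_id "...".
# Assumption: the tool consuming the GTF builds transcripts from exon records.
gtf_summary() {
  awk -F'\t' '
    /^#/ || NF < 9 { next }
    { types[$3]++ }
    $3 == "exon" && $9 !~ /transcript_id "[^"]+"/ { bad++ }
    END {
      for (t in types) printf "feature\t%s\t%d\n", t, types[t]
      printf "exons_without_transcript_id\t%d\n", bad + 0
    }' "$1" | LC_ALL=C sort
}

# Example (path taken from the command above):
#   gtf_summary ../genome_annotation/Idua_features.fix.gtf
```

If no `feature exon` line appears in the summary, the file contains no exon records at all, which is worth ruling out before looking elsewhere.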
Output:
$ cat abundance.tsv
target_id  length  eff_length  est_counts  tpm
HITI       1676    1377        3184.79     18914.3
Idua       1270    971         116310      979581
HITIas     1676    1377        253.298     1504.32
HDR        1270    971         0           0
$ samtools view -H pseudoalignments.bam
VN:1.0
ID:kallisto PN:kallisto VN:0.46.1
SN:HITI LN:1676
SN:Idua LN:1270
SN:HITIas LN:1676
SN:HDR LN:1270
(removed '@' to avoid github mention)
$ samtools view pseudoalignments.bam | head
MN01027:76:000H352CF:1:11101:20952:1045 0 0 0 300M 0 0 GATACCGTCGANGGACCTAATAACTTCNTATAGCATNCATTATACGAAGTTATATTAAGGGTTATTGAATATGATCGGAATTCCTCGAGCGGCCTACAAATGGTGGGAGCTAGATATTAGGGTAGGAAGCCAGATGCTAGGTATGAGAGAGCCAACAGCCTCAGCCCTCTGCTTGGCTTATAGATGGAGAACAACTCTAGGCAGAGGTCTCAAAGGCTGGGGCTGTGTTGGACAGCAATCATACAGTGGGTGGCCTGGCCACGACCCATCACCCTGCAGGCTCCGCAGCGGCCTGGAGTA FFFFFFFFFFF#FFFFFFFFFFFFFFF#FFFFFFFF#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFAFFFFFFFFFFFFFFFFFFFFFFFAFFFFFFFFFFFFFFFF/////F=/F//F/////FF/F//FFFF=/F/F/A/F/F//FAF/////F///FFF////F//F/FA/6FFFF///FFFF/6=//FF=/FFA//=FF/F/FF//=F ZW:f:0 MN01027:76:000H352CF:1:11101:22935:1054 0 0 0 300M 0 0 GATACCGTCGAGGGACCTAATAACTTCGTATAGCATACATTATACGAAGTTATATTAAGGGTTATTGAATATGATCGGAATTCCTCGAGCGGCCTACAAATGGTGGGAGCTAGATATTAGGGTAGGAAGCCAGATGCTAGGTATGAGAGAGCCAACAGCCTCAGCCCTCTGCTTGGCTTATAGATGGAGAACAACTCTAGGCAGAGGTCTCAAAGGCTGGGGCTGTGTTGGACAGCAATCATACAGTGGGTGTCCTGGCCAGCACCCATCACCCTGAAGGCTCCGCAGCGGCCTGGAGTA AFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFAFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF=FFFFFFFFFFFFFF=FF/FFFFFFF=FFFFFFFFFFFFFF/FF6FFFFF==FFFFFFF66FFFFFF/F6FFFFFFFF/F/AF6FFAFFFA=AFF/F/FFFF/FFFFF/AFFFFAF=FFAFFFFFFFA ZW:f:0 MN01027:76:000H352CF:1:11101:1733:1056 0 0 0 294M 0 0 GGGGGGGTCGAGGGACCTAATAACTTCGTATAGCATACATTATACGAAGTTATATTAAGGGTTATTGAATATGATCGGAATTCCTCGAGCGGCCTACAAATGGTGGGAGCTAGATATTAGGGTAGGAAGCCAGATGCTAGGTATGAGAGAGCCAACAGCCTCAGCCCTCTGCTTGGCTTATAGATGGAGAACAGCTTTGGGCAGAGGTCTCAAAGGCTGGGGCTGTGTTGGACAGCAATCGTACAGTGGGTGTCCTGGCGAGCACCCATCACCCGGCAGGCTCCGCAGCGGCCT F/=/F/F/AFF6FF//FFFFFFFFFF/AFAFFFAFFFFFFFFFFFAFFAFFFFFFFFA/FFFAFFFFAFFFAFFF//FFAFFA=FF/F/FFFFFFFF/FFFFFFFFFFFF=6=FFFFAFFAFFFFFFFFAF/FFF=F/A=FFFFFFFFFF//FFFFAFFFFFFFFF/66FFF=FFFF6//FFFFFFFF/F///F/6/F/FFFFFFFFF6///FFF/F=6FFFF/=FF//FF//FFF//F//F///FFAFF=/FF///FA//F/6////F//F/AA/F/FF///AFF/F6AFFFA ZW:f:0 
MN01027:76:000H352CF:1:11101:13270:1057 0 0 0 300M 0 0 GATACCGTCGAGGGACCTAATAACTTCGTATAGCATACATTATACGAAGTTATATTAAGGGTTATTGAATATGATCGGAATTCCTCGAGCGGCCTACAAATGGTGGGAGCTAGATATTAGGGTAGGAAGCCAGATGCTAGGTATGAGAGAGCCAACAGCCTCAGCCCTCTGCTTGGCTTATAGATGGAGAACAACTCTAGGCAGAGGTCTCAAAGGCTGGGGCTGTGTTGGACAGCAATCATACAGTGGGTGTCCTGGCCAGCACCCATCACCCTGAAGGCTCCGCAGCGGCCTGGAGTA AFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF6FFFFFFFFFFFFFFFFFFFF=FFFFFFFFFFFF/F/FFFFFFFFFFF=F=AFFFFFFFFAFFFFFF=FAFFFAAAAFFFFFFFFF=FFFF/FFFFFFF=F=FAFFAFAA=FFFFFFFFFFFF ZW:f:0 MN01027:76:000H352CF:1:11101:9311:1069 0 0 0 300M 0 0 GATACCGTCGAGGGACCTAATAACTTCGTATAGCATACATTATACGAAGTTATATTAAGGGTTATTGAATATGATCGGAATTCCTCGAGCGGCCTACAAATGGTGGGAGCTAGATATTAGGGTAGGAAGCCAGATGCTAGGTATGAGAGAGCCAACAGCCTCAGCCCTCTGCTTGGCTTATAGATGGAGAACAACTCTAGGCAGAGGTCTCAAAGGCTGGGGCTGTGTTGGACAGCAATCATACAGTGGGTGTCCTGGCCAGCACCCATCACCCTGAAGGCTCCGCAGCGGCCTGGAGTA FAFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF/F/FFF=FFFFFFFFF6FFFAFFFFAFFF=FFFFFFFFFFFFFFFFFFAFFFAFFFFFFAFFFFF/FFFFFAFFFFFFFFFFFFF=FF= ZW:f:0 MN01027:76:000H352CF:1:11101:18442:1077 0 0 0 300M 0 0 GATACCGTCGAGGGACCTAATAACTTCGTATAGCATACATTATACGAAGTTATATTAAGGGTTATTGAATATGATCGGAATTCCTCGAGCGGCCTACAAATGGTGGGAGCTAGATATTAGGGTAGGAAGCCAGATGCTAGGTATGAGAGAGCCAACAGCCTCAGCCCTCTGCTTGGCTTATAGATGGAGAACAACTAGGCAGAGGTCTCAAAGGCTGGGGCTGTGTTGGACAGCAATCCTACAGTGGGTGTCCTGGCCAGCACCCATCCCCCTGAAGGCTCCGCAGCGGCCTGGAGTACC A/FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF6AFFFFFFFFF=FFFFF/6AF/FFFFFAFFFF=FFFAFFFFFFFFAFFFFFFFFFF//FAF=FFF/FFFFFFFFFFFAFF/FF=AFFFFA==FA=FAFAFF=FFAFFFFAFFF/FFF/A///A ZW:f:0 
MN01027:76:000H352CF:1:11101:14705:1077 0 0 0 300M 0 0 GATACCGTCGAGGGACCTAATAACTTCGTATAGCATACATTATACGAAGTTATATTAAGGGTTATTGAATATGATCGGAATTCCTCGAGCGGCCTACAAATGGTGGGAGCTAGATATTAGGGTAGGAAGCCAGATGCTAGGTATGAGAGAGCCAACAGCCTCAGCCCTCTGCTTGGCTTATAGATGGAGAACAACTCTAGGCAGAGGTCTCAAAGGCTGGGGCTGTGTTGGACAGCAATCATACAGTGGGTGTCCTGGCCAGCACCCATCACCCTGAAGGCTCCGCAGCGGCCTGGAGTA FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF/FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFAFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFAFFAFFFF=FFFFFFFFFFFFF=FFFFFFFFFFFFFFF=FFFF=F=FAF===AF=FFFFFFF6F66FFFFFF/AFFF/FFFFFFFF=/FFFFFFFAFF6=FFFFFF/FFF ZW:f:0 MN01027:76:000H352CF:1:11101:13141:1078 0 0 0 300M 0 0 GATACCGTCGAGGGACCTAATAACTTCGTATAGCATACATTATACGAAGTTATATTAAGGGTTATTGAATATGATCGGAATTCCTCGAGCGGCCTACAAATGGTGGGAGCTAGATATTAGGGTAGGAAGCCAGATGCTAGGTATGAGAGAGCCAACAGCCTCAGCCCTCTGCTTGGCTTATAGATGGAGAACAACTCTAGGCAGAGGTCTCAAAGGCTGGGGCTGTGTTGGACAGCAATCATACAGTGGGTGTCCTGGCCAGCACCCATCACCCTGAAGGCTCCGCAGCGGCCTGGAGTA FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFAFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFAFFFFFFFFFFFFFFFFFFFFFFFFFF=FFFFFFFAFFFFFFFFFFFFFFFFFFFFFFAFFFFFFFFFFFFFFFFFFFF/FFFFFFFFFFFFFFFFFFFFFFFFFFFFFAFFF/FFFFFFFF6FFFFFFFFFAFFFFFAFFF=FF= ZW:f:0 MN01027:76:000H352CF:1:11101:6146:1086 0 0 0 299M 0 0 GATACCGTCGAGGGACCTAATAACTTCGTATAGCATACATTATACGAAGTTATATTAAGGGTTATTGAATATGATCGGAATTCCTCGAGCGGCCTACAAATGGTGGGAGCTAGATATTAGGGTAGGAAGCCAGATGCTAGGTATGAGAGAGCCAACAGCCTCAGCCCTCTGCTTGGCTTATAGATGGAGAACAACTCTAGGCAGAGGTCTCAAAGGCTGGGGCTGTGTTGGACAGCAATCATACAGTGGGTGTCCTGGCCAGCACCCATCACCCTGAAGGCTCCGCAGCGGCCTGGAGT AFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFAFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFAFFFFFFFFFFFFFFFFFFFFFFFF/FFFFFFFFFFFFFFFFFFFFFFFFFF/FFFFFAFFFAFFFFFFFFFF/FFFFFFFFFFF=6FFFFFFFFF/FFFFFFFFFAFFFFAFFF/FFFFFFFF/FFFFFF6AFFFFFFFFFFFF=/6=6AFFFFFFFFFAFAFFFFF ZW:f:0 
MN01027:76:000H352CF:1:11101:15230:1087 0 0 0 299M 0 0 GATACCGTCGAGGGACCTAATAACTTCGTATAGCATACATTATACGAAGTTATATTAAGGGTTATTGAATATGATCGGAATTCCTCGAGCGGCCTACAAATGGTGGGAGCTAGATATTAGGGTAGGAAGCCAGATGCTAGGTATGAGAGAGCCAACAGCCTCAGCCCTCTGCTTGGCTTATAGATGGAGAACAACTCTAGGCAGAGGTCTCAAAGGCTGGGGCTGTGTTGGACCGCGATCATACAGTGGGTGTCCTGGCCAGCACCCATCACCCTGAAGGCTCCGCAGCGGCCTGGAGT A/FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF6FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFAFFFFFFFFFFFFFFFAFFFFFFFFFFFFFFFFFFFFF=FFAFFFFFFFFFFFF=/FFFFFFFF/FFFFFF/FFFF/AFFFFFFFFFFFFFFFFFFFFAFFFAF=FFFFF6FFFF=FFAF/FF/AFFF=FA/FFF/FFFFFF=F=AF6FF/FFA/=FFFFFFF==FFFAFFFFFFFFFFFFFF/FF ZW:f:0
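Long records like these are hard to compare by eye. A per-reference tally of the third SAM column (RNAME) is usually quicker. A minimal sketch follows, where `count_by_reference` is a hypothetical helper name:

```shell
# Hypothetical helper: read SAM text on stdin and count records per
# reference name (column 3). Header lines starting with '@' are ignored.
count_by_reference() {
  awk -F'\t' '!/^@/ { n[$3]++ } END { for (r in n) printf "%s\t%d\n", r, n[r] }' | LC_ALL=C sort
}

# Example:
#   samtools view pseudoalignments.bam | count_by_reference
```

In the SAM format an RNAME of `*` means the record has no reference sequence assigned. If most reads fall under `*`, they were not projected onto any sequence, and IGV would have nothing to draw.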
I'm trying to use kallisto to visualize read alignments in IGV. I'm using the Ensembl FASTA for non-coding RNAs and the Ensembl GTF. Initially the GTF transcript IDs didn't match the FASTA IDs (the FASTA IDs include a version number after a decimal point), but I fixed that and the problem persists. Chromosome names also match. One possible issue is that the reference genome loaded in IGV is not from Ensembl. Could that be the cause? The Ensembl reference genome takes up a prohibitively large amount of memory on my computer, so I haven't been able to test this.
! kallisto index -i transcripts.idx new_Homo_sapiens.GRCh38.ncrna.fa
! kallisto quant --chromosomes new_hg38.chrom.sizes.txt --genomebam --gtf Homo_sapiens.GRCh38.99.chr.gtf.gz -i transcripts.idx -o output 1.fastq 2.fastq
I successfully get the count data, but it'd be great if I could also see the alignment. It worked with --pseudobam when I aligned to the non-coding RNA FASTA, but --genomebam isn't showing any reads in IGV.
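For anyone hitting the same version-suffix mismatch between FASTA IDs and GTF transcript IDs, here is a minimal sketch for checking that the two sets agree after the fix. Both function names are hypothetical, and the file names in the usage comment are placeholders rather than the files used above.

```shell
# Hypothetical helper: print FASTA record IDs (first word of each '>' line)
# with any trailing ".<digits>" version suffix removed.
fasta_ids_unversioned() {
  awk '/^>/ { id = substr($1, 2); sub(/\.[0-9]+$/, "", id); print id }' "$1"
}

# Hypothetical helper: print the distinct transcript_id values found in the
# attribute column (column 9) of a GTF.
gtf_transcript_ids() {
  awk -F'\t' '!/^#/ && match($9, /transcript_id "[^"]+"/) {
      s = substr($9, RSTART, RLENGTH); gsub(/transcript_id |"/, "", s); print s
  }' "$1" | LC_ALL=C sort -u
}

# Example (bash, placeholder file names): FASTA IDs that have no GTF entry.
#   comm -23 <(fasta_ids_unversioned transcripts.fa | LC_ALL=C sort -u) \
#            <(gtf_transcript_ids annotation.gtf)
```

An empty result from the `comm` call means every transcript in the index has an entry in the GTF. For a gzipped GTF, decompress it first (for example with `gzip -dc`) and point the function at the decompressed file.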