simon-anders / htseq

HTSeq is a Python library to facilitate processing and analysis of data from high-throughput sequencing (HTS) experiments.
https://htseq.readthedocs.io/en/release_0.11.1/
GNU General Public License v3.0
121 stars 77 forks source link

How to perform exon level quantifications? #40

Closed komalsrathi closed 6 years ago

komalsrathi commented 6 years ago

Hi,

This is my htseq-count command:

htseq-count -f bam \
-r name -s yes \
-a 10 -t exon -i exon_id \
-m union \
sample_Aligned.out.bam gencode.v23.annotation_subset.gff > sample_Aligned.out.table.txt

I converted Gencode v23 which looks like this:

chr1    HAVANA  gene    229431245   229434098   .   -   .   gene_id "ENSG00000143632.14"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "ACTA1"; level 2; havana_gene "OTTHUMG00000038006.3";
chr1    HAVANA  transcript  229431245   229434094   .   -   .   gene_id "ENSG00000143632.14"; transcript_id "ENST00000366684.7"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "ACTA1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "ACTA1-001"; level 2; protein_id "ENSP00000355645.3"; tag "basic"; transcript_support_level "1"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS1578.1"; havana_gene "OTTHUMG00000038006.3"; havana_transcript "OTTHUMT00000092781.1";
chr1    HAVANA  exon    229434004   229434094   .   -   .   gene_id "ENSG00000143632.14"; transcript_id "ENST00000366684.7"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "ACTA1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "ACTA1-001"; exon_number 1; exon_id "ENSE00001433404.2"; level 2; protein_id "ENSP00000355645.3"; tag "basic"; transcript_support_level "1"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS1578.1"; havana_gene "OTTHUMG00000038006.3"; havana_transcript "OTTHUMT00000092781.1";
chr1    HAVANA  exon    229432987   229433127   .   -   .   gene_id "ENSG00000143632.14"; transcript_id "ENST00000366684.7"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "ACTA1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "ACTA1-001"; exon_number 2; exon_id "ENSE00001380587.1"; level 2; protein_id "ENSP00000355645.3"; tag "basic"; transcript_support_level "1"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS1578.1"; havana_gene "OTTHUMG00000038006.3"; havana_transcript "OTTHUMT00000092781.1";

to gff using dexseq_prepare_annotation:

python dexseq_prepare_annotation.py gencode.v23.annotation_subset.gtf gencode.v23.annotation_subset.gff

And these are the first few lines of the gff file:

chr1    dexseq_prepare_annotation.py    aggregate_gene  229431245   229434098   .   -   .   gene_id "ENSG00000143632.14"
chr1    dexseq_prepare_annotation.py    exonic_part 229431245   229431248   .   -   .   transcripts "ENST00000366684.7"; exonic_part_number "001"; gene_id "ENSG00000143632.14"
chr1    dexseq_prepare_annotation.py    exonic_part 229431249   229431642   .   -   .   transcripts "ENST00000366683.3+ENST00000366684.7"; exonic_part_number "002"; gene_id "ENSG00000143632.14"
chr1    dexseq_prepare_annotation.py    exonic_part 229431721   229431862   .   -   .   transcripts "ENST00000366683.3+ENST00000366684.7"; exonic_part_number "003"; gene_id "ENSG00000143632.14"
chr1    dexseq_prepare_annotation.py    exonic_part 229431863   229431902   .   -   .   transcripts "ENST00000366684.7"; exonic_part_number "004"; gene_id "ENSG00000143632.14"
chr1    dexseq_prepare_annotation.py    exonic_part 229431994   229432185   .   -   .   transcripts "ENST00000366684.7"; exonic_part_number "005"; gene_id "ENSG00000143632.14"

Using this when I run my htseq-count command, all reads map to these features:

__no_feature    85273631
__ambiguous 0
__too_low_aQual 0
__not_aligned   2970335
__alignment_not_unique  12964119

How do I do exon level quantifications using htseq-count?

simon-anders commented 6 years ago

You don't. htseq-count is a tool for simple gene counting.

For anything more complex, we suggest that you use the full HTSeq programming framework and write your own custom counting script, by reading the Tutorial in the HTSeq documentation and the follow the example in the chapter "Counting reads".

komalsrathi commented 6 years ago

Thanks for the prompt response.