rnakato / DROMPAplus

ChIP-seq pipeline tool for quality check, normalization, statistical analysis, and visualization of multiple ChIP-seq samples.
GNU General Public License v2.0

no gene found in gtf #4

Closed: jdmontenegro closed this issue 3 years ago

jdmontenegro commented 3 years ago

Dear developers, thank you for an interesting and complete tool. We are testing it on many of our ChIP-seq libraries but have had trouble obtaining the final results from our data. We have 3M paired-end reads from different libraries and their input controls, and we run the DROMPAplus pipeline as follows:

parse2wig+ --pair -p 8 -i Input.bam -o input48 --gt /scratch/jmontenegro/nvectensis/data/refs/nemVec2.genometable.txt -n GR --nrpm 3000000 --ncmp 3000000
parse2wig+ --pair -p 8 -i mef2.bam -o Mef2 --gt /scratch/jmontenegro/nvectensis/data/refs/nemVec2.genometable.txt -n GR --nrpm 3000000 --ncmp 3000000

The bigWig files are produced correctly, and I then use them to generate plots. However, I get many errors, which I suspect are caused by the GTF annotation file I am using. Does drompa+ require annotated 'tss' features to generate TSS PROFILE graphs? I ran the tool as follows:

drompa+ PROFILE -i Mef.bw -o aroundTSS -g tcs.gtf --gftype 1 --norm 1

and the tsv files look empty (only the header row is written):

> cat aroundTSS.PROFILE.averaged.ChIPread.Mef2.100.bw.tsv
    -2500   -2400   -2300   -2200   -2100   -2000   -1900   -1800   -1700   -1600   -1500   -1400   -1300   -1200   -1100   -1000   -900    -800    -700    -600    -500    -400    -300    -200    -100    0   100 200 300 400 500 600 700 800 900 1000    1100    1200    1300    1400    1500    1600    1700    1800    1900    2000    2100    2200    2300    2400    2500
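The header above describes the profile layout: 51 columns from -2500 to +2500 bp around the TSS in 100 bp steps. The sketch below reproduces that layout and shows how a read position could be assigned to a column relative to a strand-aware TSS. It is a hypothetical illustration, not DROMPAplus code: the window and step are read off the header, while the left-edge binning convention and the names `profile_offsets`, `relative_offset` and `bin_label` are assumptions made here for clarity. If no TSS can be derived from the annotation, no read is ever placed in a bin, which is consistent with a table that contains only this header.

```python
from typing import List, Optional

# Assumed from the PROFILE header above: +/-2500 bp window, 100 bp bins.
WINDOW = 2500
STEP = 100


def profile_offsets(window: int = WINDOW, step: int = STEP) -> List[int]:
    """Column labels of the profile table: -window, -window+step, ..., +window."""
    return list(range(-window, window + 1, step))


def relative_offset(pos: int, tss: int, strand: str) -> int:
    """Distance from the TSS in the direction of transcription (upstream < 0)."""
    return pos - tss if strand == "+" else tss - pos


def bin_label(pos: int, tss: int, strand: str,
              window: int = WINDOW, step: int = STEP) -> Optional[int]:
    """Hypothetical binning: map a position to the left edge of its bin,
    or return None if it falls outside the window."""
    off = relative_offset(pos, tss, strand)
    if abs(off) > window:
        return None
    # Floor division keeps the left-edge convention for negative offsets too.
    return (off // step) * step


if __name__ == "__main__":
    print(len(profile_offsets()), profile_offsets()[:3])
    # TSS coordinates taken from the NV2.1.1 (+) and NV2.2.1 (-) transcripts in the GTF excerpt.
    print(bin_label(33188 + 150, 33188, "+"))  # downstream of a + strand TSS
    print(bin_label(54902 + 50, 54902, "-"))   # upstream of a - strand TSS
```

Whether DROMPAplus labels bins by their left edge or their center is not stated in this thread; only the overall column range is taken from the output.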

Is this because there are no explicit TSS features in the GTF file? Also, when I run PC_ENRICH, I get a warning that no 'genes' can be found in the genome. However, the GTF file does contain gene features, so I am not sure why no genes are found on any chromosome:

chr1    Stringtie   gene    33188   37834   1000    +   .   gene_id "NV2.1";
chr1    StringTie   transcript  33188   37834   1000    +   .   gene_id "NV2.1"; transcript_id "NV2.1.1"; 
chr1    StringTie   exon    33188   33328   1000    +   .   gene_id "NV2.1"; transcript_id "NV2.1.1"; exon_number "1"; 
chr1    StringTie   exon    37696   37834   1000    +   .   gene_id "NV2.1"; transcript_id "NV2.1.1"; exon_number "2"; 
chr1    Stringtie   gene    42107   54902   1000    -   .   gene_id "NV2.2";
chr1    StringTie   transcript  42107   54902   1000    -   .   gene_id "NV2.2"; transcript_id "NV2.2.1"; 
chr1    StringTie   exon    42107   42587   1000    -   .   gene_id "NV2.2"; transcript_id "NV2.2.1"; exon_number "1"; 
chr1    StringTie   exon    44060   44192   1000    -   .   gene_id "NV2.2"; transcript_id "NV2.2.1"; exon_number "2"; 
chr1    StringTie   exon    45320   45456   1000    -   .   gene_id "NV2.2"; transcript_id "NV2.2.1"; exon_number "3"; 
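A quick way to see what a GTF file actually offers is to count its feature types and check which lines carry a `transcript_id` attribute. In the excerpt above, the two `gene` lines have only `gene_id`, while the `transcript` and `exon` lines have both. The sketch below is a hypothetical local diagnostic, not part of DROMPAplus; it assumes the standard nine-column GTF layout, and the strand-aware TSS derivation (transcript start on `+`, transcript end on `-`) is the usual convention, not something confirmed from the DROMPAplus source.

```python
import re
from collections import Counter

# key "value" pairs in the GTF attribute column
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def parse_gtf_line(line):
    """Return (seqname, feature, start, end, strand, attrs), or None for blank/comment lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) < 9:
        # Tolerate copies whose tabs were turned into spaces (e.g. pasted from a web page).
        fields = line.split(None, 8)
    seqname, _source, feature, start, end, _score, strand, _frame, attr = fields[:9]
    return seqname, feature, int(start), int(end), strand, dict(ATTR_RE.findall(attr))


def summarize(lines):
    """Count features, count lines lacking transcript_id per feature,
    and derive a strand-aware TSS for every 'transcript' line."""
    features, missing_tid, tss = Counter(), Counter(), {}
    for line in lines:
        rec = parse_gtf_line(line)
        if rec is None:
            continue
        seqname, feature, start, end, strand, attrs = rec
        features[feature] += 1
        tid = attrs.get("transcript_id")
        if tid is None:
            missing_tid[feature] += 1
        elif feature == "transcript":
            tss[tid] = (seqname, start if strand == "+" else end, strand)
    return features, missing_tid, tss


# Four lines from the excerpt above, rebuilt with tab separators.
SAMPLE_GTF = [
    "\t".join(["chr1", "Stringtie", "gene", "33188", "37834", "1000", "+", ".",
               'gene_id "NV2.1";']),
    "\t".join(["chr1", "StringTie", "transcript", "33188", "37834", "1000", "+", ".",
               'gene_id "NV2.1"; transcript_id "NV2.1.1";']),
    "\t".join(["chr1", "Stringtie", "gene", "42107", "54902", "1000", "-", ".",
               'gene_id "NV2.2";']),
    "\t".join(["chr1", "StringTie", "transcript", "42107", "54902", "1000", "-", ".",
               'gene_id "NV2.2"; transcript_id "NV2.2.1";']),
]

if __name__ == "__main__":
    features, missing, tss = summarize(SAMPLE_GTF)
    print("feature counts:", dict(features))
    print("lines without transcript_id, by feature:", dict(missing))
    for tid, (chrom, pos, strand) in tss.items():
        print(tid, chrom, pos, strand)
```

To inspect a whole annotation, pass an open file instead of the sample, e.g. `with open("tcs.gtf") as fh: summarize(fh)`.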

Which features does drompa+ require in a GTF file? Thank you!

rnakato commented 3 years ago

Hi jdmontenegro,

I've updated DROMPAplus to version 1.8.5 to allow a gtf file that contains only "transcript_id". Please try the latest version.