wososa / PSI-Sigma


GTF format #17

Closed: litao1503 closed this issue 3 years ago.

litao1503 commented 3 years ago

Hello, I am confused about the correct GTF file format. When I tried to import GTF files in either of the two formats below, an error occurred.

Format 1:

hic_scaffold_1 . exon 131258 131658 . - . transcript_id "PH02Gene41919.t1"; gene_id "PH02Gene41919"; gene_name "PH02Gene41919";
hic_scaffold_1 . exon 134239 134451 . - . transcript_id "PH02Gene41919.t1"; gene_id "PH02Gene41919"; gene_name "PH02Gene41919";
hic_scaffold_1 . exon 134900 135166 . - . transcript_id "PH02Gene41919.t1"; gene_id "PH02Gene41919"; gene_name "PH02Gene41919";
hic_scaffold_1 . exon 137301 137498 . - . transcript_id "PH02Gene41919.t1"; gene_id "PH02Gene41919"; gene_name "PH02Gene41919";
hic_scaffold_1 . exon 137579 137872 . - . transcript_id "PH02Gene41919.t1"; gene_id "PH02Gene41919"; gene_name "PH02Gene41919";
hic_scaffold_1 . CDS 131413 131658 . - 0 transcript_id "PH02Gene41919.t1"; gene_id "PH02Gene41919"; gene_name "PH02Gene41919";
hic_scaffold_1 . CDS 134239 134451 . - 0 transcript_id "PH02Gene41919.t1"; gene_id "PH02Gene41919"; gene_name "PH02Gene41919";
hic_scaffold_1 . CDS 134900 135166 . - 0 transcript_id "PH02Gene41919.t1"; gene_id "PH02Gene41919"; gene_name "PH02Gene41919";
hic_scaffold_1 . CDS 137301 137498 . - 0 transcript_id "PH02Gene41919.t1"; gene_id "PH02Gene41919"; gene_name "PH02Gene41919";
hic_scaffold_1 . CDS 137579 137872 . - 0 transcript_id "PH02Gene41919.t1"; gene_id "PH02Gene41919"; gene_name "PH02Gene41919";

Format 2:

1 . exon 131258 131658 . - . gene_id "PH02Gene41919"; transcript_id "PH02Gene41919.t1"; gene_name "PH02Gene41919";
1 . exon 134239 134451 . - . gene_id "PH02Gene41919"; transcript_id "PH02Gene41919.t1"; gene_name "PH02Gene41919";
1 . exon 134900 135166 . - . gene_id "PH02Gene41919"; transcript_id "PH02Gene41919.t1"; gene_name "PH02Gene41919";
1 . exon 137301 137498 . - . gene_id "PH02Gene41919"; transcript_id "PH02Gene41919.t1"; gene_name "PH02Gene41919";
1 . exon 137579 137872 . - . gene_id "PH02Gene41919"; transcript_id "PH02Gene41919.t1"; gene_name "PH02Gene41919";
1 . CDS 131413 131658 . - 0 gene_id "PH02Gene41919"; transcript_id "PH02Gene41919.t1"; gene_name "PH02Gene41919";
1 . CDS 134239 134451 . - 0 gene_id "PH02Gene41919"; transcript_id "PH02Gene41919.t1"; gene_name "PH02Gene41919";
1 . CDS 134900 135166 . - 0 gene_id "PH02Gene41919"; transcript_id "PH02Gene41919.t1"; gene_name "PH02Gene41919";
1 . CDS 137301 137498 . - 0 gene_id "PH02Gene41919"; transcript_id "PH02Gene41919.t1"; gene_name "PH02Gene41919";
1 . CDS 137579 137872 . - 0 gene_id "PH02Gene41919"; transcript_id "PH02Gene41919.t1"; gene_name "PH02Gene41919";

The following error was produced:

Hic_gffread_convert_psi.gtf is not in an acceptable gtf format. Exiting...
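The records above appear space-separated as pasted, which may only be how the issue page rendered them. The GTF2.2 specification requires exactly nine tab-separated fields per record, with mandatory `gene_id` and `transcript_id` attributes. As a quick first check, here is a minimal sketch of a hypothetical `check_gtf` shell function (built on awk) that tests only those generic GTF rules. It is not PSI-Sigma's own validation, whose exact criteria are not stated in this thread, so a file that passes may still be rejected by PSI-Sigma.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check_gtf FILE
# Checks generic GTF2 structure only (NOT PSI-Sigma's own validation rules):
#   - exactly 9 TAB-separated fields on every non-comment, non-blank line
#   - start/end (columns 4 and 5) are integers with start <= end
#   - strand (column 7) is +, - or .
#   - the attribute column (9) contains gene_id "..." and transcript_id "..."
# Prints one message per problem and returns non-zero if any were found.
check_gtf() {
  awk -F'\t' '
    /^#/ || NF == 0 { next }                      # skip comments and blank lines
    NF != 9 {
      printf "line %d: expected 9 tab-separated fields, got %d\n", NR, NF
      bad++; next
    }
    $4 !~ /^[0-9]+$/ || $5 !~ /^[0-9]+$/ || ($4 + 0 > $5 + 0) {
      printf "line %d: invalid start/end coordinates\n", NR; bad++
    }
    $7 !~ /^[-+.]$/ {
      printf "line %d: invalid strand\n", NR; bad++
    }
    $9 !~ /gene_id "[^"]+"/ || $9 !~ /transcript_id "[^"]+"/ {
      printf "line %d: missing gene_id or transcript_id\n", NR; bad++
    }
    END { if (bad) exit 1 }
  ' "$1"
}
```

Running `check_gtf Hic_gffread_convert_psi.gtf` would list any offending lines; a space-separated file fails on the first record because awk finds only one tab-delimited field.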

I would like to know which GTF format is correct. In addition, does PSI-Sigma support more than two BAM files? When I try to input four BAM files, only the first two seem to be used. Here is the exact code:

echo p3B_rep1_correct.bam.sort.bam >> groupa.txt
echo p3M_rep1_correct.bam.sort.bam >> groupb.txt
echo p4B_rep1_correct.bam.sort.bam >> groupc.txt
echo p4M_rep1_correct.bam.sort.bam >> groupd.txt
perl /tools/PSI-Sigma-1.9j/dummyai.pl --gtf $gtf --name PSIsigma --type 2 -nread 5

Thank you for any guidance on this topic.

Regards,
Tao

wososa commented 3 years ago

Hi @litao1503,

PSI-Sigma has been tested with .gtf files from Ensembl and GENCODE. If you have novel transcripts, you can use StringTie to generate .gtf files and then merge them with Ensembl's .gtf file.
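That workflow might look roughly like the sketch below. It assumes StringTie is on PATH, the BAM files are coordinate-sorted, and a reference annotation (for example Ensembl's GTF) is available. The function name `build_merged_gtf` and the `*.stringtie.gtf` output names are hypothetical; the StringTie calls use its standard `-G` (reference annotation), `-o` (output file) and `--merge` options.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build_merged_gtf REF_GTF OUT_GTF BAM [BAM...]
# Requires StringTie on PATH and coordinate-sorted BAM files.
build_merged_gtf() {
  local ref_gtf=$1 out_gtf=$2
  shift 2
  local sample_gtfs=()
  local bam sample_gtf
  for bam in "$@"; do
    sample_gtf="${bam%.bam}.stringtie.gtf"
    # Assemble each sample's transcripts, guided by the reference annotation
    stringtie "$bam" -G "$ref_gtf" -o "$sample_gtf" || return 1
    sample_gtfs+=("$sample_gtf")
  done
  # Merge all per-sample assemblies together with the reference transcripts
  stringtie --merge -G "$ref_gtf" -o "$out_gtf" "${sample_gtfs[@]}"
}
```

For example, `build_merged_gtf reference.gtf merged.gtf p3B_rep1_correct.bam.sort.bam p3M_rep1_correct.bam.sort.bam` (where `reference.gtf` is a placeholder for the reference annotation) writes `merged.gtf`, which can then be given to `dummyai.pl --gtf`.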

PSI-Sigma only supports two-group comparisons, so you need to put the Control samples in groupa.txt and the Treatment samples in groupb.txt. groupc.txt and groupd.txt won't work.
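Following that advice, the four BAM files from the question go into two group files instead of four. Here is a minimal sketch. Which samples are Control and which are Treatment is an assumption (the "B" samples are treated as Control and the "M" samples as Treatment), so swap the lists to match the real design. It writes the files with `>` rather than `>>`, because appending adds duplicate entries every time the commands are re-run.

```shell
#!/usr/bin/env bash
set -euo pipefail

# ASSUMPTION: "B" samples are Control and "M" samples are Treatment;
# swap the two lists if the experimental design is the other way round.
control=(p3B_rep1_correct.bam.sort.bam p4B_rep1_correct.bam.sort.bam)
treatment=(p3M_rep1_correct.bam.sort.bam p4M_rep1_correct.bam.sort.bam)

# One BAM file per line; '>' overwrites, so re-running never duplicates entries
printf '%s\n' "${control[@]}" > groupa.txt
printf '%s\n' "${treatment[@]}" > groupb.txt
```

After that, the same `perl /tools/PSI-Sigma-1.9j/dummyai.pl ...` command can be rerun. groupc.txt and groupd.txt are simply not used.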

Thanks, Woody