nf-core / rnaseq

RNA sequencing analysis pipeline using STAR, RSEM, HISAT2 or Salmon with gene/isoform counts and extensive quality control.
https://nf-co.re/rnaseq
MIT License

To use the STAR module, what should the GTF file look like? #1347

Open huma1995 opened 2 months ago

huma1995 commented 2 months ago

Currently the NF1 GTF file matches the NM_001042492.2.fasta file. However, I am still getting this error:

ERROR ~ Error executing process > 'STAR_GENOMEGENERATE (NM_001042492.2.fasta)'

Caused by:
Process STAR_GENOMEGENERATE (NM_001042492.2.fasta) terminated with an error exit status (104)

Command executed:

samtools faidx NM_001042492.2.fasta
NUM_BASES=`gawk '{sum = sum + $2}END{if ((log(sum)/log(2))/2 - 1 > 14) {printf "%.0f", 14} else {printf "%.0f", (log(sum)/log(2))/2 - 1}}' NM_001042492.2.fasta.fai`

mkdir star
STAR \
    --runMode genomeGenerate \
    --genomeDir star/ \
    --genomeFastaFiles NM_001042492.2.fasta \
    --sjdbGTFfile NF1.gtf \
    --runThreadN 4 \
    --genomeSAindexNbases $NUM_BASES \
    --limitGenomeGenerateRAM 17079869184 \

cat <<-END_VERSIONS > versions.yml
"STAR_GENOMEGENERATE":
    star: $(STAR --version | sed -e "s/STAR_//g")
    samtools: $(echo $(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*$//')
    gawk: $(echo $(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*$//')
END_VERSIONS

Command exit status: 104

Command output:

STAR --runMode genomeGenerate --genomeDir star/ --genomeFastaFiles NM_001042492.2.fasta --sjdbGTFfile NF1.gtf --runThreadN 4 --genomeSAindexNbases 6 --limitGenomeGenerateRAM 17079869184
STAR version: 2.7.9a compiled: 2021-05-04T09:43:56-0400 vega:/home/dobin/data/STAR/STARcode/STAR.master/source
Jul 24 16:03:26 ..... started STAR run
Jul 24 16:03:26 ... starting to generate Genome files
Jul 24 16:03:26 ..... processing annotations GTF

Command error:

!!!!! WARNING: while processing sjdbGTFfile=NF1.gtf, line:
17 hg19_refGene exon 29670027 29670153 0.000000 + . gene_id "NM_000267"; transcript_id "NM_000267";
exon end = 29670153 is larger than the chromosome 17 length = 8520 , will skip this exon

!!!!! WARNING: while processing sjdbGTFfile=NF1.gtf, line:
17 hg19_refGene exon 29676138 29676269 0.000000 + . gene_id "NM_000267"; transcript_id "NM_000267";
exon end = 29676269 is larger than the chromosome 17 length = 8520 , will skip this exon

!!!!! WARNING: while processing sjdbGTFfile=NF1.gtf, line:
17 hg19_refGene exon 29677201 29677336 0.000000 + . gene_id "NM_000267"; transcript_id "NM_000267";
exon end = 29677336 is larger than the chromosome 17 length = 8520 , will skip this exon

!!!!! WARNING: while processing sjdbGTFfile=NF1.gtf, line:
17 hg19_refGene exon 29679275 29679432 0.000000 + . gene_id "NM_000267"; transcript_id "NM_000267";
exon end = 29679432 is larger than the chromosome 17 length = 8520 , will skip this exon

!!!!! WARNING: while processing sjdbGTFfile=NF1.gtf, line:
17 hg19_refGene exon 29683478 29683600 0.000000 + . gene_id "NM_000267"; transcript_id "NM_000267";
exon end = 29683600 is larger than the chromosome 17 length = 8520 , will skip this exon

!!!!! WARNING: while processing sjdbGTFfile=NF1.gtf, line:
17 hg19_refGene exon 29683978 29684108 0.000000 + . gene_id "NM_000267"; transcript_id "NM_000267";
exon end = 29684108 is larger than the chromosome 17 length = 8520 , will skip this exon

!!!!! WARNING: while processing sjdbGTFfile=NF1.gtf, line:
17 hg19_refGene exon 29684287 29684387 0.000000 + . gene_id "NM_000267"; transcript_id "NM_000267";
exon end = 29684387 is larger than the chromosome 17 length = 8520 , will skip this exon

!!!!! WARNING: while processing sjdbGTFfile=NF1.gtf, line:
17 hg19_refGene exon 29685498 29685640 0.000000 + . gene_id "NM_000267"; transcript_id "NM_000267";
exon end = 29685640 is larger than the chromosome 17 length = 8520 , will skip this exon

!!!!! WARNING: while processing sjdbGTFfile=NF1.gtf, line:
17 hg19_refGene exon 29685987 29686033 0.000000 + . gene_id "NM_000267"; transcript_id "NM_000267";
exon end = 29686033 is larger than the chromosome 17 length = 8520 , will skip this exon

!!!!! WARNING: while processing sjdbGTFfile=NF1.gtf, line:
17 hg19_refGene exon 29687505 29687721 0.000000 + . gene_id "NM_000267"; transcript_id "NM_000267";
exon end = 29687721 is larger than the chromosome 17 length = 8520 , will skip this exon

!!!!! WARNING: while processing sjdbGTFfile=NF1.gtf, line:
17 hg19_refGene exon 29701031 29704695 0.000000 + . gene_id "NM_000267"; transcript_id "NM_000267";
exon end = 29704695 is larger than the chromosome 17 length = 8520 , will skip this exon

Fatal INPUT FILE error, no valid exon lines in the GTF file: NF1.gtf
Solution: check the formatting of the GTF file. One likely cause is the difference in chromosome naming between GTF and FASTA file.

Jul 24 16:03:26 ...... FATAL ERROR, exiting

Work dir: /home/hz1/git/NF1_cDNA_Pipeline/work/30/fd85bc72f459bed4bf08b7aca5b43b

Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line

-- Check '.nextflow.log' file for details

Could you please look into this for me? Thank you.
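As a side note on the command above: the gawk expression sets `--genomeSAindexNbases` to log2(total bases)/2 - 1, capped at 14. A minimal Python sketch of the same arithmetic follows; the function name `sa_index_nbases` is hypothetical and is not part of the pipeline. It assumes the total sequence length is 8520, the value STAR reports in the warnings.

```python
import math


def sa_index_nbases(total_bases: int) -> int:
    """Mirror the gawk expression from the log: log2(total)/2 - 1, capped at 14.

    Hypothetical helper for illustration only; the pipeline itself uses gawk.
    """
    value = math.log2(total_bases) / 2 - 1
    if value > 14:
        return 14
    # gawk's printf "%.0f" rounds to the nearest integer; round() does the same here.
    return round(value)


# 8520 is the sequence length reported in the STAR warnings.
print(sa_index_nbases(8520))
```

For 8520 bases this gives 6, which agrees with the `--genomeSAindexNbases 6` shown in the command output, so the index-size step itself appears to behave as intended for a FASTA this small.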

pinin4fjords commented 5 days ago

The errors here are pretty clear:

!!!!! WARNING: while processing sjdbGTFfile=NF1.gtf, line:
17 hg19_refGene exon 29701031 29704695 0.000000 + . gene_id "NM_000267"; transcript_id "NM_000267";
exon end = 29704695 is larger than the chromosome 17 length = 8520 , will skip this exon

Have you, for example, checked chromosome 17 in your FASTA and found that it's longer than 8520?
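One way to run that check before launching the pipeline is sketched below. It compares every exon in a GTF against the sequence lengths in a `samtools faidx` index and lists exons that fall outside their sequence or name a sequence missing from the FASTA. The function names `read_fai_lengths` and `find_bad_exons` are hypothetical, and the demo data is taken from the log above, except for the last three `.fai` columns, which are placeholders. This is a sketch of the comparison STAR reports on, not STAR's own implementation.

```python
from typing import Dict, Iterable, List, Tuple


def read_fai_lengths(fai_lines: Iterable[str]) -> Dict[str, int]:
    """Map sequence name -> length from the first two columns of a .fai index."""
    lengths: Dict[str, int] = {}
    for line in fai_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            lengths[fields[0]] = int(fields[1])
    return lengths


def find_bad_exons(
    gtf_lines: Iterable[str], lengths: Dict[str, int]
) -> List[Tuple[int, str]]:
    """Return (line number, reason) for exons that cannot lie on the FASTA."""
    problems: List[Tuple[int, str]] = []
    for number, line in enumerate(gtf_lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon features are checked
        seqname, end = fields[0], int(fields[4])
        if seqname not in lengths:
            problems.append((number, f"sequence name {seqname!r} not in FASTA index"))
        elif end > lengths[seqname]:
            problems.append(
                (number, f"exon end {end} exceeds length {lengths[seqname]} of {seqname!r}")
            )
    return problems


# Demo: name and length come from the STAR warning; other .fai columns are placeholders.
demo_fai = ["17\t8520\t4\t70\t71\n"]
demo_gtf = [
    '17\thg19_refGene\texon\t29701031\t29704695\t0.000000\t+\t.\t'
    'gene_id "NM_000267"; transcript_id "NM_000267";\n'
]
for line_number, reason in find_bad_exons(demo_gtf, read_fai_lengths(demo_fai)):
    print(f"line {line_number}: {reason}")
```

With real files, the same functions can be fed `open("NM_001042492.2.fasta.fai")` and `open("NF1.gtf")`. In this thread the exon coordinates look like hg19 genomic positions on chromosome 17, while the FASTA sequence is only 8520 bases long, so every exon would be flagged unless the GTF is rewritten in the coordinates of the FASTA sequence.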