gpertea / stringtie

Transcript assembly and quantification for RNA-Seq
MIT License

Stuck on prepDE.py #427

Closed by Kappy2 2 months ago

Kappy2 commented 2 months ago

I ran STAR for alignment, then StringTie for counts. I ran the BAM files against the genome, then merged the GTFs, then reran the counts. I am trying to output the data as a .csv file but ran into an error. Here are the commands I used for the entire sequence (please bear with me; no matter how many times I run this, I am still a noob).

STEP 1: Generate INDEX FILE (FA and GTF files needed)

    STAR --runMode genomeGenerate --genomeSAindexNbases 9 \
        --genomeDir /gpfs/scratch/kipoon/00_fastq \
        --genomeFastaFiles /gpfs/scratch/kipoon/00_fastq/GCA_015227675.2_mRatBN7.2_genomic.fna \
        --sjdbGTFfile /gpfs/scratch/kipoon/00_fastq/GCA_015227675.2_mRatBN7.2_genomic.gtf \
        --sjdbOverhang 99

STEP 2: ALIGNMENT

    STAR --genomeDir /gpfs/scratch/kipoon/00_fastq \
        --readFilesIn /gpfs/scratch/kipoon/00_fastq/FILENAME_R1 /gpfs/scratch/kipoon/00_fastq/FILENAME_R2 \
        --readFilesCommand zcat \
        --outSAMtype BAM SortedByCoordinate \
        --outSAMunmapped Within \
        --outSAMattributes Standard \
        --outSAMstrandField intronMotif \
        --outSAMattrIHstart 0 \
        --outFilterIntronMotifs RemoveNoncanonical

STEP 3: QUANTIFICATION

    stringtie /gpfs/scratch/kipoon/00_fastq/FILE.bam -o stringtie.gtf -A gene_abund.tab \
        -G /gpfs/scratch/kipoon/GCA_015227675.2_mRatBN7.2_genomic.gff --rf

Merging files:

    stringtie --rf --merge gtfmerge.txt -o ALLmerge.gtf \
        -G /gpfs/scratch/kipoon/00_fastq/GCA_015227675.2_mRatBN7.2_genomic.gff

Rerunning:

    stringtie -e -B /gpfs/scratch/kipoon/00_fastq/out/2.bam -o 2.gtf \
        -G /gpfs/scratch/kipoon/00_fastq/quant/ALLmerge.gtf -A 2abund.tab

Now here is the problem (I am running the latest StringTie version):

    [kipoon@login2 quant_2]$ prepDE.py -v -i /gpfs/scratch/kipoon/00_fastq/quant_2/allgtfsecondset.txt

    processing sample 100CXCL12 from file /gpfs/scratch/kipoon/00_fastq/quant_2/13.gtf
    processing sample 100CXCL12 from file /gpfs/scratch/kipoon/00_fastq/quant_2/14.gtf
    Error: could not locate transcript MSTRG.8388.1 entry for sample 100CXCL12
    Traceback (most recent call last):
      File "/gpfs/software/stringtie/2.1.4/bin/prepDE.py", line 281, in <module>
        geneDict.setdefault(geneIDs[i],{}) #gene_id
    KeyError: 'MSTRG.8388.1'

Any insight or help? I did not find any solutions through the usual googling. Thank you!
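One detail visible in the verbose output above: the same sample ID, 100CXCL12, is reported for both 13.gtf and 14.gtf. Whether that is related to the KeyError is not established in this thread, but it is cheap to rule out. The following is a minimal sketch, assuming the list file passed to `-i` holds one whitespace-separated `sample_ID path` pair per line and that lines starting with `#` can be ignored (both are assumptions here). The function name `find_duplicate_sample_ids` and the two example lines are hypothetical, reconstructed from the log rather than taken from the actual allgtfsecondset.txt.

```python
from collections import defaultdict


def find_duplicate_sample_ids(lines):
    """Map each sample ID that appears more than once to its GTF paths."""
    paths_by_id = defaultdict(list)
    for line in lines:
        fields = line.split()
        # Skip blank, malformed, or comment lines (comment handling is an assumption).
        if len(fields) < 2 or line.lstrip().startswith("#"):
            continue
        paths_by_id[fields[0]].append(fields[1])
    return {sid: paths for sid, paths in paths_by_id.items() if len(paths) > 1}


# Hypothetical list-file content, reconstructed from the verbose output in the post.
example = [
    "100CXCL12 /gpfs/scratch/kipoon/00_fastq/quant_2/13.gtf",
    "100CXCL12 /gpfs/scratch/kipoon/00_fastq/quant_2/14.gtf",
]

duplicates = find_duplicate_sample_ids(example)
for sample_id, paths in duplicates.items():
    print(sample_id, "->", ", ".join(paths))
```

To check a real list file, pass the open file handle instead of the example list, for instance `find_duplicate_sample_ids(open(path))`.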

Kappy2 commented 2 months ago

I figured it out. I used the -v option to find out where it got stuck and redid the analysis on those files.
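For anyone wanting to automate that step, here is a rough sketch of pulling the failing file out of saved `prepDE.py -v` output. It assumes the log was captured in order and that the message formats match the ones quoted in the original post; the function name `locate_failure` is hypothetical. It reports the last "processing sample" message printed before the error, which is presumably the file being read when the lookup failed.

```python
import re

# Message formats copied from the verbose output quoted in the original post.
PROCESSING = re.compile(r"processing sample (\S+) from file (\S+)")
ERROR = re.compile(r"Error: could not locate transcript (\S+) entry for sample (\S+)")


def locate_failure(log_text):
    """Return (sample_id, gtf_path, transcript_id) for the first error, or None."""
    error = ERROR.search(log_text)
    if error is None:
        return None
    # Only messages printed before the error are relevant.
    before = PROCESSING.findall(log_text[:error.start()])
    if not before:
        return None
    sample_id, gtf_path = before[-1]
    return sample_id, gtf_path, error.group(1)


log = (
    "processing sample 100CXCL12 from file /gpfs/scratch/kipoon/00_fastq/quant_2/13.gtf\n"
    "processing sample 100CXCL12 from file /gpfs/scratch/kipoon/00_fastq/quant_2/14.gtf\n"
    "Error: could not locate transcript MSTRG.8388.1 entry for sample 100CXCL12\n"
)
print(locate_failure(log))
```

On the log from the original post this points at 14.gtf and transcript MSTRG.8388.1, which would be the file to inspect or regenerate first.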

chinaji2008 commented 1 month ago

> I figured it out. Used the -v to find out where it got stuck and redid the analysis on those files.

Hi, I've encountered the same issue. How did you identify which files had errors in the analysis? I've checked all the GTF files included in the analysis and found that some transcripts are missing from some of them. I'm not sure how to proceed. Do you have any suggestions?
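One way to make that check systematic is to collect the transcript IDs from each GTF and report which files lack IDs that other files contain. This is a sketch only: it assumes tab-separated GTF records with a `transcript_id "..."` attribute in the ninth column, and it assumes that all samples were quantified against the same merged GTF, so their outputs should list the same transcripts (an expectation that is not confirmed in this thread). The helper names and the example records are hypothetical.

```python
import re

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def transcript_ids(gtf_lines, feature="transcript"):
    """Collect transcript_id values from GTF records of the given feature type."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header or comment line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != feature:
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids


def missing_per_file(ids_by_file):
    """For each file, list transcript IDs found in some other file but not in it."""
    union = set().union(*ids_by_file.values()) if ids_by_file else set()
    return {
        name: sorted(union - ids)
        for name, ids in ids_by_file.items()
        if union - ids
    }


# Hypothetical excerpts; real input would be the lines of each sample's GTF.
gtf_a = [
    "# hypothetical StringTie output\n",
    'chr1\tStringTie\ttranscript\t100\t900\t1000\t+\t.\tgene_id "MSTRG.1"; transcript_id "MSTRG.1.1"; cov "5.0";\n',
    'chr1\tStringTie\texon\t100\t900\t1000\t+\t.\tgene_id "MSTRG.1"; transcript_id "MSTRG.1.1"; exon_number "1";\n',
    'chr1\tStringTie\ttranscript\t2000\t2500\t1000\t+\t.\tgene_id "MSTRG.2"; transcript_id "MSTRG.2.1"; cov "0.0";\n',
]
gtf_b = gtf_a[:3]  # same file with the second transcript missing

report = missing_per_file({
    "a.gtf": transcript_ids(gtf_a),
    "b.gtf": transcript_ids(gtf_b),
})
print(report)
```

Files that show up in the report are candidates for rerunning. The same helper can also flag exon records with no matching transcript record inside a single file, by comparing `transcript_ids(lines, feature="exon")` with `transcript_ids(lines)` on a list of lines read once from that file.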