gpertea / stringtie

Transcript assembly and quantification for RNA-Seq
MIT License

Issue: PrepDE.py Script "IndexError: list index out of range" #387

Open mlvanhorn opened 1 year ago

mlvanhorn commented 1 year ago

Hello,

I am using StringTie in conjunction with CIRIquant, and I use the prepDE.py script to generate the gene count matrix for normalization. This has worked without issue in the past, but in my most recent analysis I encountered the following error:

5858 3565
5859 3565
5860 3565
Traceback (most recent call last):
  File "prepDE.py", line 226, in <module>
    if v[2]=="transcript":
IndexError: list index out of range

I checked my input .lst file for extra spaces or typos and didn't find any. Below is the part of the .lst file where I believe prepDE.py hits the problem. Earlier groups of 3565 samples in the file were processed without trouble, so the error appears to come from the thirteenth file associated with 3565, which is the top line of the listing below.

3565 /ocean/projects/mcb200049p/mvanhorn/PPMI_RNAbloodseq/PPMI-Phase2/IR1.3565/V06_L001/gene/3565_V06_L001_out.gtf
3565 /ocean/projects/mcb200049p/mvanhorn/PPMI_RNAbloodseq/PPMI-Phase2/IR1.3565/V06_L002/gene/3565_V06_L002_out.gtf
3565 /ocean/projects/mcb200049p/mvanhorn/PPMI_RNAbloodseq/PPMI-Phase2/IR1.3565/V06_L003/gene/3565_V06_L003_out.gtf
3565 /ocean/projects/mcb200049p/mvanhorn/PPMI_RNAbloodseq/PPMI-Phase2/IR1.3565/V06_L004/gene/3565_V06_L004_out.gtf
3565 /ocean/projects/mcb200049p/mvanhorn/PPMI_RNAbloodseq/PPMI-Phase2/IR1.3565/V08_L001/gene/3565_V08_L001_out.gtf
3565 /ocean/projects/mcb200049p/mvanhorn/PPMI_RNAbloodseq/PPMI-Phase2/IR1.3565/V08_L002/gene/3565_V08_L002_out.gtf
3565 /ocean/projects/mcb200049p/mvanhorn/PPMI_RNAbloodseq/PPMI-Phase2/IR1.3565/V08_L002t/gene/3565_V08_L002t_out.gtf

As an example, here is a previous group of 3565 files that were processed by prepDE.py successfully.

3565 /ocean/projects/bio220052p/mvanhorn/PPMI_RNAbloodseq/PPMI-Phase1/IR1.3565/BL_L001/gene/3565_BL_L001_out.gtf
3565 /ocean/projects/bio220052p/mvanhorn/PPMI_RNAbloodseq/PPMI-Phase1/IR1.3565/BL_L003/gene/3565_BL_L003_out.gtf
3565 /ocean/projects/bio220052p/mvanhorn/PPMI_RNAbloodseq/PPMI-Phase1/IR1.3565/BL_L004/gene/3565_BL_L004_out.gtf
3565 /ocean/projects/bio220052p/mvanhorn/PPMI_RNAbloodseq/PPMI-Phase1/IR1.3565/V04_L001/gene/3565_V04_L001_out.gtf
3565 /ocean/projects/bio220052p/mvanhorn/PPMI_RNAbloodseq/PPMI-Phase1/IR1.3565/V04_L002/gene/3565_V04_L002_out.gtf
3565 /ocean/projects/bio220052p/mvanhorn/PPMI_RNAbloodseq/PPMI-Phase1/IR1.3565/V04_L003/gene/3565_V04_L003_out.gtf
3565 /ocean/projects/bio220052p/mvanhorn/PPMI_RNAbloodseq/PPMI-Phase1/IR1.3565/V04_L004/gene/3565_V04_L004_out.gtf

I have double-checked the path to the .gtf file, and it is correct as well.

Any advice or suggestions on what might be causing this error would be greatly appreciated. Thank you for your time.

Best,

Megan

Yinuo113 commented 1 year ago

I ran into the same problem before. In my case, the quantification output for some samples was incomplete. You can look for output files that are unusually small, remove them, and re-run quantification for those samples.
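To find the affected files without opening each one, something like the sketch below might help. It assumes the `v` in the traceback is a GTF line split on tabs, so that `v[2]` fails on any line with fewer than three fields (a blank or cut-off line, for instance); I have not confirmed this against the prepDE.py source. The function names `find_malformed_lines` and `check_sample_list` are made up for this example and are not part of StringTie.

```python
import os


def find_malformed_lines(gtf_path, min_fields=9):
    """Return (line_number, text) for non-comment lines with too few fields.

    A complete GTF record has 9 tab-separated fields; a blank or
    truncated line has fewer and is reported here.
    """
    bad = []
    with open(gtf_path) as handle:
        for lineno, line in enumerate(handle, start=1):
            if line.startswith("#"):
                continue  # header/comment lines are not records
            text = line.rstrip("\n")
            if len(text.split("\t")) < min_fields:
                bad.append((lineno, text))
    return bad


def check_sample_list(lst_path):
    """Check every GTF named in a '<sample_id> <path>' list file.

    Returns a dict mapping each problematic GTF path to a short description.
    """
    problems = {}
    with open(lst_path) as handle:
        for entry in handle:
            parts = entry.strip().split(None, 1)
            if len(parts) < 2:
                continue  # skip blank or incomplete list entries
            gtf_path = parts[1]
            if not os.path.isfile(gtf_path):
                problems[gtf_path] = "missing file"
            elif os.path.getsize(gtf_path) == 0:
                problems[gtf_path] = "empty file"
            else:
                bad = find_malformed_lines(gtf_path)
                if bad:
                    first_line = bad[0][0]
                    problems[gtf_path] = (
                        "malformed line(s), first at line %d" % first_line
                    )
    return problems
```

Calling `check_sample_list("samples.lst")` (the file name is a placeholder for your own list) returns a dictionary of the GTF files that look incomplete. Those are the ones to delete and re-quantify before running prepDE.py again.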