Discrepancy in counts extraction using prepDE.py3 with multiple samples in sample_list

Hello,

I hope this message finds you well.

I have encountered a discrepancy when using prepDE.py3 to extract count data from the .gtf output of StringTie. Specifically, the counts extracted for a sample differ depending on whether the sample_list contains multiple samples or only that one sample. Furthermore, when multiple samples are listed, the results often contain more zero values.

I would greatly appreciate any insights or suggestions you might have regarding this issue. Thank you for your time and attention to this matter.

The sample_list.txt was:

\multi \single

The command line call was:

\multi
python ~/pipeline/rna/stringtie/prepDE.py3 -i sample_list.txt -g gene_count_matrix.csv

\single
python ~/pipeline/rna/stringtie/prepDE.py3 -i sample_list_test.txt -g gene_count_matrix_test.csv
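To make the difference between the two output files easier to quantify, a comparison along the following lines could be used. This is only a minimal sketch: it assumes both CSV files have the gene ID in the first column and one column per sample, and that the sample from the single run appears under the same column name in both files. The function names `load_counts` and `compare_counts` are hypothetical helpers, not part of prepDE.py3.

```python
import csv


def load_counts(path, sample):
    """Return {gene_id: count} for one sample column of a count matrix CSV.

    Assumes the first column holds the gene ID and that `sample` is the
    exact header of the column to read.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        id_col = reader.fieldnames[0]  # assumed to be the gene ID column
        return {row[id_col]: int(row[sample]) for row in reader}


def compare_counts(multi_path, single_path, sample):
    """Summarize how one sample's counts differ between the two matrices."""
    multi = load_counts(multi_path, sample)
    single = load_counts(single_path, sample)

    # Gene IDs present in both files, compared value by value.
    shared = sorted(multi.keys() & single.keys())
    differing = [(g, multi[g], single[g]) for g in shared if multi[g] != single[g]]

    return {
        "only_in_multi": sorted(multi.keys() - single.keys()),
        "only_in_single": sorted(single.keys() - multi.keys()),
        "differing": differing,  # (gene_id, multi_count, single_count)
        "zeros_multi": sum(1 for v in multi.values() if v == 0),
        "zeros_single": sum(1 for v in single.values() if v == 0),
    }
```

For the files above, this would be called as `compare_counts("gene_count_matrix.csv", "gene_count_matrix_test.csv", "SAMPLE_ID")`, where `SAMPLE_ID` is a placeholder for the name of the sample listed in sample_list_test.txt. Gene IDs that appear in only one of the two files are reported separately from genes whose counts differ.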

The result was:

Best regards, RuQing