Hello,
I hope this message finds you well.
I have encountered a discrepancy when using prepDE.py3 to extract count data from the .gtf output of StringTie. Specifically, the counts extracted from the .gtf file differ depending on whether the sample_list contains multiple samples or only one sample. Furthermore, when multiple samples are listed, the results often contain more zero values.
I would greatly appreciate any insights or suggestions you might have regarding this issue. Thank you for your time and attention to this matter.
The sample_list.txt was:
Multi-sample run: sample_list.txt; single-sample run: sample_list_test.txt
The command line call was:
Multi-sample:
python ~/pipeline/rna/stringtie/prepDE.py3 -i sample_list.txt -g gene_count_matrix.csv
Single-sample:
python ~/pipeline/rna/stringtie/prepDE.py3 -i sample_list_test.txt -g gene_count_matrix_test.csv
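To quantify the difference between the two runs, a comparison along the following lines could be used. This is only a minimal sketch: it assumes both output matrices are comma-separated with the gene ID in the first column and one column per sample, and the helper names (read_sample_column, compare_runs), the sample name sampleA, and the toy data are made up for illustration, not taken from the actual output.

```python
import csv
import io


def read_sample_column(csv_text, sample):
    """Return {gene_id: count} for one sample column of a count matrix.

    Assumes the first column holds the gene ID and `sample` is a header name.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    id_field = reader.fieldnames[0]
    return {row[id_field]: int(row[sample]) for row in reader}


def compare_runs(multi, single):
    """Summarize how two {gene_id: count} mappings differ."""
    shared = sorted(set(multi) & set(single))
    return {
        # Genes reported by only one of the two runs
        "only_in_multi": sorted(set(multi) - set(single)),
        "only_in_single": sorted(set(single) - set(multi)),
        # Genes present in both runs whose counts disagree
        "differing": [g for g in shared if multi[g] != single[g]],
        # Zero counts among the shared genes, per run
        "zeros_multi": sum(1 for g in shared if multi[g] == 0),
        "zeros_single": sum(1 for g in shared if single[g] == 0),
    }


if __name__ == "__main__":
    # Toy data for illustration only, not real prepDE.py3 output.
    multi_csv = "gene_id,sampleA,sampleB\ng1,10,4\ng2,0,7\ng3,5,0\n"
    single_csv = "gene_id,sampleA\ng1,10\ng2,3\ng3,5\n"
    multi = read_sample_column(multi_csv, "sampleA")
    single = read_sample_column(single_csv, "sampleA")
    print(compare_runs(multi, single))
```

With the real files, the text of gene_count_matrix.csv and gene_count_matrix_test.csv would be read with open(...).read() and passed in place of the toy strings, using the name of the sample that appears in both lists.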
The result was:
Best regards, RuQing