I wonder whether the 'else' clause (ll. 172-173 of v2.2.1, the latest release, dated Jan 26, 2022) in the badGenes check block of prepDE.py3 should be commented out (or deleted).
for s in samples:
    badGenes=[] #list of bad genes (just ones that aren't MSTRG)
    try:
        ...
    except StopIteration:
        warnings.warn("Didn't get a GTF in that directory. Looking in another...")
    else: #we found the "bad" genes!
        break
In the present version, the else clause breaks out of the for loop as soon as one sample has been read without a StopIteration, so geneIDs[t_id]=g_id is set only for the transcripts of the first sample and not for those of the remaining samples.
When the expression of alternative splicing variants differs among samples, the gene count for some samples in gene_count_matrix.csv can be smaller than the sum of the counts of its splicing variants in transcript_count_matrix.csv.
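The effect of the else/break can be reproduced in isolation. The following is only a minimal sketch, not the prepDE.py3 code itself: the helper name build_gene_ids, the sample contents, and the assignment of transcripts to samples are all hypothetical, and each "sample" is just a list of (t_id, g_id) pairs standing in for a parsed GTF.

```python
def build_gene_ids(samples, break_on_success):
    """Map transcript IDs to gene IDs across samples (hypothetical helper).

    break_on_success=True mimics the try/except/else/break pattern:
    the loop stops after the first sample that is read without error.
    """
    gene_ids = {}
    for _name, records in samples:
        try:
            for t_id, g_id in records:
                gene_ids[t_id] = g_id
        except StopIteration:
            continue  # stand-in for "look in another directory"
        else:
            if break_on_success:
                break
    return gene_ids


# Hypothetical data: the second sample has a transcript the first one lacks.
samples = [
    ("sample1", [("AT1G02840.1", "MSTRG.110|SR34"),
                 ("AT1G02840.2", "MSTRG.110|SR34")]),
    ("sample2", [("AT1G02840.1", "MSTRG.110|SR34"),
                 ("AT1G02840.3", "MSTRG.110|SR34")]),
]

with_break = build_gene_ids(samples, break_on_success=True)
without_break = build_gene_ids(samples, break_on_success=False)

print(sorted(with_break))     # transcripts known when the loop breaks early
print(sorted(without_break))  # transcripts known when every sample is read
```

In this sketch, AT1G02840.3 is missing from the mapping only when the loop breaks after the first sample.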
Then at l.279,
    geneDict[geneIDs[i]][s[0]]+=v[s[0]]
when geneIDs has no g_id recorded for a t_id, <class 'str'> is used as the gene key instead (and appears in the last line of gene_count_matrix.csv).
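A <class 'str'> row is what one would expect if geneIDs were a defaultdict whose factory returns the str type itself rather than an empty string. That is an assumption about how geneIDs is created (the line is not shown above); the sketch below only illustrates the general Python behaviour, with hypothetical counts for a single sample.

```python
from collections import defaultdict

# Assumption: geneIDs is a defaultdict whose factory returns the str *type*.
gene_ids = defaultdict(lambda: str)
gene_ids["AT1G02840.1"] = "MSTRG.110|SR34"  # recorded from the first sample only

# Hypothetical per-transcript counts for one later sample.
transcript_counts = {"AT1G02840.1": 96, "AT1G02840.3": 18}

gene_counts = defaultdict(int)
for t_id, count in transcript_counts.items():
    # An unknown t_id makes the lookup return the str class, which becomes the key.
    gene_counts[gene_ids[t_id]] += count

for key, count in gene_counts.items():
    print(key, count)
```

Here the count of the unrecorded transcript is not added to MSTRG.110|SR34 but accumulates under the key <class 'str'>.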
An example of what I observed for one gene is as follows.
With the present version:
$ grep '\bSR34\b' gene_count_matrix.csv
MSTRG.110|SR34,99,29,0,50,0,0,24,96,24 # <- the values in the last 2 columns, i.e. 96 and 24, are not the sums of the counts of the 3 transcripts.
$ grep AT1G02840 transcript_count_matrix.csv
AT1G02840.1,44,29,0,50,0,0,24,96,24
AT1G02840.2,55,0,0,0,0,0,0,0,0
AT1G02840.3,,0,0,0,0,0,0,18,32
$
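The mismatch can be checked by summing the three transcript rows column by column and comparing the result with the gene row. This is a small sketch using the rows pasted above; treating the empty field in the AT1G02840.3 row as 0 is my assumption.

```python
import csv
import io

gene_row = "MSTRG.110|SR34,99,29,0,50,0,0,24,96,24"
transcript_rows = """\
AT1G02840.1,44,29,0,50,0,0,24,96,24
AT1G02840.2,55,0,0,0,0,0,0,0,0
AT1G02840.3,,0,0,0,0,0,0,18,32
"""


def to_counts(fields):
    # Assumption: an empty field is treated as a count of 0.
    return [int(x) if x else 0 for x in fields]


gene_counts = to_counts(gene_row.split(",")[1:])
rows = [to_counts(r[1:]) for r in csv.reader(io.StringIO(transcript_rows))]
transcript_sums = [sum(col) for col in zip(*rows)]

# 1-based sample columns where the gene count differs from the transcript sum.
mismatched = [
    i for i, (g, t) in enumerate(zip(gene_counts, transcript_sums), start=1)
    if g != t
]

print(transcript_sums)
print(mismatched)
```

Under that assumption, only the last two sample columns differ, by 18 and 32, which are exactly the counts of AT1G02840.3.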
After commenting out the else block: