schatzlab / scikit-ribo

Accurate estimation and robust modelling of translation dynamics at codon resolution
GNU General Public License v2.0

Inconsistent Gene IDs used in gtf_preprocess.py #6

Open catsargent opened 6 years ago

catsargent commented 6 years ago

Whilst using gtf_preprocess.py to create the expandCDS.fasta file, I obtained the following error:

    Traceback (most recent call last):
      File "/usr/local/lib/python3.5/dist-packages/scikit-ribo/gtf_preprocess.py", line 280, in <module>
        worker.getSeq()
      File "/usr/local/lib/python3.5/dist-packages/scikit-ribo/gtf_preprocess.py", line 154, in getSeq
        self.fiveUtrDic[geneName] + self.fastaDic[geneName] + self.threeUtrDic[geneName] + "\n")
    KeyError: 'ENSG00000230989'

This appears to be because the 3utr.fasta, 5utr.fasta and cds.fasta files that were created have headers such as the following:

ENSG00000187961::1:960586-965715(+)
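The header above appears to follow the layout <gene_id>::<chrom>:<start>-<end>(<strand>). As a minimal sketch, assuming that layout holds for every record, the parts can be separated with a regular expression. parse_header and HeaderParts are hypothetical names used only for illustration; they are not part of scikit-ribo.

```python
import re
from typing import NamedTuple


class HeaderParts(NamedTuple):
    """Components of a header like 'ENSG00000187961::1:960586-965715(+)'."""
    gene_id: str
    chrom: str
    start: int
    end: int
    strand: str


# Assumed layout (not confirmed by scikit-ribo): <gene_id>::<chrom>:<start>-<end>(<strand>)
HEADER_RE = re.compile(
    r"^>?(?P<gene_id>[^:]+)::(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)\((?P<strand>[-+.])\)$"
)


def parse_header(header: str) -> HeaderParts:
    """Split a coordinate-suffixed FASTA header into its parts."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        raise ValueError(f"unrecognised header: {header!r}")
    return HeaderParts(
        gene_id=match["gene_id"],
        chrom=match["chrom"],
        start=int(match["start"]),
        end=int(match["end"]),
        strand=match["strand"],
    )


parts = parse_header("ENSG00000187961::1:960586-965715(+)")
print(parts.gene_id, parts.chrom, parts.start, parts.end, parts.strand)
# ENSG00000187961 1 960586 965715 +
```

Only the gene_id field matches what self.geneNames holds; the remaining fields are the coordinates that make the header keys differ from the bare IDs.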

The variable self.geneNames, however, stores only the bare ID (e.g. ENSG00000187961), so looking that ID up in the dictionaries built from these files raises the KeyError above.
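If this diagnosis is right, one possible workaround is to strip the "::coordinates" suffix when the FASTA files are read, so that the dictionaries are keyed by the same bare IDs as self.geneNames. The sketch below assumes headers of the form shown above and one record per gene; bare_gene_id and read_fasta_by_gene are hypothetical helpers, not scikit-ribo's own code, and the sequence in the example is a toy string, not real data.

```python
from typing import Dict, Iterable, Optional


def bare_gene_id(header: str) -> str:
    """Drop the leading '>' and any '::coordinates' suffix from a header.

    'ENSG00000187961::1:960586-965715(+)' -> 'ENSG00000187961'
    """
    return header.lstrip(">").strip().split("::", 1)[0]


def read_fasta_by_gene(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {bare gene ID: sequence}.

    Multi-line sequences are joined. Assumes one record per gene ID and
    raises ValueError on a duplicate rather than guessing how to merge them.
    """
    seqs: Dict[str, str] = {}
    current: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            current = bare_gene_id(line)
            if current in seqs:
                raise ValueError(f"duplicate gene ID: {current}")
            seqs[current] = ""
        elif current is not None:
            seqs[current] += line  # append this sequence line to the current record
    return seqs


# Toy input (not real data); the key becomes the bare ID stored in self.geneNames.
example = """>ENSG00000187961::1:960586-965715(+)
ATGCCC
GGGTAA
""".splitlines()
fasta_dic = read_fasta_by_gene(example)
print(fasta_dic["ENSG00000187961"])  # ATGCCCGGGTAA
```

Normalising the keys at read time leaves lookups such as self.fastaDic[geneName] unchanged; the alternative is to change whichever step writes these headers so the suffix never appears in the first place.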

Given the previous issue that I raised (and resolved), could you please confirm whether a problem in the code is causing this error?

Many thanks, Catherine