yafeng / proteogenomics_python

python scripts for proteogenomics analysis

NameError, KeyError - #2

Closed NabilaRahman closed 4 years ago

NabilaRahman commented 6 years ago

Hi, I'm on Linux Mint 18.3 Sylvia (KDE). I tried Python 2.7, 3.5, and 3.6; all give the same error.

Traceback (most recent call last):
  File "/media/Linux/Packages/proteogenomics/map_peptide2genome.py", line 125, in <module>
    feature_dic=parse_gtf(gtf_file)
NameError: name 'gtf_file' is not defined

If I assign all the arguments (gtf_file, fasta_file, etc.) directly as variables within the .py file, it works. But then I get this error:

Traceback (most recent call last):
  File "map_peptide2genome.py", line 169, in <module>
    enst=id_dic[ensp]
KeyError: 'F8VQ05'

yafeng commented 6 years ago

Can you paste your command here?

NabilaRahman commented 6 years ago

Thanks for the quick reply. Here's what I have:

python /Downloads/map_peptide2genome.py --input ./mouse.gtf --fasta ./GRCm38.pep.rev92.fa --IDmap ./IDmap_file --output ./knownprots

I tried this in Windows as well, with the same issue. Even when I don't give any arguments, I get the correct warning message but also the NameError:

python F:\map_peptide2genome.py

Warning! wrong command, please read the mannual in Readme.txt.
Example: python map_peptide2genome.py --input input_filename --gtf Homo_sapiens.GRCh37.75.gtf --fasta Homo_sapiens.GRCh37.75.pep.all.fa --IDmap Ensembl75_IDlist.txt --output output_filename
reading GTF input file
Traceback (most recent call last):
  File "F:\map_peptide2genome.py", line 125, in <module>
    feature_dic=parse_gtf(gtf_file)
NameError: name 'gtf_file' is not defined

yafeng commented 6 years ago

--input should be the peptide identification table, with the peptide sequence in the first column and the protein ID in the second column. ./mouse.gtf should be provided after --gtf. An example command would be:

python /Downloads/map_peptide2genome.py --input pepTable.txt --gtf ./mouse.gtf --fasta ./GRCm38.pep.rev92.fa --IDmap ./IDmap_file --output peptides.gff3
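For illustration only: a minimal sketch of how the five options named above could be declared as required with Python's standard argparse module, so that a missing --gtf stops with a usage error instead of a later NameError. This is not the script's actual argument handling, which is not shown in this thread; `build_parser` is a hypothetical helper name.

```python
import argparse


def build_parser():
    """Hypothetical parser; option names are taken from the example command."""
    parser = argparse.ArgumentParser(prog="map_peptide2genome.py")
    parser.add_argument("--input", required=True,
                        help="peptide table: sequence in column 1, protein ID in column 2")
    parser.add_argument("--gtf", required=True, help="GTF annotation file")
    parser.add_argument("--fasta", required=True, help="protein FASTA file")
    parser.add_argument("--IDmap", required=True, help="ID mapping file")
    parser.add_argument("--output", required=True, help="output file")
    return parser


# Parse the corrected command from the comment above, passed as an explicit list.
args = build_parser().parse_args([
    "--input", "pepTable.txt",
    "--gtf", "./mouse.gtf",
    "--fasta", "./GRCm38.pep.rev92.fa",
    "--IDmap", "./IDmap_file",
    "--output", "peptides.gff3",
])
print(args.gtf)
```

With `required=True`, leaving out --gtf makes argparse print a usage message and exit before any parsing of the GTF file is attempted.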

NabilaRahman commented 6 years ago

Sorry, this is the actual command; I didn't paste it correctly. I'm still getting the error.

python /Downloads/map_peptide2genome.py --input ./significant_peptides.txt --gtf ./mouse.gtf --fasta ./GRCm38.pep.rev92.fa --IDmap ./IDmap_file --output ./knownprots

The GTF file was downloaded from the UCSC FTP site, as was the FASTA file. significant_peptides.txt has the peptide sequence in the first column and the protein ID in the second. The IDmap file follows the format shown in the example (downloaded from BioMart).

yafeng commented 6 years ago

OK, I see. The protein ID F8VQ05 in the input file 'significant_peptides.txt' is probably not in the same format as the IDs in the FASTA file GRCm38.pep.rev92.fa. Can you run these two commands for me and paste the outputs here?

grep F8VQ05 significant_peptides.txt | head -1
grep F8VQ05 GRCm38.pep.rev92.fa

NabilaRahman commented 6 years ago

I see the problem now.

NameError: there was a space in one of the file paths (I didn't write it out in full here).

KeyError: you are right, the ID in the FASTA file is in ENSMUSP* (Ensembl) format, while my protein ID is a UniProt ID.

Thanks very much for your help.
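To illustrate the NameError cause described above, here is a small sketch using Python's standard shlex module, which splits a command line by POSIX shell rules. The path "./my data/mouse.gtf" is a made-up example; the real path with the space is not given in this thread.

```python
import shlex

# Hypothetical commands: the directory name "my data" contains a space.
unquoted = "python map_peptide2genome.py --gtf ./my data/mouse.gtf"
quoted = 'python map_peptide2genome.py --gtf "./my data/mouse.gtf"'

# Without quotes, the path is split into two separate arguments.
unquoted_tokens = shlex.split(unquoted)
# With quotes, the path reaches the script as a single argument.
quoted_tokens = shlex.split(quoted)

print(unquoted_tokens)
print(quoted_tokens)
```

In the unquoted case the script receives "./my" and "data/mouse.gtf" as two arguments, which can derail option parsing. Quoting the path keeps it intact. Windows cmd uses different splitting rules, but a space in an unquoted path causes the same kind of problem there.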

yafeng commented 6 years ago

Yes! The IDs in the peptide table and the FASTA file should match :)
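A quick way to check this before running the script is sketched below. It assumes a tab-separated peptide table with no header row, and that the protein ID is the first whitespace-delimited token of each FASTA header line; neither assumption is confirmed in this thread. The function names, sequences, and the Ensembl-style ID are illustrative placeholders.

```python
def fasta_ids(fasta_lines):
    """Collect the first token after '>' from every FASTA header line."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def unmatched_protein_ids(table_lines, known_ids):
    """Return protein IDs (column 2) that are absent from known_ids, in order."""
    missing = []
    for line in table_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 2:
            continue  # skip blank or malformed rows
        protein_id = cols[1]
        if protein_id not in known_ids and protein_id not in missing:
            missing.append(protein_id)
    return missing


# Tiny in-memory stand-ins for the FASTA file and the peptide table.
fasta_example = [">ENSMUSP00000000001.4 pep example header", "MGCTLSAEDK"]
table_example = ["PEPTIDEK\tF8VQ05\n", "SAMPLER\tENSMUSP00000000001.4\n"]

missing = unmatched_protein_ids(table_example, fasta_ids(fasta_example))
print(missing)  # → ['F8VQ05']
```

For real files, an open file handle can be passed in place of each list, for example `fasta_ids(open("GRCm38.pep.rev92.fa"))`. Any ID reported as missing would trigger a KeyError like the one above.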