twopin / CAMP

predicting peptide-protein interactions

error in preprocessing features #30

Closed mariemghoula closed 1 year ago

mariemghoula commented 1 year ago

Hello,

I am trying to generate the features for a set of peptides and proteins using preprocess_features.py. The problem is that I cannot create these two files: protein_ss_feature_dict and peptide_ss_feature_dict. I also tried with the example_data.tsv file and I still have the same issue. This is the error I get whenever I run the script:

    Traceback (most recent call last):
      File "preprocess_features.py", line 179, in <module>
        feature = label_seq_ss(pep_ss, pad_pep_len, seq_ss_set)
      File "preprocess_features.py", line 66, in label_seq_ss
        X[i] = res_ind[res]
    TypeError: 'set' object is not subscriptable

I do not know if there is an error in this step:

    def label_seq_ss(line, pad_prot_len, res_ind):
        line = line.strip().split(',')
        X = np.zeros(pad_prot_len)
        for i, res in enumerate(line[:pad_prot_len]):
            X[i] = res_ind[res]
        return X

or if seq_ss_set is not defined properly like the other sets?

Can you please help resolve this problem?

Thank you,

Mariem
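The traceback can be reproduced in isolation with a minimal sketch. It assumes the third argument of label_seq_ss is meant to be a lookup table from secondary-structure symbol to integer label. The symbols "H", "E", "C" and their indices are hypothetical, and a plain list stands in for np.zeros so the example has no dependencies.

```python
def label_seq_ss(line, pad_len, res_ind):
    """Map a comma-separated secondary-structure string to padded integer labels."""
    tokens = line.strip().split(",")
    labels = [0] * pad_len  # stand-in for np.zeros(pad_len)
    for i, res in enumerate(tokens[:pad_len]):
        labels[i] = res_ind[res]  # requires a mapping, not a set
    return labels


# Hypothetical symbol-to-index table; the real script may use other symbols.
ss_index = {"H": 1, "E": 2, "C": 3}

# With a dict, the lookup works and the tail is zero-padded.
ok = label_seq_ss("H,E,C", 5, ss_index)

# With a set, indexing fails exactly as in the reported traceback.
try:
    label_seq_ss("H,E,C", 5, {"H,E,C"})
    error_message = None
except TypeError as exc:
    error_message = str(exc)
```

The function body itself is fine in this sketch; the failure depends only on what object is passed as res_ind.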

AnthonyYao7 commented 1 year ago

The issue lies in the naming of the variables in the program. Since seq_ss_set is defined twice, the second definition is the one passed to the function. I fixed the problem by changing X[i] = res_ind[res] in the function label_seq_ss to X[i] = seq_ss_set[res]. I also had to move everything inside the if __name__ == "__main__" block into a new function called main and call main inside the if statement. Hope this helps.
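A rough sketch of why wrapping the script body in main() matters for this workaround: code under if __name__ == "__main__" runs at module scope, so rebinding seq_ss_set there overwrites the module-level lookup table, whereas the same assignment inside a function only creates a local name. The table contents and sample strings below are hypothetical, and the layout of the real script is an assumption.

```python
# Module-level lookup table (hypothetical contents).
seq_ss_set = {"H": 1, "E": 2, "C": 3}


def label_seq_ss(line, pad_len):
    """Label a comma-separated string using the module-level seq_ss_set."""
    labels = [0] * pad_len  # stand-in for np.zeros(pad_len)
    for i, res in enumerate(line.strip().split(",")[:pad_len]):
        labels[i] = seq_ss_set[res]  # resolves to the global dict
    return labels


def main():
    # This binding is local to main(), so the global dict above is untouched.
    seq_ss_set = {"H,E,C", "C,C,H"}  # set of strings to process
    return {ss: label_seq_ss(ss, 4) for ss in sorted(seq_ss_set)}


if __name__ == "__main__":
    features = main()
```

Reusing one name for two different objects still makes the code hard to follow, so giving the two objects distinct names is the cleaner fix.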

mariemghoula commented 1 year ago

Thank you @AnthonyYao7! I'll try this out and post an update in case anyone else has the same issue.

twopin commented 1 year ago

Sorry, this bug is due to my carelessness when changing the variable names in this script. Actually, "seq_ss_set" should be used twice (for both peptide and protein). As @AnthonyYao7 pointed out, seq_ss_set was defined twice, and I've renamed the one used by label_seq_ss to seq_ss_dict. Hope this helps.
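The rename described above might look roughly like the following sketch: the lookup table is called seq_ss_dict and is passed explicitly, while seq_ss_set keeps its role as the collection of secondary-structure strings. The table contents, sample strings, and padding length are hypothetical, and a plain list stands in for np.zeros.

```python
# Lookup table under its new name (hypothetical contents).
seq_ss_dict = {"H": 1, "E": 2, "C": 3}

# Collection of secondary-structure strings to featurize (hypothetical samples).
seq_ss_set = {"H,H,C", "C,E,E,C"}


def label_seq_ss(line, pad_len, res_ind):
    """Map a comma-separated secondary-structure string to padded integer labels."""
    labels = [0] * pad_len  # stand-in for np.zeros(pad_len)
    for i, res in enumerate(line.strip().split(",")[:pad_len]):
        labels[i] = res_ind[res]
    return labels


pad_pep_len = 5  # hypothetical padding length

# Pass the dict, not the set, as the lookup argument.
peptide_ss_feature_dict = {
    ss: label_seq_ss(ss, pad_pep_len, seq_ss_dict) for ss in seq_ss_set
}
```

With distinct names, the call site shows at a glance which object is the lookup table and which is the data being iterated.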