kexinhuang12345 / DeepPurpose

A Deep Learning Toolkit for DTI, Drug Property, PPI, DDI, Protein Function Prediction (Bioinformatics)
https://doi.org/10.1093/bioinformatics/btaa1005
BSD 3-Clause "New" or "Revised" License

error in GetSequenceOrderCouplingNumber #31

Closed jchartove closed 4 years ago

jchartove commented 4 years ago

I'm calling GetQuasiSequenceOrder for every protein in the BindingDB list and running into this error. I get that these things may be happening because I'm calling the functions directly rather than using data_process, but I'd still like to be able to call the encoding functions on their own.


KeyError                                  Traceback (most recent call last)
in
      8 for func in prot_func_list:
      9     save_column_name = func.__name__
---> 10     AA = pd.Series(df_data[column_name].unique()).apply(func)
     11     AA_dict = dict(zip(df_data[column_name].unique(), AA))
     12     df_data[save_column_name] = [AA_dict[i] for i in df_data[column_name]]

~\anaconda3\envs\multiPurpose\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
   4198             else:
   4199                 values = self.astype(object)._values
-> 4200                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   4201
   4202             if len(mapped) and isinstance(mapped[0], Series):

pandas\_libs\lib.pyx in pandas._libs.lib.map_infer()

~\Dropbox\Work\insight\omic\DeepPurpose-omic\DeepPurpose\pybiomed_helper.py in GetQuasiSequenceOrder(ProteinSequence, maxlag, weight)
   1908     """
   1909     result = dict()
-> 1910     result.update(GetQuasiSequenceOrder1SW(ProteinSequence, maxlag, weight, _Distance1))
   1911     result.update(GetQuasiSequenceOrder2SW(ProteinSequence, maxlag, weight, _Distance1))
   1912     result.update(

~\Dropbox\Work\insight\omic\DeepPurpose-omic\DeepPurpose\pybiomed_helper.py in GetQuasiSequenceOrder1SW(ProteinSequence, maxlag, weight, distancematrix)
   1794     for i in range(maxlag):
   1795         rightpart = rightpart + GetSequenceOrderCouplingNumber(
-> 1796             ProteinSequence, i + 1, distancematrix
   1797         )
   1798     AAC = GetAAComposition(ProteinSequence)

~\Dropbox\Work\insight\omic\DeepPurpose-omic\DeepPurpose\pybiomed_helper.py in GetSequenceOrderCouplingNumber(ProteinSequence, d, distancematrix)
   1601         temp1 = ProteinSequence[i]
   1602         temp2 = ProteinSequence[i + d]
-> 1603         tau = tau + math.pow(distancematrix[temp1 + temp2], 2)
   1604     return round(tau, 3)
   1605

KeyError: 'IX'

kexinhuang12345 commented 4 years ago

Hi, yes, this error is expected. BindingDB is large and may contain inputs with unexpected symbols. In DeepPurpose, we work around that by wrapping the call in a try/except. Maybe you can try that for your use case? Or you can remove these items from the dataset, since they are formatted wrong.
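
For readers calling the encoders directly, the two workarounds suggested above (catching the error, or dropping the offending sequences) might look roughly like the sketch below. This is not DeepPurpose's own code: `safe_encode`, `has_only_standard_residues`, `STANDARD_AA` and `toy_encoder` are hypothetical names introduced for illustration. `toy_encoder` is only a stand-in that fails the same way as the traceback, by looking up a residue pair in a dictionary that covers only the 20 standard amino acids.

```python
import math

# The 20 standard amino-acid one-letter codes. Symbols such as 'X' fall
# outside this set, which is consistent with the KeyError: 'IX' above.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def safe_encode(sequence, encoder, fallback=None):
    """Option 1: call encoder(sequence), returning fallback on a KeyError."""
    try:
        return encoder(sequence)
    except KeyError:
        return fallback


def has_only_standard_residues(sequence):
    """Option 2 helper: True if every symbol is a standard residue code."""
    return set(sequence) <= STANDARD_AA


def toy_encoder(sequence):
    """Hypothetical stand-in for a pair-lookup encoder (not the real one).

    It sums squared 'distances' over adjacent residue pairs and raises
    KeyError for any pair that contains a non-standard symbol.
    """
    distance = {a + b: 1.0 for a in STANDARD_AA for b in STANDARD_AA}
    tau = 0.0
    for i in range(len(sequence) - 1):
        tau += math.pow(distance[sequence[i] + sequence[i + 1]], 2)
    return round(tau, 3)


sequences = ["MKTAYIAK", "MKTIXAK"]  # the second contains the pair 'IX'

# Option 1: keep every row, with None where encoding failed.
encoded = [safe_encode(s, toy_encoder) for s in sequences]

# Option 2: drop malformed sequences before encoding.
clean = [s for s in sequences if has_only_standard_residues(s)]
```

With the loop from the traceback, the same wrapper could presumably be passed to `apply`, for example `pd.Series(...).apply(lambda s: safe_encode(s, func))`. The failed rows then hold `None` and need to be handled or dropped downstream.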

jchartove commented 4 years ago

Oh whoops, I forgot that you added target2quasi! Ok, that works, many thanks