idmjky / EvolvePro

PLM-based active learning model for protein engineering

Issue with Multi-Mutant Sequence Handling #5

Open aaaaaaaaa21-code opened 4 weeks ago

aaaaaaaaa21-code commented 4 weeks ago

Hi, I've been exploring your published model and find it very interesting. I attempted to run it with my data but encountered an error.

In my case, I'm working with sequences that contain multiple amino acid mutations, so I changed the `single_mutant` flag in the `read_experimental_data` function in `top_layer.py` to `False`. This change led to the following error:

```
Embeddings and labels are aligned
Traceback (most recent call last):
  File "/media/dell/newdisk/EvolvePro-main/top_layer.py", line 355, in <module>
    df_test, df_all = top_layer(
  File "/media/dell/newdisk/EvolvePro-main/top_layer.py", line 205, in top_layer
    y_pred_test = model.predict(X_test)
  File "/home/dell/anaconda3/lib/python3.9/site-packages/sklearn/ensemble/_forest.py", line 1064, in predict
    X = self._validate_X_predict(X)
  File "/home/dell/anaconda3/lib/python3.9/site-packages/sklearn/ensemble/_forest.py", line 641, in _validate_X_predict
    X = self._validate_data(
  File "/home/dell/anaconda3/lib/python3.9/site-packages/sklearn/base.py", line 633, in _validate_data
    out = check_array(X, input_name="X", **check_params)
  File "/home/dell/anaconda3/lib/python3.9/site-packages/sklearn/utils/validation.py", line 1026, in check_array
    raise ValueError(
ValueError: Found array with 0 sample(s) (shape=(0, 1280)) while a minimum of 1 is required by RandomForestRegressor.
```

When I leave `single_mutant` unchanged, all of the output values are 1. Could you help me resolve this issue?

Thank you for your assistance.

idmjky commented 4 weeks ago

Hi, can you attach your input results CSV file so I can check the format? Also, does it align with the name column in your embedding files? If you can attach these two files, I can help debug this.
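One way to answer the alignment question before attaching files is to compare the two sets of variant names directly. This is a minimal sketch using only the standard library, not part of EvolvePro: the column name `variant`, the helper `check_name_alignment`, and the variant names are hypothetical, and it assumes that embedded variants without a measured value are the ones the model is asked to predict.

```python
import csv
import io


def check_name_alignment(results_csv, embedding_names, name_col="variant"):
    """Compare variant names in a results CSV with embedding row names.

    results_csv     : file-like object containing the results CSV
    embedding_names : iterable of names from the embedding file's name column
    name_col        : hypothetical name of the CSV column holding variant names

    Returns three sets: (matched, missing_from_embeddings, unlabeled_embeddings).
    """
    reader = csv.DictReader(results_csv)
    # Strip whitespace so stray spaces do not cause false mismatches.
    csv_names = {row[name_col].strip() for row in reader}
    emb_names = {name.strip() for name in embedding_names}

    matched = csv_names & emb_names    # labeled variants that have embeddings
    missing = csv_names - emb_names    # labels with no matching embedding row
    unlabeled = emb_names - csv_names  # embedded variants with no label
    return matched, missing, unlabeled


# Usage with hypothetical variant names:
data = io.StringIO("variant,activity\nWT,1.0\nA23V,1.3\nA23V_G45S,0.8\n")
matched, missing, unlabeled = check_name_alignment(data, ["WT", "A23V", "G45S"])
print(sorted(matched))    # ['A23V', 'WT']
print(sorted(missing))    # ['A23V_G45S']
print(sorted(unlabeled))  # ['G45S']
```

A non-empty `missing` set points to naming mismatches between the two files (for example, a different separator between mutations in multi-mutant names), while an empty `unlabeled` set would mean, under the assumption above, that there is nothing left to predict.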

aaaaaaaaa21-code commented 4 weeks ago


Thank you for your response and the reminder. I realized the error occurred because I had not provided the sequences to be predicted, so `X_test` was empty. I have now resolved the issue and run the model successfully!
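An empty `X_test` like the one described here can be caught before it reaches scikit-learn. Below is a minimal sketch, not EvolvePro's actual code: it assumes that embedded variants with a measured label form the training set and the remaining embedded variants form the prediction set, and the helper name `split_labeled_unlabeled` and the variant names are hypothetical. If no unlabeled variants remain, it raises a readable error instead of the `(0, 1280)` shape error from `RandomForestRegressor.predict`.

```python
import numpy as np


def split_labeled_unlabeled(names, embeddings, labels):
    """Hypothetical helper: train on measured variants, predict the rest.

    names      : variant names, one per row of `embeddings`
    embeddings : array of shape (n_variants, n_features)
    labels     : dict mapping variant name to measured activity
    """
    train_idx = np.array([i for i, n in enumerate(names) if n in labels], dtype=int)
    test_idx = np.array([i for i, n in enumerate(names) if n not in labels], dtype=int)

    X_train = embeddings[train_idx]
    y_train = np.array([labels[names[i]] for i in train_idx])
    # With an empty index array this has shape (0, n_features).
    X_test = embeddings[test_idx]

    # Fail early with a clear message rather than letting the regressor's
    # input validation reject a zero-sample array.
    if X_test.shape[0] == 0:
        raise ValueError(
            "X_test is empty: every embedded variant already has a label. "
            "Add embeddings for the sequences you want to predict."
        )
    return X_train, y_train, X_test


# Usage with hypothetical variant names and random placeholder embeddings:
names = ["WT", "A23V", "A23V_G45S"]
emb = np.random.rand(3, 1280)
X_train, y_train, X_test = split_labeled_unlabeled(names, emb, {"WT": 1.0, "A23V": 1.3})
print(X_test.shape)  # (1, 1280)
```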