Open shrutiOx opened 5 months ago
Adding a note:
This error is coming up in this case too.
data_folder = './clean_data/clean/'
data_file = 'small_synthetic.csv'
pad_seqs = 'none'
augment_data = 'none'
input_col = 'seq'
target_col = 'positive_score'
sequence_type = 'nucleic_acid'
model_folder = './exemplars/test/models/'
output_folder = './exemplars/test/outputs/'
model_type = 'autokeras'
task = 'binary_classification'
class_of_interest = 1 # 1 for binary classification typically
cutoff_true = 1
cutoff_pred = 0.5 # use 0.5 as predicted ys cut-off, since they will max out at 1
read_in_format_data_and_pred(task, data_folder, data_file, input_col, target_col, pad_seqs, augment_data, sequence_type, model_type, model_folder, output_folder, class_of_interest = class_of_interest, cutoff_true = cutoff_true, cutoff_pred = cutoff_pred);
IndexError Traceback (most recent call last)
Hello,
I hope you are well. I wanted to ask about the cause of the IndexError shown below, which is raised when I try to evaluate an independent test dataset of protein sequences (padded to the maximum length). The code details are below.
Thank you very much for your kind help and also for this great package!
CODE:
data_folder = './clean_data/clean/'
data_file = 'test_preacrs.csv'
input_col = 'sequence'
target_col = 'Labels'
pad_seqs = 'max'
augment_data = 'none'
sequence_type = 'protein'
model_folder = './exemplars/test/models/'
output_folder = './exemplars/test/outputs/'
model_type = 'autokeras'
task = 'binary_classification'
class_of_interest = 1 # 1 for binary classification typically
cutoff_true = 1
cutoff_pred = 0.5 # use 0.5 as predicted ys cut-off, since they will max out at 1
read_in_format_data_and_pred(task, data_folder, data_file, input_col, target_col, pad_seqs, augment_data, sequence_type, model_type, model_folder, output_folder, class_of_interest = class_of_interest, cutoff_true = cutoff_true, cutoff_pred = cutoff_pred);
ERROR:
Warning: Unknown letter(s) " " found in sequence
Example of bad letter : LKKTIEKLLNSDLNSNYIAKKTGVEQSTIYRLRTGERQLGKLGLDSAERLYNYQKEIE NMKSVKYISNMSKQEKGYRVYVNVVNEDTDKGFLFPSVPKEVIENDKIDELFNFEH HKPYVQKAKSRYDKNGIGYKIVQLDEGFQKFIELNKEKMKENLDY
Padding all sequences to a length of 348
Confirmed: No data augmentation requested
Confirmed: Scrambled control generated.
IndexError Traceback (most recent call last)
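
A possible first check, given the warning in the output: the example protein sequence contains spaces, which are reported as unknown letters. The sketch below is a minimal, hypothetical helper for finding and removing whitespace in sequence strings before the CSV is passed to read_in_format_data_and_pred. It is not part of the package; the names clean_sequence, unknown_letters and PROTEIN_ALPHABET are illustrative assumptions, and the alphabet is assumed to be the 20 standard amino-acid letters. Whether the whitespace is related to the IndexError cannot be told from the output shown, since the traceback is cut off and the same error also appears for the nucleic acid dataset.

```python
# Hypothetical helpers, not part of the package: report and remove
# whitespace inside sequence strings before prediction.
import re

# Assumption: the 20 standard amino-acid letters; extend if your data uses others.
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(seq):
    """Return the sequence upper-cased with all whitespace removed."""
    return re.sub(r"\s+", "", seq).upper()


def unknown_letters(seq, alphabet=PROTEIN_ALPHABET):
    """Return a sorted list of the characters in seq that are not in alphabet."""
    return sorted(set(seq) - alphabet)


# The example sequence from the warning, with its embedded spaces kept.
raw = (
    "LKKTIEKLLNSDLNSNYIAKKTGVEQSTIYRLRTGERQLGKLGLDSAERLYNYQKEIE "
    "NMKSVKYISNMSKQEKGYRVYVNVVNEDTDKGFLFPSVPKEVIENDKIDELFNFEH "
    "HKPYVQKAKSRYDKNGIGYKIVQLDEGFQKFIELNKEKMKENLDY"
)

print("unknown before cleaning:", unknown_letters(raw))
cleaned = clean_sequence(raw)
print("unknown after cleaning:", unknown_letters(cleaned))
```

If the cleaned sequences no longer contain unknown letters, the same cleaning could be applied to every row of the input column and the file saved again before rerunning the prediction.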