lipi12q / TranscriptionNet

TranscriptionNet is an attention-based deep learning algorithm that integrates information from several large-scale gene function networks to predict the gene expression changes (GECs) induced by perturbing each gene in the genome.

The datasets_split function in data_process.py uses an undefined variable, cmap_scaled #1

Open myseverus opened 5 days ago

myseverus commented 5 days ago

def datasets_split(GECs, feature, save_path):
    """
    Datasets split and data scaler.
    """
    GECs_filter = GECs[GECs.index.isin(feature.index)]
    GECs_train, GECs_test, GECs_valid = train_test_val_split(cmap_scaled, 0.7, 0.2, 0.1)

    scaler = MinMaxScaler(feature_range=(-1, 1))
    GECs_train_scaled = scaler.fit_transform(GECs_train.values)
    GECs_test_scaled = scaler.fit_transform(GECs_test.values)
    GECs_valid_scaled = scaler.fit_transform(GECs_valid.values)

    GECs_train = pd.DataFrame(GECs_train_scaled, index=GECs_train.index, columns=GECs_filter.columns).sort_index()
    GECs_test = pd.DataFrame(GECs_test_scaled, index=GECs_test.index, columns=GECs_filter.columns).sort_index()
    GECs_valid = pd.DataFrame(GECs_valid_scaled, index=GECs_valid.index, columns=GECs_filter.columns).sort_index()

    feature_train = feature[feature.index.isin(GECs_train.index)].sort_index()
    feature_test = feature[feature.index.isin(GECs_test.index)].sort_index()
    feature_valid = feature[feature.index.isin(GECs_valid.index)].sort_index()

    GECs_dict = {'train': GECs_train, 'valid': GECs_valid, 'test': GECs_test}
    feature_dict = {'train': feature_train, 'valid': feature_valid, 'test': feature_test}

    save_datasets(GECs_dict, save_path + 'GECs_dict.pkl')
    save_datasets(feature_dict, save_path + 'feature_dict.pkl')

    return scaler

Hi author, the variable cmap_scaled is not defined anywhere in this function. How can this be fixed?

lipi12q commented 5 days ago

Thank you very much for pointing this out. The code has been updated, and further comments are welcome.
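
The thread does not show what the updated code looks like, so the following is only a minimal sketch of one plausible fix, not the author's actual change. It assumes that the undefined cmap_scaled was meant to be GECs_filter, the frame built on the previous line. The train_test_val_split below is a hypothetical stand-in for the repository helper of the same name, whose real implementation is not shown in this issue. The sketch also departs from the posted code in two ways: it fits the scaler on the training split only and reuses it for the other splits, and it returns the dictionaries instead of calling save_datasets.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler


def train_test_val_split(df, train_ratio, test_ratio, val_ratio, seed=0):
    """Hypothetical stand-in for the repository helper (assumption)."""
    shuffled = df.sample(frac=1.0, random_state=seed)
    n_train = int(round(len(shuffled) * train_ratio))
    n_test = int(round(len(shuffled) * test_ratio))
    train = shuffled.iloc[:n_train]
    test = shuffled.iloc[n_train:n_train + n_test]
    # The validation split takes whatever rows remain (about val_ratio).
    valid = shuffled.iloc[n_train + n_test:]
    return train, test, valid


def datasets_split(GECs, feature):
    """Split GECs and features into train/test/valid and scale the GECs."""
    # Keep only genes that also have a feature row.
    GECs_filter = GECs[GECs.index.isin(feature.index)]

    # Assumed fix: split GECs_filter instead of the undefined cmap_scaled.
    GECs_train, GECs_test, GECs_valid = train_test_val_split(
        GECs_filter, 0.7, 0.2, 0.1
    )

    # Fit on the training split only, then apply the same scaling everywhere.
    scaler = MinMaxScaler(feature_range=(-1, 1))
    scaler.fit(GECs_train.values)

    def scale(df):
        scaled = scaler.transform(df.values)
        return pd.DataFrame(scaled, index=df.index, columns=df.columns).sort_index()

    GECs_dict = {
        'train': scale(GECs_train),
        'valid': scale(GECs_valid),
        'test': scale(GECs_test),
    }
    # Select the matching feature rows for each split.
    feature_dict = {
        name: feature[feature.index.isin(split.index)].sort_index()
        for name, split in GECs_dict.items()
    }
    return GECs_dict, feature_dict, scaler
```

Because the scaler is fitted on the training split alone, values in the test and validation splits can fall slightly outside the (-1, 1) range. The posted code avoids that by refitting on every split, at the cost of scaling each split differently and returning a scaler that only reflects the validation data.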