Closed snehith57624 closed 1 year ago
Good morning! Training and fine-tuning of models is not yet implemented (the library is solely for evasion attacks). However, if you would like to add this functionality, you can consider opening a pull request with that!
Getting error in training model when using embed function
print("ok inside train 1")
try:
    print("global_state.data_paths ", global_state.data_paths)
    for file_path in path:
        with open(file_path, 'rb') as handle:
            bytecode = handle.read()
    print("ok inside train 2")
    print("global_state.target ", global_state.target)
    net: CClassifierEnd2EndMalware = global_state.target
    x = End2EndModel.bytes_to_numpy(bytecode, net.get_input_max_length(), net.get_embedding_value(),
                                    net.get_is_shifting_values())
    model = MalConv()
    model.train(True)
    criterion = nn.BCELoss()
    optimizer = Adam(model.parameters(), lr=0.01)
    scheduler = ReduceLROnPlateau(optimizer, patience=3, verbose=True, factor=0.5,
                                  threshold=0.001, min_lr=0.00001, mode='max')
    print("ok inside train 3")
    epochs = 3
    y_pred = 0
    for epoch in range(epochs):
        y_pred = model.embedd_and_forward(model.embed(x))
        loss = criterion(y_pred, label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    model.eval()
    print("ok inside train 8")
    scheduler.step(f1_score(label, y_pred))
    print("ok inside train 4")
except Exception as e:
    print(e)
return model
I am getting this error: "Dimension out of range (expected to be in range of [-1, 0], but got 1)". Can you help me resolve it?
Thanks
Hello,
Probably y_train is just a number and not a batch as expected by the loss. Check which shapes the loss needs first, let me know!
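To illustrate the shape check suggested above, here is a minimal sketch in plain PyTorch. It assumes the model returns one probability per sample with shape (batch, 1); the values and the names y_true and label are made up for the example and are not taken from the library.

```python
import torch
from torch import nn

criterion = nn.BCELoss()

# Stand-in for the model output for a single file (assumed shape: (batch, 1)).
y_pred = torch.tensor([[0.8]], requires_grad=True)

# A bare Python number is not a batch. Wrap it in a float tensor that has
# the same shape as the prediction before passing it to the loss.
label = 1
y_true = torch.tensor([[float(label)]])

print("prediction shape:", tuple(y_pred.shape))
print("target shape:    ", tuple(y_true.shape))

loss = criterion(y_pred, y_true)
loss.backward()
print("loss:", loss.item())
```

Printing both shapes right before the loss call is usually the quickest way to see which tensor is missing the batch dimension.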
Thanks, that is cleared up. My MalConv model now predicts as trained, but the output does not print the prediction properly.
def _perform_optimization(attack, file_path, stats, x, y):
    print('-' * 10)
    info_prompt(f'Processing {file_path}...')
    y_pred, adv_score, adv_ds, f_obj = attack.run(x, y)
    y_pred = y_pred.item()
    score = adv_score[0, 1].item()
    stats['evasion'] += (1 - y_pred)
    stats['total'] += 1
    stats['adv_score'] += score
    net = create_wrapper_for_global_target()
    _, original_score = net.predict(x, return_decision_function=True)
    stats['before_score'] += original_score[0, 1]
    info_prompt(f'Results for {file_path}')
    info_prompt(f'Final label: {y_pred}')
    info_prompt(f'Initial score: {original_score}')
    info_prompt(f'Final score: {score}')
    return adv_ds
In the code above, on the line: _, original_score = net.predict(x, return_decision_function=True)
the first returned value is the predicted label, but it is discarded and never printed. May I know the reason?
I don't think I understood the question: if you set "return_decision_function" to True, the output is the prediction and the score. The prediction is "score > threshold", hence 1 if malware, 0 if goodware. The score is a CArray with two entries: the first is the goodware score, the second is the malware score (and they sum to 1).
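The relationship between the label and the two scores can be sketched as follows. This is a NumPy stand-in, not the library's API: the function predict_with_scores is hypothetical, and the threshold of 0.5 is an assumption, since the thread does not state its value.

```python
import numpy as np


def predict_with_scores(malware_score, threshold=0.5):
    """Hypothetical stand-in that mimics a (label, scores) return pair.

    scores has shape (1, 2): column 0 is goodware, column 1 is malware.
    threshold=0.5 is an assumed value for illustration only.
    """
    scores = np.array([[1.0 - malware_score, malware_score]])
    label = int(malware_score > threshold)  # 1 if malware, 0 if goodware
    return label, scores


# Keep the first returned value instead of discarding it with "_".
label, scores = predict_with_scores(0.9)
print(f"Predicted label: {label}")
print(f"Goodware score: {scores[0, 0]}")
print(f"Malware score: {scores[0, 1]}")
```

If the initial label should appear in the results, binding the first returned value to a name (rather than "_") and printing it next to the score is enough.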
I want to fine-tune the model using new data. I see that we need to pass a boolean value to train the model, but there are no supporting functions available for this.
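For context on the boolean: in PyTorch, model.train(True) only switches the module into training mode (it affects layers such as dropout and batch norm) and does not update any weights by itself. A fine-tuning loop has to be written by hand. Below is a rough sketch of such a loop in plain PyTorch. TinyByteClassifier is a hypothetical stand-in, not MalConv, and the data is random toy data, so treat this as an outline rather than something the library provides.

```python
import torch
from torch import nn

torch.manual_seed(0)


class TinyByteClassifier(nn.Module):
    """Hypothetical stand-in for a byte-level model; this is not MalConv."""

    def __init__(self, vocab_size=257, embed_dim=8):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.head = nn.Linear(embed_dim, 1)

    def forward(self, x):
        # x: (batch, length) integer byte values -> (batch, 1) probabilities
        pooled = self.embedding(x).mean(dim=1)
        return torch.sigmoid(self.head(pooled))


model = TinyByteClassifier()
criterion = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Toy batch: 4 "files" of 16 byte values each, with float labels of shape (4, 1).
x = torch.randint(0, 257, (4, 16))
y = torch.tensor([[1.0], [0.0], [1.0], [0.0]])

model.train()  # training mode only; the weight updates happen in the loop below
losses = []
for epoch in range(5):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

model.eval()  # back to inference mode before predicting
with torch.no_grad():
    probs = model(x)
print("losses per epoch:", losses)
print("output shape:", tuple(probs.shape))
```

With a real model, the toy batch would be replaced by batches of padded byte sequences and their labels, and the pretrained weights would be loaded before the loop starts.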