pralab / secml_malware

Create adversarial attacks against machine learning Windows malware detectors
https://secml-malware.readthedocs.io/
GNU General Public License v3.0
203 stars, 46 forks

Train models #46

Closed: snehith57624 closed this issue 1 year ago

snehith57624 commented 1 year ago

I want to fine-tune the model using new data. I see that we need to pass a boolean value to train the model, but there are no supporting functions available for such operations.

zangobot commented 1 year ago

Good morning! Training and fine-tuning of models are not yet implemented (the library is solely for evasion attacks). However, if you would like to add this functionality, consider opening a pull request!
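For context, the boolean is most likely PyTorch's nn.Module.train(mode) (the snippet later in this thread calls model.train(True)). Assuming that is the case, the sketch below shows that the call only switches a module between training and evaluation behavior (for example, whether Dropout is active) and does not update any weights by itself. The tiny network is a hypothetical stand-in, not secml_malware's MalConv.

```python
import torch
from torch import nn

# Hypothetical stand-in network; secml_malware's MalConv is not used here.
model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5), nn.Linear(4, 1))
before = [p.detach().clone() for p in model.parameters()]

model.train(True)   # only sets the .training flag on the module and its submodules
assert model.training and model[1].training

model.eval()        # equivalent to model.train(False)
assert not model.training and not model[1].training

# Neither call touched the weights: actual fitting needs a loss,
# loss.backward() and an optimizer step.
after = list(model.parameters())
unchanged = all(torch.equal(b, a) for b, a in zip(before, after))
print("weights unchanged:", unchanged)
```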

snehith57624 commented 1 year ago

I am getting an error when training the model with the embed function:

    print("ok inside train 1")
    try:
        print("global_state.data_paths ", global_state.data_paths)
        for file_path in path:
            with open(file_path, 'rb') as handle:
                bytecode = handle.read()
            print("ok inside train 2")
            print("global_state.target ", global_state.target)
            net: CClassifierEnd2EndMalware = global_state.target
            x = End2EndModel.bytes_to_numpy(bytecode, net.get_input_max_length(), net.get_embedding_value(),
                                            net.get_is_shifting_values())

            model = MalConv()
            model.train(True)
            criterion = nn.BCELoss()
            optimizer = Adam(model.parameters(), lr=0.01)
            scheduler = ReduceLROnPlateau(optimizer, patience=3, verbose=True, factor=0.5,
                                          threshold=0.001, min_lr=0.00001, mode='max')
            print("ok inside train 3")
            epochs = 3
            y_pred = 0
            for epoch in range(epochs):
                y_pred = model.embedd_and_forward(model.embed(x))
                loss = criterion(y_pred, label)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            model.eval()
            print("ok inside train 8")
            scheduler.step(f1_score(label, y_pred))
            print("ok inside train 4")
    except Exception as e:
        print(e)
    return model

I am getting the error "Dimension out of range (expected to be in range of [-1, 0], but got 1)". Can you help me resolve this?

Thanks

zangobot commented 1 year ago

Hello,

Probably y_train (label in your snippet) is just a number and not a batch, as the loss expects. Check which shapes the loss needs first, and let me know!
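Following that hint, here is a minimal sketch of the shape handling under stated assumptions. The model is a hypothetical byte-level network (TinyByteNet, not secml_malware's MalConv) that ends in a sigmoid and returns one probability per sample with shape (batch, 1). nn.BCELoss expects the target to have the same shape as the prediction and a float dtype, so a plain 0/1 label is wrapped into a (batch, 1) tensor, and a single 1-D sample gets a leading batch dimension. The random byte arrays stand in for the output of End2EndModel.bytes_to_numpy; the exact origin of the "Dimension out of range" error is not confirmed here. The model and optimizer are also created once, outside the per-file loop, so training progress is not thrown away for each file.

```python
import numpy as np
import torch
from torch import nn
from torch.optim import Adam


class TinyByteNet(nn.Module):
    """Hypothetical stand-in for a MalConv-like detector: bytes -> P(malware)."""

    def __init__(self, vocab=257, emb_dim=8):
        super().__init__()
        self.embedding = nn.Embedding(vocab, emb_dim)
        self.conv = nn.Conv1d(emb_dim, 16, kernel_size=8, stride=8)
        self.fc = nn.Linear(16, 1)

    def forward(self, x):                                  # x: (batch, length) int64
        emb = self.embedding(x).transpose(1, 2)            # (batch, emb_dim, length)
        h = torch.relu(self.conv(emb)).max(dim=2).values   # global max pool -> (batch, 16)
        return torch.sigmoid(self.fc(h))                   # (batch, 1), valid BCELoss input


torch.manual_seed(0)
rng = np.random.default_rng(0)
# Hypothetical data: one 1-D byte array per file plus a 0/1 label per file.
samples = [rng.integers(0, 256, size=64) for _ in range(4)]
labels = [1, 0, 1, 0]

model = TinyByteNet()            # created once, not once per file
criterion = nn.BCELoss()
optimizer = Adam(model.parameters(), lr=0.01)

model.train(True)
for epoch in range(3):
    for x_np, label in zip(samples, labels):
        x = torch.as_tensor(x_np, dtype=torch.long).unsqueeze(0)  # (1, length): add batch dim
        y = torch.tensor([[float(label)]])                         # (1, 1): same shape as y_pred
        y_pred = model(x)
        loss = criterion(y_pred, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
model.eval()

with torch.no_grad():
    batch = torch.as_tensor(np.stack(samples), dtype=torch.long)  # (4, length)
    probs = model(batch)
print(probs.shape)
```

The sketch leaves out the learning-rate scheduler: ReduceLROnPlateau.step(metric) is meant to be called once per epoch with a validation metric, not on the scores of a single sample.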

snehith57624 commented 1 year ago

Thanks, that is cleared up. Now my MalConv model predicts as trained, but when I look at the output, the prediction is not printed properly.

    def _perform_optimization(attack, file_path, stats, x, y):
        print('-' * 10)
        info_prompt(f'Processing {file_path}...')
        y_pred, adv_score, adv_ds, f_obj = attack.run(x, y)
        y_pred = y_pred.item()

        score = adv_score[0, 1].item()
        stats['evasion'] += (1 - y_pred)
        stats['total'] += 1
        stats['adv_score'] += score
        net = create_wrapper_for_global_target()
        _, original_score = net.predict(x, return_decision_function=True)
        stats['before_score'] += original_score[0, 1]
        info_prompt(f'Results for {file_path}')
        info_prompt(f'Final label: {y_pred}')
        info_prompt(f'Initial score: {original_score}')
        info_prompt(f'Final score: {score}')
        return adv_ds
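The stats dictionary in that function only accumulates totals: evasion grows by 1 - y_pred, so it counts the files whose final label is 0 (goodware), while adv_score and before_score sum the malware score (the second score entry) after and before the attack. A minimal sketch of turning those totals into rates and averages once all files are processed; summarize_stats is a hypothetical helper, not part of secml_malware, and the numbers are made up for illustration.

```python
def summarize_stats(stats):
    """Hypothetical helper: convert the accumulated totals into averages."""
    total = stats['total']
    if total == 0:
        raise ValueError("no samples were processed")
    return {
        # Fraction of files whose final label is 0 (goodware), i.e. evasive samples.
        'evasion_rate': stats['evasion'] / total,
        # Mean malware score before and after the attack.
        'mean_before_score': stats['before_score'] / total,
        'mean_adv_score': stats['adv_score'] / total,
    }


# Illustrative totals for 4 processed files (made-up values).
stats = {'evasion': 3, 'total': 4, 'adv_score': 1.2, 'before_score': 3.8}
print(summarize_stats(stats))
```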

In the line _, original_score = net.predict(x, return_decision_function=True), the first returned value is the predicted label, but it is discarded and we are not printing it. May I know the reason?
zangobot commented 1 year ago

I don't think I understood the question. If you set return_decision_function to True, the output is both the prediction and the score. The prediction is "score > threshold", hence 1 for malware and 0 for goodware. The score is a CArray with two entries: the first is the goodware score and the second is the malware score (they sum to 1).
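To illustrate that answer: the label can be derived from, and printed next to, the second score column instead of being discarded with _. A minimal NumPy sketch, assuming a plain array in place of secml's CArray and a hypothetical threshold of 0.5 (the detector's real threshold may differ); label_from_scores is a hypothetical helper.

```python
import numpy as np

THRESHOLD = 0.5  # hypothetical; use the detector's own threshold in practice


def label_from_scores(scores, threshold=THRESHOLD):
    """scores: (n, 2) array, column 0 = goodware score, column 1 = malware score."""
    malware_score = scores[:, 1]
    return (malware_score > threshold).astype(int)  # 1 = malware, 0 = goodware


original_score = np.array([[0.12, 0.88]])  # made-up output for one sample
original_label = label_from_scores(original_score)

assert np.isclose(original_score.sum(axis=1), 1.0).all()  # the two entries sum to 1
print(f'Initial label: {original_label[0]}')
print(f'Initial malware score: {original_score[0, 1]}')
```

In the function above, the equivalent change would be to keep the first return value of net.predict under a name instead of _ and pass it to info_prompt, printing original_score[0, 1] rather than the whole score array.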