If I get it right, you use a single feature to train your model?
On Mon, 22 Apr 2024, 20:52 Alex Van Mechelen, @.***> wrote:
Issue
I kept getting unrealistic model performances of 100% on every metric in any experiment, so I pushed it to the extreme as a POC:
Demo experiment
Using just one randomly selected feature, byte_17_after_ep, which I believe has little predictive power for datasets with a wide variety of packer families, an RF model was trained on a dataset containing many different compressor families (it is very unlikely that the 17th byte after the EP follows a common trend across all of them while never occurring in any of the not-packed samples).
for P in ASPack BeRoEXEPacker MEW MPRESS NSPack Packman PECompact UPX; do dataset update tmp -n 50 -s dataset-packed-pe/packed/$P -l dataset-packed-pe/labels/labels-compressor.json; done
dataset update tmp -s dataset-packed-pe/not-packed -n 400
dataset select -n 200 -s tmp tmp2
Listing the datasets:
dataset list
Datasets (10)
Name    #Executables   Size    Files   Formats   Packers
───────────────────────────────────────────────────────────────────────────
tmp     600            164MB   yes     PE        compressor{307}
tmp2    200            32MB    yes     PE        compressor{93}
Training the model gives perfect metrics:
model train tmp -A rf
Classification metrics

        Accuracy   Precision   Recall    F-Measure   MCC     AUC
──────────────────────────────────────────────────────────────────
Train   100.00%    100.00%     100.00%   100.00%     0.00%   -
Test    100.00%    100.00%     100.00%   100.00%     0.00%   -
Testing the model with a dataset with no overlap also gives perfect metrics:
model test tmp_pe_600_rf_f1 tmp2
Classification metrics

Accuracy   Precision   Recall    F-Measure   MCC     AUC   Processing Time
────────────────────────────────────────────────────────────────────────────
100.00%    100.00%     100.00%   100.00%     0.00%   -     10.816ms
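For illustration, this exact pattern (every headline metric at 100% while MCC sits at 0) is what standard binary metrics yield whenever only a single class is present in the data. A minimal sklearn sketch, assuming the tool computes its metrics in the usual way (the sample count is made up):

```python
# Illustration only: metric values when the data contains a single class.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, matthews_corrcoef)

# Every sample that survives label processing is "packed" (True),
# so a model that always predicts True gets everything "right".
y_true = [True] * 600
y_pred = [True] * 600

print(accuracy_score(y_true, y_pred))     # 1.0  -> 100.00%
print(precision_score(y_true, y_pred))    # 1.0  -> 100.00%
print(recall_score(y_true, y_pred))       # 1.0  -> 100.00%
print(f1_score(y_true, y_pred))           # 1.0  -> 100.00%
print(matthews_corrcoef(y_true, y_pred))  # 0.0  -> MCC collapses with one class
```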
Question
Am I maybe doing something wrong?
@dhondta For the above demo, yes, to emphasise that 100% on all metrics is unrealistic in that scenario. Besides the above experiment I've tried many other configurations, always resulting in perfect metrics.
The binary classifier looks for samples labeled as "not-packed" and labels them False, while any other label is mapped to True. Non-labeled samples are rejected and don't make it into model training; here, the not-packed samples were added without a label file, so they are the ones being rejected. Therefore only one class arrives in the model training, yielding perfect metrics.
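To make that concrete, here is a minimal sketch of the label handling described above (hypothetical helper, not the tool's actual code): "not-packed" maps to False, any other label maps to True, and unlabeled samples are dropped, so a dataset whose not-packed samples carry no label ends up with a single class:

```python
# Hypothetical sketch of the described label handling; the real
# implementation in packing-box may differ.
def binarize(label):
    """Map a raw label to a binary class, or None to reject the sample."""
    if label is None:               # unlabeled sample -> rejected
        return None
    return label != "not-packed"    # "not-packed" -> False, anything else -> True

# Demo labels: packed samples carry their packer name, while the
# not-packed samples were added without a label file (label is None).
raw_labels = ["upx", "mew", "aspack", None, None]

classes = [c for c in map(binarize, raw_labels) if c is not None]
print(classes)       # [True, True, True] -> only one class reaches training
print(set(classes))  # {True}
```

With only {True} left in the training data, any classifier scores 100% on every headline metric, matching the tables above.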
It would be useful to be able to specify, for example with a flag "-L" in the dataset convert command, that the "not-packed" label should be assigned to those samples. This would allow experiments where class 1 = "cryptors" and class 2 comprises non-cryptors (including samples packed with packers not belonging to the cryptor category, but also not-packed samples), in this case all labeled as "not-packed" for correct interpretation by the tool.
@AlexVanMechelen see commit 8112fc59; you can now use -T with model train to solve this issue. Please test and report.
Tested & functional. Encountered one issue, fixed in #114