Closed lagitannerie closed 2 years ago
Hi @lagitannerie! Thank you for reporting this. I took the liberty of splitting your issue in two, as the errors you got pertain to different things.
Problem: This comes from a bug in the dataset generation that I have not fixed yet. When packing fails for a file, its copy in the `files` folder of the dataset's folder structure remains, whereas it should have been removed. `model train` relies on both the dataset's `data.csv` and the content of the `files` folder to compute features, so it finds two different counts: `data.csv` does not get updated when packing fails, yet the file is still present in `files`. In your example, you got 9 errors with the UPX packers while 400 samples were retained, hence 409 executables copied into the dataset's `files` folder.
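To check the mismatch by hand, something like the following sketch could list the files that have no matching row in `data.csv`. It is not part of the tool: the function name `find_orphan_files`, the `hash` column name, the `;` delimiter, and the idea that each file in `files` is named after that column's value are all assumptions to adapt to your dataset.

```python
import csv
from pathlib import Path


def find_orphan_files(dataset_dir, key_column="hash", delimiter=";"):
    """Return names found in files/ that have no matching row in data.csv.

    Assumptions (hypothetical, adjust to the real layout): data.csv has a
    column named `key_column`, and each file in files/ is named after it.
    """
    dataset = Path(dataset_dir)
    # Collect the identifiers that data.csv knows about.
    with open(dataset / "data.csv", newline="") as f:
        known = {row[key_column] for row in csv.DictReader(f, delimiter=delimiter)}
    # Collect what is actually present on disk.
    present = {p.name for p in (dataset / "files").iterdir() if p.is_file()}
    # Anything on disk but absent from data.csv is a leftover.
    return sorted(present - known)
```

If those assumptions hold for your dataset, calling `find_orphan_files("pe-upx-dataset")` should return the 9 leftover executables (409 files on disk minus 400 rows in `data.csv`).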
Workaround: Use `dataset fix pe-upx-dataset` to fix your dataset.
I will try to troubleshoot this issue very soon.