Closed jramhani closed 2 months ago
@jramhani The issue comes from a design choice: samples are named after their SHA256 hash, whether they are cleanware or packed. So when you create a dataset of N cleanware samples, mass-pack those same samples, and merge the result with the cleanware, you get your original dataset back instead of 2*N samples, because the packed samples do not replace the cleanware ones (with the current behavior, samples' metadata is not updated).
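To make the collision concrete, here is a minimal sketch in plain Python. It does not use the tool's actual code: `sha256_name`, `merge_by_name`, the in-memory dictionaries and the labels `not-packed` / `packer-x` are all hypothetical and only model "samples keyed by a name that is not recomputed after packing".

```python
import hashlib


def sha256_name(data: bytes) -> str:
    """Return the SHA256 hex digest used here as the sample name."""
    return hashlib.sha256(data).hexdigest()


def merge_by_name(*datasets: dict) -> dict:
    """Merge datasets keyed by sample name (hypothetical model of a merge).

    An entry that already exists is kept as-is, mirroring the
    "metadata won't get updated" behavior described above.
    """
    merged = {}
    for dataset in datasets:
        for name, sample in dataset.items():
            merged.setdefault(name, sample)
    return merged


originals = [b"sample-one", b"sample-two", b"sample-three"]

# Cleanware dataset: each sample is named after the hash of its content.
cleanware = {
    sha256_name(data): {"data": data, "label": "not-packed"}
    for data in originals
}

# Packed in place: the content changes but the name is left untouched.
packed = {
    name: {"data": b"PACKED" + sample["data"], "label": "packer-x"}
    for name, sample in cleanware.items()
}

# Every packed name collides with a cleanware name, so nothing is added.
merged = merge_by_name(cleanware, packed)

# Renaming the packed samples after their new content avoids the collision.
renamed = {sha256_name(sample["data"]): sample for sample in packed.values()}
merged_ok = merge_by_name(cleanware, renamed)

print(len(merged), len(merged_ok))
```

With N = 3, the first merge still holds 3 samples, all of them the cleanware entries, while the merge of renamed samples holds 2*N = 6.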
The normal way of working is to use separate datasets.
Workaround: you can ingest samples from the packed dataset with dataset ingest ... so that samples get imported and renamed according to the SHA256 of their packed version, but you will need to provide the labels in a JSON file.
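If you want to see what that renaming amounts to, or prepare a folder by hand, the sketch below renames each file after the SHA256 of its current content and writes a labels file next to it. This is only an illustration under assumptions: `rename_and_label` is a hypothetical helper, and the JSON layout (a flat mapping from sample name to label) is a guess, so check the format the ingest command actually expects before relying on it.

```python
import hashlib
import json
from pathlib import Path


def sha256_of_file(path: Path) -> str:
    """Hash a file in chunks so large samples are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def rename_and_label(folder: Path, label: str, labels_file: Path) -> dict:
    """Rename every file in `folder` after its SHA256 and record its label.

    Assumption: labels are stored as {"<sha256>": "<label>"}; the real
    tool may expect a different layout. Keep `labels_file` outside
    `folder` so it is not treated as a sample on a later run.
    """
    labels = {}
    for path in sorted(p for p in folder.iterdir() if p.is_file()):
        new_name = sha256_of_file(path)
        if path.name != new_name:
            path.rename(folder / new_name)
        labels[new_name] = label
    labels_file.write_text(json.dumps(labels, indent=2))
    return labels
```

A call could look like `rename_and_label(Path("packed-samples"), "upx", Path("labels.json"))`, where the folder name and the label are example values. Two files with identical content map to the same name, so one of them would be overwritten or the rename would fail, depending on the platform.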
Merging issue
I tried to merge a baseline dataset with its altered version into a new mixed dataset. The resulting dataset contains samples from only one of them, not both.
I suspect the hash in the filename is not updated after alteration, so the merge command sees conflicting names.