HeitorBoschirolli closed this issue 5 years ago
Hi @HeitorBoschirolli - this is by design. For learning, we want the classes to be numbers from 0 to N without any gaps (while the user ids may, and do, have gaps in most datasets). We keep track of which class refers to which user with the "user_mapping[i]" dictionary (see line 166: user_mapping[i] = user)
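To make the index-versus-id distinction concrete, here is a minimal sketch of the idea described above. It is not the repository's actual code: the user ids, the number of signatures per user, and the variable names other than `user_mapping` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical user ids with gaps, as is common in real datasets (assumption).
user_ids = [3, 7, 12]
signatures_per_user = 2  # assumed constant here for simplicity

user_mapping = {}  # class index -> original user id
y = []
for i, user in enumerate(user_ids):
    # Every signature of this user is labelled with the index i, not the id.
    y.extend([i] * signatures_per_user)
    user_mapping[i] = user
y = np.array(y)

# The original ids can always be recovered through the mapping.
original_ids = np.array([user_mapping[c] for c in y])
```

With this layout the labels are contiguous (0, 1, 2), while `user_mapping` keeps the link back to the ids 3, 7, and 12.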
Hello @luizgh, thanks for replying so quickly. If the y-values will always be numbers from 0 to N, won't the function get_subset from datasets/util.py return the wrong values?
If the dataset signature labels run from 1 to N+1, for example, the get_subset code below will ignore the first author, won't it?
to_include = np.isin(data[y_idx], subset)
return tuple(d[to_include] for d in data)
Hi @HeitorBoschirolli. get_subset should be used with a 0-indexed subset. For instance, in the examples in the README (https://github.com/luizgh/sigver) you can see: --exp-users 0 300 --dev-users 300 881, so using the first 300 users for exploitation, and users 300-881 as development (all 0-based, without any gaps).
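The effect of passing 1-based labels instead of 0-based indexes can be sketched as follows. Only the two lines inside the function body are quoted from this thread; the function signature, the `y_idx=1` default, and the toy data are assumptions made for the sake of a runnable example.

```python
import numpy as np

def get_subset(data, subset, y_idx=1):
    # Signature and default are assumed; the body is the code quoted in this thread.
    # Keep only the rows whose class index (data[y_idx]) appears in `subset`.
    to_include = np.isin(data[y_idx], subset)
    return tuple(d[to_include] for d in data)

# Toy data: three users (class indexes 0, 1, 2) with two signatures each.
x = np.arange(6)
y = np.array([0, 0, 1, 1, 2, 2])

# 0-based subset: all three users are selected.
_, y_ok = get_subset((x, y), subset=list(range(0, 3)))

# 1-based subset: the first user (index 0) is silently dropped.
_, y_off = get_subset((x, y), subset=list(range(1, 4)))
```

In this toy case the 1-based subset returns only the signatures of users 1 and 2, which matches the symptom discussed above.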
I used the parameter --exp-users incorrectly and passed the labels (1 to n) instead of the indexes (0 to n-1). Thanks for the help.
In the file "sigver/datasets/util.py", the function "process_dataset_images" loops through all the users and sets the y-value of each signature to the index of the user instead of the user's id.
In other words, it does this:
but I believe it should be like this:
Is this an error, or did I misunderstand something?