y-value equals to user's index instead of user's label

luizgh / sigver

Signature verification package, for learning representations from signature data, training user-dependent classifiers.

BSD 3-Clause "New" or "Revised" License


y-value equals to user's index instead of user's label #11

Closed HeitorBoschirolli closed 5 years ago

HeitorBoschirolli commented 5 years ago

In the file "sigver/datasets/util.py", the function "process_dataset_images" loops through all the users and sets the y-value of each signature to the index of the user instead of the user's label.

In other words, it does this:

for i, user in enumerate(tqdm(users)):
    ...
    y[indexes] = i

but I believe it should be like this:

for i, user in enumerate(tqdm(users)):
    ...
    y[indexes] = user

Is this an error, or did I misunderstand something?

luizgh commented 5 years ago

Hi @HeitorBoschirolli - this is by design. For learning, we want the classes to be numbers from 0 to N without any gaps (while the user ids may, and do, have gaps in most datasets). We keep track of which class refers to which user with the "user_mapping[i]" dictionary (see line 166: user_mapping[i] = user)
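To illustrate the design described above, here is a minimal sketch (not the actual sigver code) of how gapped user ids can be turned into contiguous class indices while a dictionary remembers which class belongs to which user. The user ids and the `signature_users` array are made up for the example; only the names `y`, `indexes`, and `user_mapping` come from the thread.

```python
import numpy as np

# Hypothetical user ids with gaps (not taken from any real dataset).
users = [1, 2, 5, 9]
# Hypothetical: the user id of each signature, in file order.
signature_users = np.array([1, 1, 2, 5, 5, 9])

y = np.empty(len(signature_users), dtype=int)
user_mapping = {}

for i, user in enumerate(users):
    indexes = signature_users == user  # boolean mask for this user's signatures
    y[indexes] = i                     # class index: contiguous, 0-based
    user_mapping[i] = user             # remember which user the class refers to

# The original ids can always be recovered from the class indices.
original_ids = np.array([user_mapping[c] for c in y])
```

With this layout the classifier sees classes 0, 1, 2, 3, and `user_mapping` translates them back to users 1, 2, 5, 9 when needed.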

HeitorBoschirolli commented 5 years ago

Hello @luizgh, thanks for replying so quickly. If the y-values will always be numbers from 0 to N, won't the function get_subset from datasets/util.py return the wrong values?

If the dataset's signature labels run from 1 to N+1, for example, the code below will ignore the first author, won't it?

to_include = np.isin(data[y_idx], subset)
return tuple(d[to_include] for d in data)

luizgh commented 5 years ago

Hi @HeitorBoschirolli. get_subset should be used with a 0-indexed subset. For instance, in the examples in the README (https://github.com/luizgh/sigver) you can see: --exp-users 0 300 --dev-users 300 881, which uses the first 300 users for exploitation and users 300-881 for development (all 0-based, without any gaps).
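The difference between a 0-based and a 1-based subset can be seen in a small sketch. The function below is a stand-in built around the two lines quoted earlier in the thread, not the real get_subset; its signature, the default `y_idx=1`, and the toy data are assumptions made for the example.

```python
import numpy as np

def get_subset_sketch(data, subset, y_idx=1):
    """Keep only the rows whose class index (data[y_idx]) is in `subset`."""
    to_include = np.isin(data[y_idx], subset)
    return tuple(d[to_include] for d in data)

# Toy data: 6 "signatures" from 3 users, already mapped to classes 0, 1, 2.
x = np.arange(6).reshape(6, 1)
y = np.array([0, 0, 1, 1, 2, 2])
data = (x, y)

# 0-based subset: selects all three users.
zero_based = get_subset_sketch(data, np.arange(0, 3))

# 1-based subset (labels passed instead of indexes): class 0 is dropped
# silently, and the value 3 matches nothing.
one_based = get_subset_sketch(data, np.arange(1, 4))
```

No error is raised in the second call; the first user's signatures are simply missing from the result, which is why the mistake is easy to overlook.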

HeitorBoschirolli commented 5 years ago

I used the --exp-users parameter incorrectly: I passed the labels (1 to n) instead of the indexes (0 to n-1). Thanks for the help.