Closed maria-yampolskaya closed 3 years ago
Working on this now.
Woah, how did I not learn about .index until now? What a nice function for lists ... >.>
Resolved. You needed to pass data and labels as numpy arrays. I've changed the code so this is no longer necessary; inputs are now converted to numpy arrays where needed.
i.e. this would have solved the problem:
dp.Full_Dataset(used_images2/255., np.array(onehot_labels), serials=identifiers, val_size=0.2, do_scaling=False)
I tended to avoid ever making things numpy arrays inside functions, because by default np.array(x) copies the data even if x was a numpy array already. But I looked up the documentation and it turns out there's a copy=False flag which behaves how you want it to, e.g.:
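import numpy as np
x = np.array([1,2,3])
np.array(x, copy=False) is x  # x is already an array, so the same object comes back
>> True
y = [1,2,3]
np.array(y, copy=False) is y  # a list has to be converted, so a new array is built (NumPy >= 2.0 raises ValueError here instead)
>> False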
So now I convert the inputs to numpy arrays inside the class initialization. This is going to be really helpful in a lot of places for me. Implicitly hoping the user only passes numpy arrays (because I don't want to waste memory and time copying the data) is something that has bothered me for a long time. Now I know the fix :)
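For anyone reading this on a newer NumPy, here is a minimal sketch of the same convert-at-init idea using np.asarray. It returns the input object itself when that is already an ndarray and no dtype change is requested, and builds a new array otherwise. Since NumPy 2.0, np.array(y, copy=False) raises a ValueError when a copy cannot be avoided, so np.asarray is the more portable spelling. The TinyDataset class below is hypothetical and only illustrates the pattern; it is not the actual Full_Dataset code.

```python
import numpy as np


class TinyDataset:
    """Hypothetical stand-in for a dataset class; not the real Full_Dataset."""

    def __init__(self, data, labels):
        # np.asarray hands back the same object for an ndarray input (no copy)
        # and creates a new array for lists and other sequences.
        self.data = np.asarray(data)
        self.labels = np.asarray(labels)


images = np.zeros((4, 2, 2))                      # already an ndarray
onehot_labels = [[1, 0], [0, 1], [1, 0], [0, 1]]  # plain list of lists

ds = TinyDataset(images, onehot_labels)
print(ds.data is images)                          # ndarray input is reused
print(type(ds.labels).__name__, ds.labels.shape)  # list was converted
```

With this pattern the caller can pass either lists or arrays, and array inputs cost no extra memory.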
I created a small function to convert from type names to a one-hot vector (for multi-label encoding, so we can use categorical cross-entropy and scikit-multilearn):
Then I created a list of these labels:
But when I try to create a dataset:
I get the following error:
It seems that the algorithm to split the data into training, validation, and test sets is hard-coded to only accept integer labels. Help me add one-hot vector compatibility pls
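One possible approach, sketched under assumptions: map each distinct label row to an integer class id, then stratify the split on those ids. This handles integer labels, one-hot vectors, and multi-hot vectors in the same way. The stratified_split helper and its parameters are hypothetical; this is not how the library's own split is implemented, and it only covers a train/validation split.

```python
import numpy as np


def stratified_split(labels, val_size=0.2, seed=0):
    """Return (train_idx, val_idx), stratified by distinct label row.

    Hypothetical helper: accepts integer labels of shape (n,) or
    one-hot / multi-hot labels of shape (n, k).
    """
    labels = np.asarray(labels)
    if labels.ndim == 1:
        labels = labels.reshape(-1, 1)  # treat integer labels as 1-column rows

    # Map every distinct label row to an integer class id.
    _, class_ids = np.unique(labels, axis=0, return_inverse=True)
    class_ids = class_ids.reshape(-1)  # flatten; the shape varies by NumPy version

    rng = np.random.default_rng(seed)
    train_idx, val_idx = [], []
    for c in np.unique(class_ids):
        # Shuffle the members of this class, then cut off the validation share.
        members = rng.permutation(np.flatnonzero(class_ids == c))
        n_val = int(round(val_size * len(members)))
        val_idx.extend(members[:n_val])
        train_idx.extend(members[n_val:])

    return (np.sort(np.array(train_idx, dtype=int)),
            np.sort(np.array(val_idx, dtype=int)))


onehot = np.eye(3, dtype=int)[[0, 1, 2] * 10]  # 30 samples, 10 per class
train_idx, val_idx = stratified_split(onehot, val_size=0.2)
print(len(train_idx), len(val_idx))
```

Grouping by whole label rows keeps every label combination represented in both splits. Very rare combinations can end up with zero validation samples, so those would need separate handling.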