Closed by xiaohan2012 3 years ago
Hi @xiaohan2012, it's not a bug. The files are loaded correctly. There is simply no feature 101937 in the wiki10-31k test set, although it is present in the train set. This is also the case for some other datasets from the XMLC repo. The method that loads the data reads all libsvm file formats, and the standard libsvm file format does not include any information about the number of features/columns, so when casting to scipy.csr_matrix the value of shape[1] is simply deduced from the loaded data. I agree that it would be nicer if the numbers of columns matched, and this could be improved, but since the data are sparse, resizing the matrix is not really a problem.
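To illustrate how the column count gets deduced, here is a minimal sketch on toy data. It is not the library's actual loader: the helper name to_csr and the small arrays are made up for illustration. It only shows that scipy infers the shape from the largest index it sees when no explicit shape is given.

```python
import numpy as np
from scipy.sparse import csr_matrix


def to_csr(rows, cols, vals):
    """Hypothetical helper: build a CSR matrix without an explicit shape.

    scipy then infers the shape as (max(rows) + 1, max(cols) + 1).
    """
    return csr_matrix((vals, (rows, cols)))


# Toy "train" data uses column index 4; toy "test" data only goes up to 3,
# mimicking a feature that appears in the train file but not in the test file.
train_X = to_csr(np.array([0, 1]), np.array([0, 4]), np.array([1.0, 2.0]))
test_X = to_csr(np.array([0, 1]), np.array([1, 3]), np.array([3.0, 4.0]))

print(train_X.shape)  # (2, 5)
print(test_X.shape)   # (2, 4)
```

If the highest-numbered feature never occurs in a file, the matrix loaded from that file ends up with fewer columns, even though the shared column indices still refer to the same features.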
Thanks for the reply.
As for resizing it, I assume I should add a zero column somewhere in the smaller matrix (tst_X in the above example).
Where should I add it: before the first column or after the last one?
Ah, sorry, I got it. I append a zero column after the last column :)
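For anyone landing here later, a minimal sketch of that padding step, assuming the matrices are scipy sparse matrices. The helper name pad_columns and the toy matrices are hypothetical, not part of the library. Appending on the right is the safe choice because the existing column indices keep their meaning, and only the trailing, unseen features are missing.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack


def pad_columns(X, n_cols):
    """Hypothetical helper: append all-zero columns on the right of X
    until it has n_cols columns. Existing column indices are unchanged."""
    missing = n_cols - X.shape[1]
    if missing < 0:
        raise ValueError("X already has more than n_cols columns")
    if missing == 0:
        return X.tocsr()
    zeros = csr_matrix((X.shape[0], missing), dtype=X.dtype)
    return hstack([X, zeros], format="csr")


# Toy stand-ins for the train and test matrices (5 vs. 4 columns).
trn_X = csr_matrix(np.array([[1.0, 0.0, 0.0, 0.0, 0.0],
                             [0.0, 0.0, 0.0, 0.0, 2.0]]))
tst_X = csr_matrix(np.array([[0.0, 3.0, 0.0, 0.0],
                             [0.0, 0.0, 0.0, 4.0]]))

tst_X_padded = pad_columns(tst_X, trn_X.shape[1])
print(tst_X_padded.shape)  # (2, 5)
```

Since the added columns contain no stored entries, the padding costs essentially no extra memory.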
Hi,
There seems to be a bug in the data loading process.
For example:
gives:
Cheers, Han