Open sebhrusen opened 3 years ago
Just checking - are these sparse target matrices `y`? We might indeed not have tests for that.
CC @eddiebergman
@mfeurer in this case both `X` and `y` are indeed sparse; I'm not sure this makes sense for `y`.
For now I worked around this by converting both into dense arrays, as I thought the problem was `X`, but it's quite possible that for some frameworks it's only necessary to do this for `y`.
Thanks for the clarification. Auto-sklearn should support sparse `X`, but we'll check, and will also check what the behavior for sparse `y` values is.
@mfeurer for autosklearn, sparse `X` with dense `y` seems to work fine (and faster), meaning that in your case, sparse `y` was the issue.
Thanks for noticing this: ideally we'd like to have frameworks using sparse data whenever possible, so I'll probably just make the `y`s dense by default, and see individually for each framework regarding `X`.
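A minimal sketch of the "dense `y`, sparse `X`" approach described above, assuming the data arrives as SciPy sparse matrices and the target has a single column. The helper name `densify_target` is hypothetical and not part of the benchmark or of auto-sklearn:

```python
import numpy as np
import scipy.sparse as sp


def densify_target(y):
    """Return y as a dense 1-D NumPy array, whether it arrives sparse or dense.

    Assumes a single-column target; a multi-label target would need
    its 2-D shape preserved instead of being flattened.
    """
    if sp.issparse(y):
        y = y.toarray()
    return np.asarray(y).ravel()


# Toy data: X is left sparse, only y is converted.
X = sp.random(5, 3, density=0.4, format="csr", random_state=0)
y_sparse = sp.csr_matrix(np.array([[0], [1], [0], [0], [1]]))
y_dense = densify_target(y_sparse)
```

Whether `X` can stay sparse still has to be checked per framework, as noted in the comment above.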
cc: @PGijsbers
@sebhrusen It's probably in the interest of `autosklearn` to handle sparse `y` correctly in this case; I'll look into it.
@eddiebergman Sure, just mentioning that we have a workaround on our side for now that also seems to work for other frameworks. Thanks for fixing it on your side too.
Hi @sebhrusen,
Just letting you know that the fix should be in the next release. I also tracked down the problem a little further and wrote a brief synopsis, in case it helps identify the problem for other libraries.
Failing datasets: https://openml.org/t/360932 https://openml.org/t/360932
We'll improve support for sparse data in a future version: for now, we can simply deserialize the sparse matrices as dense matrices for the frameworks that don't use pandas.
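As a rough illustration of deserializing a sparse matrix as a dense one, here is a sketch that assumes the matrix was stored with `scipy.sparse.save_npz`. The thread does not say how the benchmark actually serializes its data, so the `.npz` format and the helper name `load_as_dense` are assumptions made for the example:

```python
import os
import tempfile

import numpy as np
import scipy.sparse as sp


def load_as_dense(path):
    """Load a matrix saved with scipy.sparse.save_npz and return a dense ndarray."""
    matrix = sp.load_npz(path)
    return matrix.toarray()


# Round trip on toy data: save sparse, load dense.
original = sp.csr_matrix(np.array([[0.0, 2.0], [3.0, 0.0]]))
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "X.npz")  # hypothetical file name
    sp.save_npz(path, original)
    X_dense = load_as_dense(path)
```

Densifying on load is simple but trades memory for compatibility, so it can become expensive for large, very sparse datasets.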