Closed stela2502 closed 1 month ago
Is there a reason you can't perform the conversion to a dense array? Would your data become too large to fit in memory? Unfortunately, because our algorithm is based on training a neural network, and libraries like PyTorch don't support sparse linear layers, I'm not sure there's a way around the sparse -> dense conversion step.
Hi Ian, initially I thought this was the problem, but I could fix that simply by adding more memory. In general, I think it is extremely bad practice to blow up sparse matrices. Would it not be possible to convert row by row while feeding your model? I assume the model itself would not store the data; it should build a model from the data, so the total memory requirement would be lower. Even if this kind of conversion took more time, it should be negligible compared to the analysis itself.
This is fixed by the linked pull request. Briefly, the conversion now only occurs at the time of loading a batch, so the full dataset no longer needs to be converted to dense when creating an instance of ExpressionDataset.
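A minimal sketch of what batch-time densification can look like, assuming the data lives in a SciPy CSR matrix and is fed through a standard PyTorch Dataset/DataLoader. The class and variable names here are illustrative, not the actual implementation from the pull request:

```python
import numpy as np
import scipy.sparse as sp
import torch
from torch.utils.data import Dataset, DataLoader

class SparseExpressionDataset(Dataset):
    """Keeps the expression matrix sparse in memory; densifies one row at a time.

    Hypothetical sketch: only the rows requested for the current batch are
    converted to dense, so peak memory stays close to the sparse footprint.
    """

    def __init__(self, X, y):
        self.X = sp.csr_matrix(X)  # stays sparse for the lifetime of the dataset
        self.y = np.asarray(y)

    def __len__(self):
        return self.X.shape[0]

    def __getitem__(self, idx):
        # Only this single row is converted to a dense vector.
        row = torch.from_numpy(self.X[idx].toarray().ravel()).float()
        return row, torch.tensor(self.y[idx])

# Example: 1000 samples, 200 features, 5% nonzero entries.
X = sp.random(1000, 200, density=0.05, format="csr")
y = np.zeros(1000)
loader = DataLoader(SparseExpressionDataset(X, y), batch_size=32)
xb, yb = next(iter(loader))
print(xb.shape)  # each batch is dense, but only 32 rows at a time
```

The DataLoader's default collation stacks the 32 densified rows into a single dense batch tensor, which can then go straight into a normal `nn.Linear` layer.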
I assume that the title might be enough.
In your example notebook 01_persist_supervised.ipynb, you have this statement:
And of course, if you do not do that, it does not work.
Couldn't this tool support sparse data directly instead? This does not feel state of the art, sorry.