Open lisitsyn opened 9 years ago
@lisitsyn From what I understand, CDenseFeatures have to be replaced by CDotFeatures, since neural net training involves lots of dot products. Tell me if I'm wrong.
I know what sparse (mostly zero entries) and dense (mostly non-zero entries) matrices are. Are CSparseFeatures and CDenseFeatures derived from the same logic? Then there are SGSparseVectors in Shogun as well.
I didn't get the "magic training" part. Please elaborate: how will training become easy for out-of-core data (external memory, i.e. data not stored in RAM) if we use CDotFeatures?
I saw that the COFFIN paper is about linear SVMs. Is there any particular section I need to focus on? I'll read it tomorrow.
@sanuj
1) Yes, CDenseFeatures -> CDotFeatures
2) Yes, both dense and sparse features are examples of dot features
3) Dense features have to be in memory, but DotFeatures can be backed by anything. Since access is usually sequential, the data could even live in a file or in some resource accessed over the network
@lisitsyn I see. Basically, CDenseFeatures is a matrix of all the features (examples as columns), so when we create a CDenseFeatures object, all the features are loaded into the matrix at once. I don't know how CDotFeatures are stored; it seems like quite an abstract class to me. Also, there are no unit tests for CDotFeatures from which I could get an idea of how to use it.
If I'm not wrong, the idea is that if we accept CDotFeatures, then any features implemented as child classes of CDotFeatures can be supplied to the neural nets. (Right?)
I think the following classes would need to change (CDenseFeatures -> CDotFeatures):
CNeuralNetwork
CDeepBeliefNetwork
CRBM
CAutoencoder
CDeepAutoencoder
and also the unit tests related to the above.
Your views?
@lisitsyn Sergey?
@sanuj sorry.
CDotFeatures is an abstract class for sure. Inheriting from it means that the features can compute a dot product with some arbitrary vector; it doesn't restrict the internal structure of the data or anything like that. I have a not-yet-merged change in PR #2788, so I am unsure how to proceed.
@lisitsyn I see. Then we can have a look at this later. I'll start working on something else ;)
Currently, neural nets require DenseFeatures as input, which restricts their usage a bit.
There is a good idea by @sonney2k (see the COFFIN paper): most things depend only on the dot product of a feature vector with some weight vector. This enables magic training on out-of-core data such as local or remote files, and even on data generated on the fly. This task proposes taking CDotFeatures instead of CDenseFeatures as input as a general idea, but concrete suggestions are welcome!
Good for Deep learning applicants