duchesnay opened this issue 7 years ago
This is a great idea! I think we should use it!
I have some comments: I think we should add a parameter (e.g., a bool `use_cache`) to the constructor of the estimator and of the loss function that will use this, so that it can be turned off by those who know what it means to turn it off.
We can add a TODO comment near the sha1 computation so that we can think about clever ways to speed up the hash computation. For instance, it might be unnecessarily slow to recompute the sha1 hash of `X` every time if we know that only the hash of `beta` has changed.
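For illustration only, here is a minimal sketch of how a `use_cache` flag and the sha1-keyed check could fit together. `CachedLoss`, `_sha1` and `_compute_f` are hypothetical names, not the actual parsimony API; the TODO marks the spot where rehashing `X` on every call would be wasteful, since only `beta` changes between iterations.

```python
import hashlib

import numpy as np


def _sha1(arr):
    """Hash the raw bytes of an array (slow for a large X)."""
    return hashlib.sha1(np.ascontiguousarray(arr)).hexdigest()


class CachedLoss(object):
    """Hypothetical loss with a one-entry cache keyed by sha1 hashes."""

    def __init__(self, X, y, use_cache=True):
        self.X = X
        self.y = y
        self.use_cache = use_cache
        # TODO: X is hashed once here; rehashing it at every call would be
        # unnecessarily slow since only beta changes between iterations.
        self._X_sha1 = _sha1(X)
        self._last_key = None
        self._last_value = None

    def _compute_f(self, beta):
        residual = np.dot(self.X, beta) - self.y
        return 0.5 * np.dot(residual.T, residual)

    def f(self, beta):
        if not self.use_cache:  # opt-out for those who know what it means
            return self._compute_f(beta)
        key = (self._X_sha1, _sha1(beta))
        if key != self._last_key:
            self._last_key = key
            self._last_value = self._compute_f(beta)
        return self._last_value
```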
There is also another really simple solution to this problem: in `functions.properties.Gradient`, there is a non-abstract function `f_grad` that can be used instead of calling `grad` and `f` separately. This means that `np.dot(X, beta)` can be computed only once in `f_grad`, instead of twice as when calling first `grad` and then `f`. This requires a minor change in the algorithms, though: check once whether `f_grad` is implemented and, if it is, use it whenever we need to compute both `f` and `grad` simultaneously. This is trivial, but requires some extra logic in every algorithm.
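A sketch of that idea, assuming a `Gradient`-like base class with a non-abstract default `f_grad`; the `SquaredLoss` class and the `has_own_f_grad` helper are made up for illustration, and the real `functions.properties.Gradient` interface may differ in details.

```python
import numpy as np


class Gradient(object):
    """Base class sketch: f_grad has a non-abstract default that simply
    calls f and grad separately."""

    def f(self, beta):
        raise NotImplementedError

    def grad(self, beta):
        raise NotImplementedError

    def f_grad(self, beta):
        # Default: two passes, so np.dot(X, beta) is computed twice.
        return self.f(beta), self.grad(beta)


class SquaredLoss(Gradient):
    def __init__(self, X, y):
        self.X, self.y = X, y

    def f(self, beta):
        r = np.dot(self.X, beta) - self.y
        return 0.5 * np.dot(r.T, r)

    def grad(self, beta):
        return np.dot(self.X.T, np.dot(self.X, beta) - self.y)

    def f_grad(self, beta):
        # Override: np.dot(X, beta) is computed once for both values.
        r = np.dot(self.X, beta) - self.y
        return 0.5 * np.dot(r.T, r), np.dot(self.X.T, r)


def has_own_f_grad(function):
    """Algorithm side: check once whether f_grad is overridden, then use
    it whenever both f and grad are needed in the same iteration."""
    return type(function).f_grad is not Gradient.f_grad
```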
+1 for `use_cache`.
The `checkargs` argument indicates which arguments should be checked (with a sha1 hash), to avoid unnecessarily slow checks of the other arguments, such as `X`.
Using `f_grad` instead of `f` + `grad` is interesting. However, we still have `np.dot(X, beta)` in the gap... Moreover, this will impact the logic in every algorithm...
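A hedged sketch of what a `checkargs`-driven memoization could look like; the `memoize_last` decorator and its keyword-only calling convention are assumptions for illustration, not the decorator actually used here.

```python
import functools
import hashlib

import numpy as np


def memoize_last(checkargs):
    """Cache the last result, invalidated only when the sha1 of the
    arguments named in checkargs changes; other arguments (such as a
    large X held by the object) are deliberately never hashed."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(self, **kwargs):  # arguments passed by keyword in this sketch
            key = tuple(hashlib.sha1(np.ascontiguousarray(kwargs[a])).hexdigest()
                        for a in checkargs)
            attr = "_memo_" + func.__name__
            memo = getattr(self, attr, None)
            if memo is None or memo[0] != key:
                memo = (key, func(self, **kwargs))
                setattr(self, attr, memo)
            return memo[1]
        return wrapper
    return decorator


class Loss(object):
    def __init__(self, X, y):
        self.X, self.y = X, y

    @memoize_last(checkargs=["beta"])  # only beta is hashed, never X
    def f(self, beta):
        r = np.dot(self.X, beta) - self.y
        return 0.5 * np.dot(r.T, r)
```

In this sketch the cached method must be called with keyword arguments, e.g. `loss.f(beta=beta)`, so that only `beta` is hashed and the large `X` is never touched.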
80% of the computation time is spent in the `np.dot(X, beta)` of the loss, and it is done twice within the same iteration: once in the gradient and once in the gap. We could cache this result. Memory-cache solutions such as joblib exist (the `Memory` class), but they provide complex and unnecessary features (storing on disk, long-term memoizing) that would slow down performance. We only need a "short-term" memory, i.e. a one-step-back cache. Here are my proposed specifications and the pro arguments:

- only the result of `np.dot(X, beta)` needs to be cached;
- the cache will be instantiated by the estimators (`LinearRegressionL1L2TV`, etc.) and the cache instance will be given to the loss;
- then, only the few lines where `np.dot(X, beta)` is computed will be touched.

Below is an example of implementation.