Closed gph82 closed 4 years ago
One can, however, override this via **kwargs of [msm.estimate_markov_model]. Perhaps an indication in the docstring of estimate... saying that the parameter sparse=False might be ultimately overridden?
I mean something like:
MSM = pyemma.msm.estimate_markov_model(dtrajs, 10, method='dense')
actually enforces the computation to be done in dense mode
what exactly is the point here?
It was not clear what optarg "sparse" does, exactly. "sparse" is documented as affecting the whole matrix computation (https://github.com/markovmodel/PyEMMA/blob/devel/pyemma/msm/estimators/maximum_likelihood_msm.py#L81).
However, by default everything is computed sparsely by _dtraj_stats.DiscreteTrajectoryStats (https://github.com/markovmodel/PyEMMA/blob/devel/pyemma/msm/estimators/_dtraj_stats.py#L92), and the code of _estimate seems to go with that default behaviour when calling the wrapper function _get_dtraj_stats (https://github.com/markovmodel/PyEMMA/blob/devel/pyemma/msm/estimators/maximum_likelihood_msm.py#L1040).
The code has changed since 2015 (!), so I might be off here...but the only thing "sparse" is doing seems to be a) warn the user if sparse=True has not been used in large systems (https://github.com/markovmodel/PyEMMA/blob/devel/pyemma/msm/estimators/maximum_likelihood_msm.py#L167) b) convert sparsely computed objects to dense objects before returning them (https://github.com/markovmodel/PyEMMA/blob/devel/pyemma/msm/estimators/maximum_likelihood_msm.py#L1337)
On 01/17/2018 11:38 AM, Guillermo Pérez-Hernández wrote:
It was not clear what optarg "sparse" does, exactly. "sparse" is documented as affecting the whole matrix computation (https://github.com/markovmodel/PyEMMA/blob/devel/pyemma/msm/estimators/maximum_likelihood_msm.py#L81).
However, by default everything is computed sparsely by _dtraj_stats.DiscreteTrajectoryStats (https://github.com/markovmodel/PyEMMA/blob/devel/pyemma/msm/estimators/_dtraj_stats.py#L92), and the code of _estimate seems to go with that default behaviour when calling the wrapper function _get_dtraj_stats (https://github.com/markovmodel/PyEMMA/blob/devel/pyemma/msm/estimators/maximum_likelihood_msm.py#L1040).
The code has changed since 2015 (!), so I might be off here...but the only thing "sparse" is doing seems to be a) warn the user if sparse=True has not been used in large systems (https://github.com/markovmodel/PyEMMA/blob/devel/pyemma/msm/estimators/maximum_likelihood_msm.py#L167) b) convert sparsely computed objects to dense objects before returning them (https://github.com/markovmodel/PyEMMA/blob/devel/pyemma/msm/estimators/maximum_likelihood_msm.py#L1337)
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/markovmodel/PyEMMA/issues/404#issuecomment-358265088, or mute the thread https://github.com/notifications/unsubscribe-auth/AAKZL9Ldxu1cNKQD6uQrLch2t47-5e_Jks5tLc21gaJpZM4FVY-P.
You are right with all of our given points. Because dtrajstats always perform sparse counting, the sparse estimation routine will be called. We should convert the count matrix to dense if sparse=False prior invoking the estimation, not afterwards.
well, IDK about that.
Perhaps it IS intended for matrix computation to be always sparse (has worked so far, so why change it?).
Maybe just changing the docstring (and some of the code comments) would be enough to clarify what sparse is actually doing.
I don't think that this intended. Everywhere in the code are comments spread, that the sparse keyword controls whether to use sparse matrix routines or not. For small systems it should be more efficient to use dense matrices, because sparse matrices involve more overhead.
On 01/17/2018 12:24 PM, Guillermo Pérez-Hernández wrote:
well, IDK about that.
Perhaps it IS intended for matrix computation to be always sparse (has worked so far, so why change it?).
Maybe just changing the docstring (and some of the code comments) would be enough to clarify what sparse is actually doing.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/markovmodel/PyEMMA/issues/404#issuecomment-358276048, or mute the thread https://github.com/notifications/unsubscribe-auth/AAKZLyaWmQ74x4JRVnlIomRRljPCvXgFks5tLdiDgaJpZM4FVY-P.
I think @fnueske refactored this code the last time. I think the conversion of the count matrices should be done prior estimation. Do we agree upon this?
On 01/17/2018 12:24 PM, Guillermo Pérez-Hernández wrote:
well, IDK about that.
Perhaps it IS intended for matrix computation to be always sparse (has worked so far, so why change it?).
Maybe just changing the docstring (and some of the code comments) would be enough to clarify what sparse is actually doing.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/markovmodel/PyEMMA/issues/404#issuecomment-358276048, or mute the thread https://github.com/notifications/unsubscribe-auth/AAKZLyaWmQ74x4JRVnlIomRRljPCvXgFks5tLdiDgaJpZM4FVY-P.
In addition to my argument, it seems @psolsson also agrees upon that, because he also converts the counts prior estimation:
I don't think I made any changes to the way dense / sparse matrices are used here. I agree that the docstring suggests that dense matrices are used throughout if sparse is set to False, which is not what is happening. The most elegant solution would probably be if the count matrix estimator determines whether it is best to use sparse or dense matrices. All subsequent calculations would use the same format, and the output would finally be converted according to the user's preference. But I guess implementing this is a good deal of work.
https://github.com/markovmodel/PyEMMA/commit/fb858d97347da7fe73103668e2276c584f93677b
Agreed upon this elegant solution, though.
sorry to have blamed you on this @fnueske, the code already was in this state, when you split up the base class for msm estimation.
The comment in the MSMestimator was false regarding that dense input will also lead to dense computation (same for sparse), because msmtools.estimation.tmatrix has another flag called method (one of 'auto', 'sparse', 'dense'). The default for method is 'auto', so regardless of the input there is another heuristic to choose the outcome. So this one is not fixed by #1225, since we still have the automatic choosing of the computation method.
as @gph82 suggested, we can also leave this at it is and change the docstring to avoid confusion.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Somehow connected with #403