markovmodel / PyEMMA

đźš‚ Python API for Emma's Markov Model Algorithms đźš‚
http://pyemma.org
GNU Lesser General Public License v3.0
311 stars 119 forks source link

[msm.estimate_markov_model] parameter sparse=False gets overriden by method=auto of [msm/estimation/api/tmatrix] #404

Closed gph82 closed 4 years ago

gph82 commented 9 years ago

Somehow connected with #403

gph82 commented 9 years ago

One can, however, override this via **kwargs of [msm.estimate_markov_model]. Perhaps an indication in the docstring of estimate... saying that the parameter sparse=False might be ultimately overridden?

I mean something like: MSM = pyemma.msm.estimate_markov_model(dtrajs, 10, method='dense') actually enforces the computation to be done in dense mode

marscher commented 6 years ago

what exactly is the point here?

gph82 commented 6 years ago

It was not clear what optarg "sparse" does, exactly. "sparse" is documented as affecting the whole matrix computation (https://github.com/markovmodel/PyEMMA/blob/devel/pyemma/msm/estimators/maximum_likelihood_msm.py#L81).

However, by default everything is computed sparsely by _dtraj_stats.DiscreteTrajectoryStats (https://github.com/markovmodel/PyEMMA/blob/devel/pyemma/msm/estimators/_dtraj_stats.py#L92), and the code of _estimate seems to go with that default behaviour when calling the wrapper function _get_dtraj_stats (https://github.com/markovmodel/PyEMMA/blob/devel/pyemma/msm/estimators/maximum_likelihood_msm.py#L1040).

The code has changed since 2015 (!), so I might be off here...but the only thing "sparse" is doing seems to be a) warn the user if sparse=True has not been used in large systems (https://github.com/markovmodel/PyEMMA/blob/devel/pyemma/msm/estimators/maximum_likelihood_msm.py#L167) b) convert sparsely computed objects to dense objects before returning them (https://github.com/markovmodel/PyEMMA/blob/devel/pyemma/msm/estimators/maximum_likelihood_msm.py#L1337)

marscher commented 6 years ago

On 01/17/2018 11:38 AM, Guillermo Pérez-Hernández wrote:

It was not clear what optarg "sparse" does, exactly. "sparse" is documented as affecting the whole matrix computation (https://github.com/markovmodel/PyEMMA/blob/devel/pyemma/msm/estimators/maximum_likelihood_msm.py#L81).

However, by default everything is computed sparsely by _dtraj_stats.DiscreteTrajectoryStats (https://github.com/markovmodel/PyEMMA/blob/devel/pyemma/msm/estimators/_dtraj_stats.py#L92), and the code of _estimate seems to go with that default behaviour when calling the wrapper function _get_dtraj_stats (https://github.com/markovmodel/PyEMMA/blob/devel/pyemma/msm/estimators/maximum_likelihood_msm.py#L1040).

The code has changed since 2015 (!), so I might be off here...but the only thing "sparse" is doing seems to be a) warn the user if sparse=True has not been used in large systems (https://github.com/markovmodel/PyEMMA/blob/devel/pyemma/msm/estimators/maximum_likelihood_msm.py#L167) b) convert sparsely computed objects to dense objects before returning them (https://github.com/markovmodel/PyEMMA/blob/devel/pyemma/msm/estimators/maximum_likelihood_msm.py#L1337)

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/markovmodel/PyEMMA/issues/404#issuecomment-358265088, or mute the thread https://github.com/notifications/unsubscribe-auth/AAKZL9Ldxu1cNKQD6uQrLch2t47-5e_Jks5tLc21gaJpZM4FVY-P.

You are right with all of our given points. Because dtrajstats always perform sparse counting, the sparse estimation routine will be called. We should convert the count matrix to dense if sparse=False prior invoking the estimation, not afterwards.

gph82 commented 6 years ago

well, IDK about that.

Perhaps it IS intended for matrix computation to be always sparse (has worked so far, so why change it?).

Maybe just changing the docstring (and some of the code comments) would be enough to clarify what sparse is actually doing.

marscher commented 6 years ago

I don't think that this intended. Everywhere in the code are comments spread, that the sparse keyword controls whether to use sparse matrix routines or not. For small systems it should be more efficient to use dense matrices, because sparse matrices involve more overhead.

marscher commented 6 years ago

On 01/17/2018 12:24 PM, Guillermo Pérez-Hernández wrote:

well, IDK about that.

Perhaps it IS intended for matrix computation to be always sparse (has worked so far, so why change it?).

Maybe just changing the docstring (and some of the code comments) would be enough to clarify what sparse is actually doing.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/markovmodel/PyEMMA/issues/404#issuecomment-358276048, or mute the thread https://github.com/notifications/unsubscribe-auth/AAKZLyaWmQ74x4JRVnlIomRRljPCvXgFks5tLdiDgaJpZM4FVY-P.

I think @fnueske refactored this code the last time. I think the conversion of the count matrices should be done prior estimation. Do we agree upon this?

https://github.com/markovmodel/PyEMMA/blob/devel/pyemma/msm/estimators/maximum_likelihood_msm.py#L1101

marscher commented 6 years ago

On 01/17/2018 12:24 PM, Guillermo Pérez-Hernández wrote:

well, IDK about that.

Perhaps it IS intended for matrix computation to be always sparse (has worked so far, so why change it?).

Maybe just changing the docstring (and some of the code comments) would be enough to clarify what sparse is actually doing.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/markovmodel/PyEMMA/issues/404#issuecomment-358276048, or mute the thread https://github.com/notifications/unsubscribe-auth/AAKZLyaWmQ74x4JRVnlIomRRljPCvXgFks5tLdiDgaJpZM4FVY-P.

In addition to my argument, it seems @psolsson also agrees upon that, because he also converts the counts prior estimation:

https://github.com/markovmodel/PyEMMA/blob/devel/pyemma/msm/estimators/maximum_likelihood_msm.py#L1745

fnueske commented 6 years ago

I don't think I made any changes to the way dense / sparse matrices are used here. I agree that the docstring suggests that dense matrices are used throughout if sparse is set to False, which is not what is happening. The most elegant solution would probably be if the count matrix estimator determines whether it is best to use sparse or dense matrices. All subsequent calculations would use the same format, and the output would finally be converted according to the user's preference. But I guess implementing this is a good deal of work.

marscher commented 6 years ago

https://github.com/markovmodel/PyEMMA/commit/fb858d97347da7fe73103668e2276c584f93677b

Agreed upon this elegant solution, though.

marscher commented 6 years ago

sorry to have blamed you on this @fnueske, the code already was in this state, when you split up the base class for msm estimation.

marscher commented 6 years ago

The comment in the MSMestimator was false regarding that dense input will also lead to dense computation (same for sparse), because msmtools.estimation.tmatrix has another flag called method (one of 'auto', 'sparse', 'dense'). The default for method is 'auto', so regardless of the input there is another heuristic to choose the outcome. So this one is not fixed by #1225, since we still have the automatic choosing of the computation method.

as @gph82 suggested, we can also leave this at it is and change the docstring to avoid confusion.

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.