markovmodel / thermotools

Winter is coming...
spdy://probably.not.this.year
GNU Lesser General Public License v3.0

Sparsity: design decision needed #1

Closed cwehmeyer closed 8 years ago

cwehmeyer commented 9 years ago

We currently use numpy.ndarrays with shape=(T, M, M) for dense count matrices. Scipy.sparse unfortunately supports only 2D matrices.

We could replace the 3D structure by a list of sparse 2D matrices for the thermodynamic states and join the (row, column, value) arrays inside the cython wrapper into a (therm_state, row, column, value)-like arrangement before passing it to the C level. This would, however, lead to different function interfaces (list of 2D vs. 3D) for the sparse and dense variants.
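A minimal sketch of the joining step described above, written in plain Python/NumPy rather than Cython. The function name `flatten_sparse_counts` and the choice of `np.intc` as the C-compatible dtype are assumptions made for illustration, not existing thermotools code; the input list is assumed to be non-empty.

```python
import numpy as np
import scipy.sparse as sp


def flatten_sparse_counts(count_matrices):
    """Join a list of 2D count matrices (one per thermodynamic state)
    into four flat arrays: (therm_state, row, column, value).

    Hypothetical helper; assumes a non-empty list of integer matrices.
    """
    therm_state, row, col, value = [], [], [], []
    for k, c in enumerate(count_matrices):
        # coo_matrix accepts dense arrays and any scipy sparse format
        coo = sp.coo_matrix(c)
        therm_state.append(np.full(coo.nnz, k, dtype=np.intc))
        row.append(coo.row.astype(np.intc))
        col.append(coo.col.astype(np.intc))
        value.append(coo.data.astype(np.intc))
    return (np.concatenate(therm_state), np.concatenate(row),
            np.concatenate(col), np.concatenate(value))


counts = [sp.csr_matrix(np.array([[2, 0], [1, 3]])),
          sp.csr_matrix(np.array([[0, 4], [0, 0]]))]
t, i, j, v = flatten_sparse_counts(counts)

# The flat arrays carry the same information as the dense (T, M, M) tensor.
dense = np.zeros((2, 2, 2), dtype=int)
dense[t, i, j] = v
```

The four arrays all have length equal to the total number of stored entries, so a C function would only need their pointers plus that one length.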

How are we handling this?

franknoe commented 9 years ago

I think a list of 2D matrices is fine in general. If (inside the function) you access elements via A[k][i,j], that works both with a numpy 3D tensor and with a list of 2D numpy arrays. It would additionally work with a list of scipy sparse matrices if they are indexable (e.g. csr).
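To illustrate the point, here is a small sketch showing the same `A[k][i, j]` access on all three container types. The helper name `count` and the example values are made up for the illustration.

```python
import numpy as np
import scipy.sparse as sp

# A dense (T, M, M) tensor with a single nonzero count.
dense_3d = np.zeros((2, 3, 3), dtype=int)
dense_3d[1, 0, 2] = 5

# The same data as a list of 2D arrays and as a list of csr matrices.
list_of_dense = [dense_3d[k] for k in range(dense_3d.shape[0])]
list_of_csr = [sp.csr_matrix(c) for c in list_of_dense]


def count(A, k, i, j):
    """Hypothetical accessor: identical syntax for all three containers."""
    return A[k][i, j]


results = [int(count(A, 1, 0, 2))
           for A in (dense_3d, list_of_dense, list_of_csr)]
```

Note that this relies on the sparse format supporting item access: csr does, but `scipy.sparse.coo_matrix` is not subscriptable, so a list of coo matrices would fail here. Element-wise access on sparse matrices is also likely to be much slower than on dense arrays, so this pattern probably suits only non-critical code paths.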

Why would the function interfaces be different? Python is type agnostic, so it's possible to have two different types for the same input. We just shouldn't overdo this because it might create confusion.

Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de

Mail: Arnimallee 6, 14195 Berlin, Germany

fabian-paul commented 9 years ago

I think @cwehmeyer is referring to the C implementation, which is not type-agnostic. So if we settle on one implementation, this would exclude certain types. At the interface level we could always convert types. For example, we could have a sparse (coo) C-level implementation and always convert the input to coo.

Then there is the question of which sparse matrix representation we would like to support at the C level. The coo format comes to mind first, but we should look at the other formats too, just to be sure that we don't miss a better option.
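For comparison, a short sketch of the raw arrays that two common scipy formats expose, since these are what would actually be handed to C. The example matrix is made up; only standard scipy attributes are used.

```python
import numpy as np
import scipy.sparse as sp

dense = np.array([[2, 0],
                  [1, 3]])

# coo: one (row, col, data) triple per stored entry.
coo = sp.coo_matrix(dense)
coo_arrays = (coo.row.tolist(), coo.col.tolist(), coo.data.tolist())
# ([0, 1, 1], [0, 0, 1], [2, 1, 3])

# csr: entries of row i are data[indptr[i]:indptr[i + 1]],
# with their column numbers in indices[indptr[i]:indptr[i + 1]].
csr = sp.csr_matrix(dense)
csr_arrays = (csr.indptr.tolist(), csr.indices.tolist(), csr.data.tolist())
# ([0, 1, 3], [0, 0, 1], [2, 1, 3])
```

coo extends naturally to a third index (just add a therm_state array), whereas csr gives fast row access but would need one indptr array per thermodynamic state or an extra level of offsets.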

franknoe commented 9 years ago

I see. We would have two distinct C implementations for dense and sparse (see msmtools).

fabian-paul commented 9 years ago

So should the interface be type-agnostic or not? (Edit: in msmtools we have both, the agnostic interface in the API and the type-specific interfaces in the dense/sparse subpackages. Should it be the same for thermotools?)

franknoe commented 9 years ago

The Python API needs to accept either sparse or dense types. Depending on the input type, you call into either the dense or the sparse C functions, as in msmtools.estimation.transition_matrix. Does this answer your question?
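A rough sketch of such a type-dispatching API function. All names here (`total_counts`, `_total_counts_dense`, `_total_counts_sparse`) are hypothetical stand-ins; the two private functions take the place of the dense and sparse C routines, and this is not how msmtools itself is implemented.

```python
import numpy as np
import scipy.sparse as sp


def _total_counts_dense(count_tensor):
    # Stand-in for a dense C-level routine operating on a (T, M, M) array.
    return int(count_tensor.sum())


def _total_counts_sparse(count_matrices):
    # Stand-in for a sparse C-level routine operating on a list of 2D matrices.
    return int(sum(c.sum() for c in count_matrices))


def total_counts(count_matrices):
    """Type-agnostic API entry point: dispatch on the input type."""
    if isinstance(count_matrices, np.ndarray):
        return _total_counts_dense(count_matrices)
    sparse_flags = [sp.issparse(c) for c in count_matrices]
    if all(sparse_flags):
        return _total_counts_sparse(count_matrices)
    if any(sparse_flags):
        raise TypeError("mixed dense and sparse count matrices")
    # A list of dense 2D arrays is stacked into a (T, M, M) tensor.
    return _total_counts_dense(np.asarray(count_matrices))


dense_input = np.arange(8).reshape(2, 2, 2)
sparse_input = [sp.csr_matrix(c) for c in dense_input]
```

Both `total_counts(dense_input)` and `total_counts(sparse_input)` go through the same public function, so callers never need to pick the dense or sparse variant themselves.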

fabian-paul commented 9 years ago

Yes

franknoe commented 9 years ago

BTW, I really like "winter is coming..." as a project description. Gives me lots of ideas for a logo ;-)
