ajkluber closed this 9 years ago
Hi Alex,
Please go ahead, these look like great improvements. Let me know if you have any questions while making the changes.
Best,
Jordane
On Sun, Apr 12, 2015 at 5:10 PM, Alexander Kluber <notifications@github.com> wrote:
I want to propose a couple of improvements that might help in applications to larger matrices.
- numpy's save (load) instead of savetxt (loadtxt): writing and reading a binary format is much faster.
- mpi4py's uppercase methods comm.Scatter, comm.Gather, etc. instead of their lowercase counterparts. The lowercase methods use pickle to serialize their inputs for communication, which limits them to objects of at most 2GB. The uppercase methods, on the other hand, are intended for communicating buffer-like objects such as numpy arrays directly, without pickling.
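For a rough sense of the first point, here is a minimal sketch that writes a small random matrix (a hypothetical stand-in for a distance matrix) with both np.savetxt and np.save, reads it back, and compares file sizes. It covers only the numpy half of the proposal and makes no timing claims; the file names just mirror the ones used later in the thread.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for a distance matrix; real lsdmap inputs are far larger.
rng = np.random.default_rng(0)
dm = rng.random((200, 200))

with tempfile.TemporaryDirectory() as tmp:
    txt_path = os.path.join(tmp, "example.dm")      # text: one formatted string per value
    npy_path = os.path.join(tmp, "example_dm.npy")  # binary: raw float64 bytes plus a small header

    np.savetxt(txt_path, dm)  # default fmt="%.18e", space-delimited
    np.save(npy_path, dm)

    txt_size = os.path.getsize(txt_path)
    npy_size = os.path.getsize(npy_path)

    dm_txt = np.loadtxt(txt_path)  # has to parse every value back from text
    dm_npy = np.load(npy_path)     # reads the raw bytes, no parsing

print(f"text: {txt_size} bytes, npy: {npy_size} bytes")
```

The .npy file stores each float64 in 8 bytes, while the default "%.18e" text format spends roughly 25 characters per value, so the binary file is about a third of the size and is read back bit-for-bit.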
I'm willing to work on making these changes, at least in the lsdmap subpackage.
The reason I am proposing these changes is that I am trying to compute lsdmap on a trajectory of 1.5E6 frames by a combination of downsampling, lsdmap, and embedding using rbf.
View it on GitHub: https://github.com/jp43/lsdmap/issues/1
Jordane PRETO
Rice University, Anderson Biological Lab, Room 319, 6100 Main Street, Houston, Texas 77005-1892
Well, one awkward thing about np.save/np.load is that if you append to a file (e.g. when saving distance_matrix), you then have to call np.load the same number of times: each call to np.load returns one of the chunks you appended.
This would require the same number of processors when later loading the distance matrix as when it was saved. Is this worth it?
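A minimal sketch of the append-then-reload behaviour described above, assuming each process appends its block of rows with np.save on a file opened in "ab" mode. The helpers append_chunk and load_all_chunks and the three-block split are hypothetical, not lsdmap code. The reader loops until the end of the file rather than using a fixed count, which is one possible way to avoid hard-coding the number of chunks (an assumption, not something settled in this thread).

```python
import os
import tempfile

import numpy as np


def append_chunk(path, chunk):
    """Append one array to `path` as a separate .npy record (hypothetical helper)."""
    with open(path, "ab") as f:
        np.save(f, chunk)


def load_all_chunks(path):
    """Read back every appended record: one np.load call per chunk, until EOF."""
    chunks = []
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        while f.tell() < size:
            chunks.append(np.load(f))  # leaves the file position after this record
    return chunks


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "distance_matrix.npy")
    # e.g. three processes, each writing a block of rows of a 6 x 4 matrix
    full = np.arange(24, dtype=np.float64).reshape(6, 4)
    for block in np.array_split(full, 3):
        append_chunk(path, block)

    chunks = load_all_chunks(path)
    dm = np.vstack(chunks)  # reassemble the full matrix from the row blocks
```

Note that the resulting file is a concatenation of .npy records, not a single .npy array, so a plain np.load(path) only returns the first chunk.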
As a crude comparison, for a trajectory of 50,000 frames:
- np.loadtxt("example.dm") took 70 min; file size is 59GB
- np.save("example_dm.npy", dm) took 1 min; file size is 19GB
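The file sizes above can be checked with back-of-the-envelope arithmetic. This sketch assumes (not stated in the thread) that dm is a dense 50,000 x 50,000 float64 matrix and that the text file was written with np.savetxt's default fmt="%.18e" and a single-character delimiter; under those assumptions it gives about 18.6 GiB for .npy and 58.2 GiB for text, in line with the reported 19GB and 59GB.

```python
# Back-of-the-envelope file sizes for a dense n x n float64 distance matrix.
# Assumptions: dense float64 storage; text written with np.savetxt's default
# fmt="%.18e" and a one-character delimiter (space, or newline at row end).
n_frames = 50_000
n_entries = n_frames * n_frames

# Binary .npy: 8 bytes per float64 (the small .npy header is ignored here).
npy_bytes = n_entries * 8

# Text: "%.18e" yields 24 characters for a non-negative value with a two-digit
# exponent, plus one delimiter per value.
chars_per_value = len("%.18e" % 1.234) + 1
txt_bytes = n_entries * chars_per_value

GIB = 2**30
print(f"npy:  {npy_bytes / GIB:.1f} GiB")  # npy:  18.6 GiB
print(f"text: {txt_bytes / GIB:.1f} GiB")  # text: 58.2 GiB
```

The roughly 3x size gap comes from spending about 25 characters per value in text versus 8 bytes in binary, and the text file additionally has to be parsed value by value on load.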