mfinzi / equivariant-MLP

A library for programmatically generating equivariant layers through constraint solving
MIT License

Dataset #4

Closed Chen-Cai-OSU closed 3 years ago

Chen-Cai-OSU commented 3 years ago

Hello,

Thanks a lot for the nice code and documentation. This is probably a naive question. I am trying to use your code to generate data for some Hamiltonian systems, and I am a bit confused about the meaning of the arguments in

SHO(n_systems=300, chunk_len=10, dt=0.2, integration_time=300, regen=True).Zs

which gives an np.array of shape (300, 10, 2). Does n_systems stand for the number of particles? What is chunk_len? I expected one of the dimensions (the time dimension) to be of size 1500.

mfinzi commented 3 years ago

Hi Chen, Glad you're finding some use for our library!

n_systems is a bit poorly named here: it is the number of trajectories generated for the dataset (i.e. the dataset size N=300), which can also be thought of as a batch axis.

chunk_len=10 specifies that each training trajectory has 10 observation points spaced apart by dt=0.2 (dt is not the integrator timestep, which is effectively much smaller and depends on the tolerance). These trajectory chunks are generated by integrating over the longer time span integration_time=300s (or 30s with the default) from randomly sampled initial conditions, then selecting a random segment of length chunk_len*dt = 10*0.2s = 2s, and repeating until there are N of them. See https://github.com/mfinzi/equivariant-MLP/blob/master/experiments/trainer/hamiltonian_dynamics.py#L87 .
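As a rough illustration of that chunking step (a numpy sketch, not the exact code in hamiltonian_dynamics.py; random_chunk and long_traj are made-up names):

    import numpy as np

    def random_chunk(long_traj, chunk_len, rng=np.random):
        # long_traj: (T_long, state_dim), one trajectory observed every dt seconds
        # over integration_time seconds, so T_long is about integration_time / dt
        start = rng.randint(0, len(long_traj) - chunk_len + 1)
        # keep chunk_len consecutive observations, i.e. a chunk_len*dt second window
        return long_traj[start:start + chunk_len]    # (chunk_len, state_dim)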

integration_time also serves a dual purpose as the length of time for the rollouts that we use for evaluation.

The final argument controls caching: with regen=False the generated dataset is cached to disk and reused, while regen=True regenerates the data and overwrites anything that was on disk. For the arguments you specified, the Zs would be saved at ~/datasets/ODEDynamics/trajectories_300_10_0.2_300.pz. I should probably add an argument so you can specify the base directory.
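For example (a usage sketch, assuming SHO has been imported from the hamiltonian_dynamics module linked above):

    # First call integrates the trajectories and writes the cache file
    # ~/datasets/ODEDynamics/trajectories_300_10_0.2_300.pz
    ds = SHO(n_systems=300, chunk_len=10, dt=0.2, integration_time=300, regen=True)

    # A later call with regen=False should load that cached file instead of re-integrating
    ds_cached = SHO(n_systems=300, chunk_len=10, dt=0.2, integration_time=300, regen=False)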

The shape of Zs, (300, 10, 2), is (N, chunk_len, state_dim), where for the Simple Harmonic Oscillator (SHO) the state dimension is 2: one for x and one for its conjugate momentum p.

For the double spring pendulum, state_dim = 12 = 2*2*3 (3 ambient dimensions for the position vectors, 2 bobs connected by springs, and a factor of 2 for describing both position and momentum).
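For a quick sanity check on those layouts (a sketch, assuming SHO and DoubleSpringPendulum have been imported from the hamiltonian_dynamics module, and that DoubleSpringPendulum accepts the same constructor arguments as SHO here):

    # SHO: state is (x, p), so state_dim = 2
    Zs_sho = SHO(n_systems=300, chunk_len=10, dt=0.2, integration_time=300, regen=True).Zs
    assert Zs_sho.shape == (300, 10, 2)     # (N, chunk_len, state_dim)

    # Double spring pendulum: positions and momenta of 2 bobs in 3D, so state_dim = 12
    Zs_dsp = DoubleSpringPendulum(n_systems=300, chunk_len=10, dt=0.2, integration_time=300, regen=True).Zs
    assert Zs_dsp.shape == (300, 10, 12)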

For training, I take the targets z with z.shape = (bs, chunk_len, state_dim), take the initial conditions z[:,0,:], use the learned dynamics model and the ODE integrator to produce predictions at the chunk timepoints T = np.arange(0, chunk_len*dt, dt), and compute the MSE between those predictions z_pred and the targets z.
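In code, that objective looks roughly like the following (a minimal sketch, not the trainer's actual implementation: it uses jax.experimental.ode.odeint as the integrator and a placeholder dynamics(z, t) function standing in for the learned dynamics model):

    import jax
    import jax.numpy as jnp
    from jax.experimental.ode import odeint

    def rollout_mse(dynamics, z, dt):
        # z: (bs, chunk_len, state_dim) target chunks from the dataset
        z0 = z[:, 0, :]                      # initial conditions
        T = jnp.arange(z.shape[1]) * dt      # chunk timepoints 0, dt, ..., (chunk_len-1)*dt
        # integrate each initial condition forward and compare with the targets
        z_pred = jax.vmap(lambda z0i: odeint(dynamics, z0i, T))(z0)  # (bs, chunk_len, state_dim)
        return jnp.mean((z_pred - z) ** 2)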

You may also find it useful to visualize the dataset or rollout trajectories, which you can do for the DoubleSpringPendulum dataset with the following snippet in a Jupyter notebook or Colab:

from IPython.display import HTML
HTML(DoubleSpringPendulum().animate())

or

HTML(DoubleSpringPendulum().animate(Zs))

I hope this has been helpful; feel free to reach out again if you have more questions. I will try to add some docstrings to make this less well documented part of the experiments clearer.

Cheers, Marc


mfinzi commented 3 years ago

OK, I just pushed some docstrings for these functions onto master.


Chen-Cai-OSU commented 3 years ago

Hi @mfinzi

Thanks a lot for the detailed explanation. I am able to achieve my goal now. Thank you for the help!