Open qwerfdsadad opened 4 months ago
Hi. Thank you for your report. This could originate from pmap, which splits the batch dimension from (N_b, ...) into (N_GPU, N_b/N_GPU, ...). Please try reshaping the batch dimensions of the resulting file from the latter form back to the original batch size. In addition, our forward script does not support multiple GPUs, so please use only one GPU for training.
Thanks for your reply.
vm_evolve = jax.pmap(jax.vmap(evolve, axis_name='j'), axis_name='i')
local_devices = jax.local_device_count()
uu = vm_evolve(u.reshape([local_devices, cfg.multi.numbers // local_devices, -1]))
save_dim = [cfg.multi.numbers] + list(uu.shape[-2:])
uu_reshape = uu.reshape(save_dim)
jnp.save(cwd + cfg.multi.save + '1D_Advection_Sols_beta' + str(beta)[:5], uu_reshape)
This is my solution. For the 1D Advection dataset, I created a new variable, uu_reshape, to restore the original shape of uu. However, this method does not carry over to other dimensions or other datasets, because save_dim has to be set by hand for each dataset. Is there a unified approach?
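One possible dataset-agnostic approach is to merge only the two leading axes and leave every trailing axis untouched, so that no per-dataset save_dim is needed. The sketch below uses NumPy for illustration (jax.numpy arrays expose the same reshape method). It assumes the pmap output has the device axis first and the per-device batch axis second, as in the snippet above. The helper name merge_device_axis and the array sizes are hypothetical and not part of PDEBench.

```python
import numpy as np


def merge_device_axis(x):
    """Collapse the leading (n_devices, per_device_batch) axes into one batch axis.

    Trailing axes (time, space, channels, ...) are kept as they are, so the same
    call works for 1D, 2D, or 3D data.
    """
    return x.reshape((-1,) + tuple(x.shape[2:]))


# Hypothetical sizes, for illustration only.
n_devices, per_device, nt, nx = 2, 3, 4, 5

# Stand-in for a 1D result shaped (n_devices, per_device, nt, nx).
uu_1d = np.arange(n_devices * per_device * nt * nx, dtype=np.float32)
uu_1d = uu_1d.reshape(n_devices, per_device, nt, nx)

# Stand-in for a 2D result shaped (n_devices, per_device, nt, nx, ny).
uu_2d = np.zeros((n_devices, per_device, nt, 6, 7), dtype=np.float32)

flat_1d = merge_device_axis(uu_1d)
flat_2d = merge_device_axis(uu_2d)

print(flat_1d.shape)  # (6, 4, 5)
print(flat_2d.shape)  # (6, 4, 6, 7)
```

Because the reshape is row-major, sample b on device d ends up at index d * per_device + b. This matches the order produced by the original u.reshape([local_devices, numbers // local_devices, -1]) split, so the samples should return to their original order.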
Hello! Your work has been a strong support for innovation in machine-learning simulation, and I thank you for your contribution. I have recently been studying your project code.
I went to the /data_gen_NLE/ReactionDiffusionEq/ folder to generate the Reaction Diffusion dataset, then went to the /pdebench/models/ folder and ran run_forward_1D.sh to train the network. The command I ran is:
Then I encountered this bug. ![image](https://github.com/pdebench/PDEBench/assets/118503332/c5f7d7bd-ffd3-4dde-a045-2a228fa0b3e5)
Similarly, I went to the /data_gen_NLE/BurgersEq/ folder to generate the Burgers dataset and then trained the network with the command,
and encountered a similar bug. ![image](https://github.com/pdebench/PDEBench/assets/118503332/5a6082a6-b12c-443a-ba13-44da637d1eba)
However, when I tested with the dataset downloaded via the /pdebench/data_download/ directory, the program ran successfully.
I wonder whether the problem is with the HDF5 file. I used HDFView to check the data format.
The config file used to generate the 1D_Burgers_Sols_Nu1.0.hdf5 file is: ![image](https://github.com/pdebench/PDEBench/assets/118503332/c18a1106-52ce-4178-a15f-19b508a7420a)