pdebench / PDEBench

PDEBench: An Extensive Benchmark for Scientific Machine Learning

Unable to synchronously open object (object 'nu' doesn't exist) #56

Open qwerfdsadad opened 4 months ago

qwerfdsadad commented 4 months ago

Hello! Your work has been a strong support for driving innovation in machine-learning simulation, and I thank you for your contribution. I have recently been studying your project code.

I generated the Reaction Diffusion dataset from the /data_gen_NLE/ReactionDiffusionEq/ folder, then ran run_forward_1D.sh in the /pdebench/models/ folder to train the network. The command I ran is:

CUDA_VISIBLE_DEVICES='0' python3 train_models_forward.py +args=config_ReacDiff.yaml ++args.filename='ReacDiff_Nu1.0_Rho1.0.hdf5' ++args.model_name='FNO'

Then I encountered this bug. [image]

Similarly, I generated the Burgers dataset from the /data_gen_NLE/BurgersEq/ folder and then trained the network with the command

CUDA_VISIBLE_DEVICES='2,3' python3 train_models_forward.py +args=config_Bgs.yaml ++args.filename='1D_Burgers_Sols_Nu1.0.hdf5' ++args.model_name='FNO'

and encountered a similar bug. [image]

However, when I tested with the dataset downloaded via the /pdebench/data_download/ directory, the program ran successfully.

I wonder whether the problem is with the HDF5 file. I used HDFView to check the data format. [image]

I found that the t-axis coordinate has 202 points (from 0 to 2.01) and the x-axis has 1024 points (from 0 to 1), but the tensor has a 2*5000 layout.

The config file used to generate 1D_Burgers_Sols_Nu1.0.hdf5 is: [image]

mtakamoto-D commented 4 months ago

Hi. Thank you for your kind report. This could originate from pmap, which splits the batch dimension from (N_b, ...) into (N_GPU, N_b/N_GPU, ...). Please try reshaping the batch dimension of the resulting file from the latter shape back to the original batch size. In addition, our forward script does not support multiple GPUs, so please use only one GPU for training.
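To make the shape change concrete, here is a minimal NumPy sketch of the split described above and the reshape that undoes it. The sizes (`n_gpu`, `n_batch`, `n_t`, `n_x`) are hypothetical values chosen for illustration, not taken from the PDEBench configs, and plain NumPy stands in for the JAX arrays.

```python
import numpy as np

# Hypothetical sizes for illustration only (not from the PDEBench configs).
n_gpu, n_batch, n_t, n_x = 2, 8, 5, 16

# Stand-in for generated solutions in the original layout (N_b, t, x).
original = np.arange(n_batch * n_t * n_x, dtype=np.float32).reshape(n_batch, n_t, n_x)

# A pmap-style split of the batch axis gives (N_GPU, N_b / N_GPU, t, x).
split = original.reshape(n_gpu, n_batch // n_gpu, n_t, n_x)

# Merging the two leading axes restores the layout (N_b, t, x).
merged = split.reshape(n_batch, n_t, n_x)

print(split.shape)   # (2, 4, 5, 16)
print(merged.shape)  # (8, 5, 16)
```

Because the split only reinterprets the batch axis, the merged array holds the samples in their original order.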

qwerfdsadad commented 3 months ago


Thanks for your reply.

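    # Map evolve over devices (pmap) and over the per-device batch (vmap).
    vm_evolve = jax.pmap(jax.vmap(evolve, axis_name='j'), axis_name='i')
    local_devices = jax.local_device_count()
    # Split the batch as (N_GPU, N_b/N_GPU, remaining axes flattened).
    uu = vm_evolve(u.reshape([local_devices, cfg.multi.numbers // local_devices, -1]))
    # Merge the device axis back into a single batch axis before saving.
    save_dim = [cfg.multi.numbers] + list(uu.shape[-2:])
    uu_reshape = uu.reshape(save_dim)
    jnp.save(cwd + cfg.multi.save + '1D_Advection_Sols_beta' + str(beta)[:5], uu_reshape)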

This is my solution. For the Advection-1D dataset, I created a new variable uu_reshape to change the original shape of uu. However, this method does not carry over to other dimensions and other datasets, because save_dim has to be set by hand for each dataset. Is there a unified approach?
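One possible shape-agnostic approach, offered as a sketch rather than the project's own method: collapse only the two leading axes and keep whatever trailing axes the solver produced. The helper name `merge_device_axis` is hypothetical, and the example assumes the output always has the layout (devices, batch per device, ...). NumPy is used here so the snippet runs without a GPU; the same `reshape` call should also work on a `jax.numpy` array.

```python
import numpy as np

def merge_device_axis(arr):
    """Collapse the leading (device, per-device batch) axes into one batch axis."""
    if arr.ndim < 2:
        raise ValueError("expected at least two leading axes (device, batch)")
    # -1 lets reshape infer the total batch size; trailing axes are kept as they are.
    return arr.reshape((-1,) + tuple(arr.shape[2:]))

# 1D-style output: (devices, batch per device, t, x)
out_1d = merge_device_axis(np.zeros((2, 4, 5, 16)))
# 2D-style output: (devices, batch per device, t, x, y)
out_2d = merge_device_axis(np.zeros((2, 4, 5, 8, 8)))

print(out_1d.shape)  # (8, 5, 16)
print(out_2d.shape)  # (8, 5, 8, 8)
```

With a helper like this, the save step would not need a per-dataset save_dim, since the trailing shape is read from the array itself.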