m2lines / gz21_ocean_momentum

Stochastic-Deep Learning Parameterization of Ocean Momentum Forcing
MIT License

Readme invocations result in `0 x 45` input size in training stage #75

Closed raehik closed 1 year ago

raehik commented 1 year ago

Preparing data with the command from the readme:

mlflow run . --experiment-name raehik --env-manager=local \
-P lat_min=-25 -P lat_max=25 -P long_min=-280 -P long_max=80 -P factor=4 -P chunk_size=1 -P CO2=1 -P global=0 -P ntimes=100

Then using the following command to train, again from the readme:

mlflow run . --experiment-name raehik -e train --env-manager=local \
-P exp_id=835131858780593246 -P run_id=b87a10f0ced44f948227cb2153a6b406 \
-P learning_rate=0/5e-4/15/5e-5/30/5e-6 -P n_epochs=200 -P weight_decay=0.00 -P train_split=0.8 \
-P test_split=0.85 -P model_module_name=models.models1 -P model_cls_name=FullyCNN -P batchsize=4 \
-P transformation_cls_name=SoftPlusTransform -P submodel=transform3 \
-P loss_cls_name=HeteroskedasticGaussianLossV2

Results in a kernel size mismatch error:

RuntimeError: Calculated padded input size per channel: (0 x 45). Kernel size: (5 x 5). Kernel size can't be greater than actual input size

(This is still using models.models1, not the refactored PyTorch model.)
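For readers unfamiliar with this error, the arithmetic behind it can be sketched in plain Python. This is only an illustration: `check_conv_input` is a hypothetical helper, and the zero padding, unit stride, and the 45 x 45 "healthy" input are assumptions, not values taken from FullyCNN.

```python
def check_conv_input(height, width, kernel=5, padding=0):
    """Return the output size of a stride-1 convolution, or raise if the
    kernel does not fit inside the padded input.

    Hypothetical helper for illustration; padding=0 and stride 1 are assumed.
    """
    padded = (height + 2 * padding, width + 2 * padding)
    if min(padded) < kernel:
        raise ValueError(
            f"padded input size ({padded[0]} x {padded[1]}) is smaller than "
            f"kernel size ({kernel} x {kernel})"
        )
    # With stride 1 and no dilation, each axis shrinks by kernel - 1.
    return tuple(n - kernel + 1 for n in padded)


print(check_conv_input(45, 45))  # (41, 41)

try:
    check_conv_input(0, 45)  # the shape reported in the traceback above
except ValueError as err:
    print(err)
```

A height of 0 means the input had no rows along one spatial axis at all, which points at the data selection rather than at the model.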

mondus commented 1 year ago

@raehik suspects that this may be due to an issue with how we label data. Hopefully @arthurBarthe can clarify at the next development meeting.

raehik commented 1 year ago

Turns out it's to do with the subdomain regions in training_subdomains.yaml: some of them didn't overlap with the region of the processed data. Using lat_min=-80 and lat_max=80 works with the existing subdomain regions.

arthurBarthe commented 1 year ago

Yes, just to clarify a bit more for others: training_subdomains.yaml contains the definitions of the 4 regions used for training (in the paper we do not use the global data for training, only 4 selected regions). The processed data needs to contain those 4 regions, which is why we were getting a bug with lat_min=-25 and lat_max=25. I will add info about training_subdomains.yaml to the readme if it is not there.
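A pre-flight check along these lines could catch the mismatch before training starts. This is a minimal sketch, not code from the repository: `uncovered_subdomains` is a hypothetical helper, and the region names and bounds below are made up for illustration (the real ones live in training_subdomains.yaml).

```python
def uncovered_subdomains(data_lat, data_lon, subdomains):
    """Return the names of subdomains not fully inside the processed region.

    data_lat and data_lon are (min, max) pairs for the processed data;
    subdomains maps a name to (lat_min, lat_max, long_min, long_max).
    """
    missing = []
    for name, (lat_lo, lat_hi, lon_lo, lon_hi) in subdomains.items():
        inside = (
            data_lat[0] <= lat_lo and lat_hi <= data_lat[1]
            and data_lon[0] <= lon_lo and lon_hi <= data_lon[1]
        )
        if not inside:
            missing.append(name)
    return missing


# Hypothetical subdomain bounds, for illustration only.
subdomains = {
    "region_a": (35.0, 50.0, -50.0, -20.0),
    "region_b": (-20.0, 10.0, -180.0, -162.0),
}

# Processed with lat_min=-25, lat_max=25: region_a falls outside.
print(uncovered_subdomains((-25, 25), (-280, 80), subdomains))  # ['region_a']

# Processed with lat_min=-80, lat_max=80: everything is covered.
print(uncovered_subdomains((-80, 80), (-280, 80), subdomains))  # []
```

Failing early with the list of uncovered regions would give a clearer message than the kernel size error raised later inside the network.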

raehik commented 1 year ago

Updated the default data processing command and added a brief note to the readme in 6958f5e6b4fbd1a86e2b5a3147b55c6355b64ec8.