spcl / dace

DaCe - Data Centric Parallel Programming
http://dace.is/fast
BSD 3-Clause "New" or "Revised" License

ConstantPropagation transformation pass leads to invalid numerics #1328

Open FlorianDeconinck opened 1 year ago

FlorianDeconinck commented 1 year ago

Describe the bug Running the regression tests of the Pace climate and weather model with a close-to-HEAD version of DaCe shows that the ConstantPropagation pass leads to a failure on 3 variables, which seem not to be written at all. Removing the pass from the simplify call fixes the problem, leading to a full validation of the model.

To Reproduce This requires running DaCe commit 2daf22768581655200d27a8ed9755b8eb9f890a7 or newer. Either dace:cpu or dace:gpu can be run. The script below runs dace:cpu.

The reproducer runs the acoustics sub-module of the model. It runs as a pytest test and checks the numerical results against the original Fortran code. The test either passes or reports the failing variables.

# Repro: run the AcousticDynamics regression test
# Original code: fv3core/pace/fv3core/stencils/dyn_core.py
# DaCe is applied to the AcousticDynamics.__call__ function

# Get Pace repository
git clone git@github.com:GEOS-ESM/pace
cd pace
git checkout ca1796f8064f21821a9945fc6c4ad655078247a8
git submodule init
git submodule update

# Setup the venv, including Cupy for GPU
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install external/gt4py/
pip install -r requirements_dev.txt -c constraints.txt

# Download data
mkdir -p test_data/8.1.3/c12_6ranks_standard/dycore
cd test_data/8.1.3/c12_6ranks_standard/dycore
pip install gdown
# URLs are quoted so that shells with strict globbing (e.g. zsh) do not expand the "?"
gdown "https://drive.google.com/uc?id=1p4GvdSof10BYTV9-sYTTWcLkX9RScQJn"
gdown "https://drive.google.com/uc?id=1a5TbVqFmqmVX5qzGMFnPy3xn7Qj0by8j"
gdown "https://drive.google.com/uc?id=1SOO97ncz-fCGVoPD7pUuYY9uYowUjato"
gdown "https://drive.google.com/uc?id=1Wcb1l7GXE5C_82oItGo7RkKloWlJCKR4"
cd -

# Run the DynCore (AcousticDynamics) test
export FV3_DACEMODE=BuildAndRun
export PACE_CONSTANTS=GFS
mpirun -np 6 \
       pytest -v -s --data_path=./test_data/8.1.3/c12_6ranks_standard/dycore \
       -m parallel \
       --backend=dace:cpu --which_modules=DynCore --which_rank=0 \
       --threshold_overrides_file=./fv3core/tests/savepoint/translate/overrides/standard.yaml \
       ./fv3core/tests/savepoint

Expected behavior Passing test. This runs exactly one test - all the other tests of the suite will be skipped.
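To make the failure mode concrete, here is a minimal sketch of the kind of check involved: comparing outputs against reference data, and spotting outputs that look like they were never written. This is not Pace's actual test code. The function names, the variable names `var_a` and `var_b`, and the tolerance are all hypothetical and chosen only for illustration.

```python
import math


def failing_variables(computed, reference, rel_tol=1e-12):
    """Return the names of variables that differ from the reference data."""
    failing = []
    for name, ref_values in reference.items():
        values = computed[name]
        matches = len(values) == len(ref_values) and all(
            math.isclose(v, r, rel_tol=rel_tol) for v, r in zip(values, ref_values)
        )
        if not matches:
            failing.append(name)
    return failing


def unwritten_variables(before, after):
    """Return the names of outputs that are unchanged by the call.

    An output still equal to its pre-call value is a hint (not a proof)
    that the generated code never wrote to it.
    """
    return [name for name in after if after[name] == before[name]]


# Hypothetical data: var_b is left at its initial value.
reference = {"var_a": [1.0, 2.0], "var_b": [3.0, 4.0]}
before = {"var_a": [0.0, 0.0], "var_b": [0.0, 0.0]}
after = {"var_a": [1.0, 2.0], "var_b": [0.0, 0.0]}

print(failing_variables(after, reference))
print(unwritten_variables(before, after))
```

A variable that shows up in both lists fails validation because it kept its initial value, which matches the symptom described above.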

A few more explanations The Pace repository applies a somewhat custom DaCe pipeline, which can be seen in orchestration.py:_build_sdfg. The gist of it is that we first have an SDFG with unexpanded library nodes coming from a GT4Py pass on the Python code.

FlorianDeconinck commented 1 year ago

Follow up to https://github.com/spcl/dace/issues/1306 and https://github.com/spcl/dace/issues/1305

FlorianDeconinck commented 1 year ago

This is not blocking for now; we skip the pass. Also, a crude benchmark doesn't show a performance drop.
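For readers who want to try the same workaround, skipping the pass could look roughly like the sketch below. It assumes that `SimplifyPass` in `dace.transformation.passes.simplify` accepts a `skip` set of pass names, which should be checked against the DaCe version in use. The helper name `simplify_without` is hypothetical and is not the code Pace uses.

```python
# Names of the simplification passes to leave out.
PASSES_TO_SKIP = {"ConstantPropagation"}


def simplify_without(sdfg, skip=PASSES_TO_SKIP, validate=True, verbose=False):
    """Run DaCe's simplification pipeline on `sdfg`, minus the passes in `skip`."""
    # Imported lazily so this module can be loaded without DaCe installed.
    # Assumption: SimplifyPass takes a `skip` argument holding pass names.
    from dace.transformation.passes.simplify import SimplifyPass

    return SimplifyPass(
        validate=validate,
        validate_all=False,
        verbose=verbose,
        skip=set(skip),
    ).apply_pass(sdfg, {})
```

Calling `simplify_without(sdfg)` in place of `sdfg.simplify()` keeps the rest of the simplification pipeline while leaving ConstantPropagation out.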