opera-adt / DSWX-SAR

Dynamic Surface Water Extent from Synthetic Aperture Radar
Apache License 2.0

Product values differ between Intel and AMD platforms #79

Open: collinss-jpl opened this issue 3 months ago

collinss-jpl commented 3 months ago

When the DSWx-NI SAS Interface delivery is run on an Intel-based EC2 instance (for example c6i.2xlarge), comparing the output products against the expected products with the dswx_comparison.py script produces the following failures:

Comparing files:
    file 1: sample_data/expected_output/OPERA_L3_DSWx-NI_T11SLS_20110226T061749Z_20240329T181033Z_LSAR_30_v0.1_B01_WTR.tif
    file 2: sample_data/output_dir/OPERA_L3_DSWx-NI_T11SLS_20110226T061749Z_20240402T154233Z_LSAR_30_v0.1_B01_WTR.tif
[OK]Comparing number of bands
Comparing DSWx bands...
[FAIL]     Band 1 - Water classification (WTR)"
/data/home/collinss/OPERA/DSWx-NI/interface/0.1/dswx_comparison.py:59: RuntimeWarning: overflow encountered in ubyte_scalars
  if (abs(image_1[i, j] - image_2[i, j]) >=
            * input 1 has value "0" in position (x: 2615, y: 460) whereas input 2 has value "1" in the same position.

Comparing files:
    file 1: sample_data/expected_output/OPERA_L3_DSWx-NI_T11SLS_20110226T061749Z_20240329T181033Z_LSAR_30_v0.1_B02_BWTR.tif
    file 2: sample_data/output_dir/OPERA_L3_DSWx-NI_T11SLS_20110226T061749Z_20240402T154233Z_LSAR_30_v0.1_B02_BWTR.tif
[OK]Comparing number of bands
Comparing DSWx bands...
[FAIL]     Band 1 - Binary Water classification (BWTR)"
/data/home/collinss/OPERA/DSWx-NI/interface/0.1/dswx_comparison.py:59: RuntimeWarning: overflow encountered in ubyte_scalars
  if (abs(image_1[i, j] - image_2[i, j]) >=
            * input 1 has value "0" in position (x: 2615, y: 460) whereas input 2 has value "1" in the same position.

Comparing files:
    file 1: sample_data/expected_output/OPERA_L3_DSWx-NI_T11SLS_20110226T061749Z_20240329T181033Z_LSAR_30_v0.1_B03_CONF.tif
    file 2: sample_data/output_dir/OPERA_L3_DSWx-NI_T11SLS_20110226T061749Z_20240402T154233Z_LSAR_30_v0.1_B03_CONF.tif
[OK]Comparing number of bands
Comparing DSWx bands...
[FAIL]     Band 1 - Confidence values (CONF)"
            * input 1 has value "2" in position (x: 2610, y: 313) whereas input 2 has value "1" in the same position.

Comparing files:
    file 1: sample_data/expected_output/OPERA_L3_DSWx-NI_T11SLS_20110226T061749Z_20240329T181033Z_LSAR_30_v0.1_B04_DIAG.tif
    file 2: sample_data/output_dir/OPERA_L3_DSWx-NI_T11SLS_20110226T061749Z_20240402T154233Z_LSAR_30_v0.1_B04_DIAG.tif
[OK]Comparing number of bands
Comparing DSWx bands...
[FAIL]     Band 1 - Diagnostic layer (DIAG)"
/data/home/collinss/OPERA/DSWx-NI/interface/0.1/dswx_comparison.py:59: RuntimeWarning: overflow encountered in ubyte_scalars
  if (abs(image_1[i, j] - image_2[i, j]) >=
            * input 1 has value "99" in position (x: 2527, y: 0) whereas input 2 has value "100" in the same position.

Comparing files:
    file 1: sample_data/expected_output/OPERA_L3_DSWx-NI_T11SLT_20110226T061749Z_20240329T181033Z_LSAR_30_v0.1_B01_WTR.tif
    file 2: sample_data/output_dir/OPERA_L3_DSWx-NI_T11SLT_20110226T061749Z_20240402T154233Z_LSAR_30_v0.1_B01_WTR.tif
[OK]Comparing number of bands
Comparing DSWx bands...
[FAIL]     Band 1 - Water classification (WTR)"
            * input 1 has value "1" in position (x: 2260, y: 68) whereas input 2 has value "0" in the same position.

Comparing files:
    file 1: sample_data/expected_output/OPERA_L3_DSWx-NI_T11SLT_20110226T061749Z_20240329T181033Z_LSAR_30_v0.1_B02_BWTR.tif
    file 2: sample_data/output_dir/OPERA_L3_DSWx-NI_T11SLT_20110226T061749Z_20240402T154233Z_LSAR_30_v0.1_B02_BWTR.tif
[OK]Comparing number of bands
Comparing DSWx bands...
[FAIL]     Band 1 - Binary Water classification (BWTR)"
            * input 1 has value "1" in position (x: 2260, y: 68) whereas input 2 has value "0" in the same position.

Comparing files:
    file 1: sample_data/expected_output/OPERA_L3_DSWx-NI_T11SLT_20110226T061749Z_20240329T181033Z_LSAR_30_v0.1_B03_CONF.tif
    file 2: sample_data/output_dir/OPERA_L3_DSWx-NI_T11SLT_20110226T061749Z_20240402T154233Z_LSAR_30_v0.1_B03_CONF.tif
[OK]Comparing number of bands
Comparing DSWx bands...
[FAIL]     Band 1 - Confidence values (CONF)"
            * input 1 has value "6" in position (x: 3617, y: 16) whereas input 2 has value "0" in the same position.

Comparing files:
    file 1: sample_data/expected_output/OPERA_L3_DSWx-NI_T11SLT_20110226T061749Z_20240329T181033Z_LSAR_30_v0.1_B04_DIAG.tif
    file 2: sample_data/output_dir/OPERA_L3_DSWx-NI_T11SLT_20110226T061749Z_20240402T154233Z_LSAR_30_v0.1_B04_DIAG.tif
[OK]Comparing number of bands
Comparing DSWx bands...
[FAIL]     Band 1 - Diagnostic layer (DIAG)"
            * input 1 has value "54" in position (x: 2046, y: 0) whereas input 2 has value "51" in the same position.

Comparing files:
    file 1: sample_data/expected_output/OPERA_L3_DSWx-NI_T11SMS_20110226T061749Z_20240329T181033Z_LSAR_30_v0.1_B04_DIAG.tif
    file 2: sample_data/output_dir/OPERA_L3_DSWx-NI_T11SMS_20110226T061749Z_20240402T154233Z_LSAR_30_v0.1_B04_DIAG.tif
[OK]Comparing number of bands
Comparing DSWx bands...
[FAIL]     Band 1 - Diagnostic layer (DIAG)"
/data/home/collinss/OPERA/DSWx-NI/interface/0.1/dswx_comparison.py:59: RuntimeWarning: overflow encountered in ubyte_scalars
  if (abs(image_1[i, j] - image_2[i, j]) >=
            * input 1 has value "89" in position (x: 33, y: 0) whereas input 2 has value "90" in the same position.

Comparing files:
    file 1: sample_data/expected_output/OPERA_L3_DSWx-NI_T11SMT_20110226T061749Z_20240329T181033Z_LSAR_30_v0.1_B01_WTR.tif
    file 2: sample_data/output_dir/OPERA_L3_DSWx-NI_T11SMT_20110226T061749Z_20240402T154233Z_LSAR_30_v0.1_B01_WTR.tif
[OK]Comparing number of bands
Comparing DSWx bands...
[FAIL]     Band 1 - Water classification (WTR)"
            * input 1 has value "1" in position (x: 343, y: 338) whereas input 2 has value "0" in the same position.

Comparing files:
    file 1: sample_data/expected_output/OPERA_L3_DSWx-NI_T11SMT_20110226T061749Z_20240329T181033Z_LSAR_30_v0.1_B02_BWTR.tif
    file 2: sample_data/output_dir/OPERA_L3_DSWx-NI_T11SMT_20110226T061749Z_20240402T154233Z_LSAR_30_v0.1_B02_BWTR.tif
[OK]Comparing number of bands
Comparing DSWx bands...
[FAIL]     Band 1 - Binary Water classification (BWTR)"
            * input 1 has value "1" in position (x: 343, y: 338) whereas input 2 has value "0" in the same position.

Comparing files:
    file 1: sample_data/expected_output/OPERA_L3_DSWx-NI_T11SMT_20110226T061749Z_20240329T181033Z_LSAR_30_v0.1_B03_CONF.tif
    file 2: sample_data/output_dir/OPERA_L3_DSWx-NI_T11SMT_20110226T061749Z_20240402T154233Z_LSAR_30_v0.1_B03_CONF.tif
[OK]Comparing number of bands
Comparing DSWx bands...
[FAIL]     Band 1 - Confidence values (CONF)"
            * input 1 has value "6" in position (x: 285, y: 16) whereas input 2 has value "0" in the same position.

Comparing files:
    file 1: sample_data/expected_output/OPERA_L3_DSWx-NI_T11SMT_20110226T061749Z_20240329T181033Z_LSAR_30_v0.1_B04_DIAG.tif
    file 2: sample_data/output_dir/OPERA_L3_DSWx-NI_T11SMT_20110226T061749Z_20240402T154233Z_LSAR_30_v0.1_B04_DIAG.tif
[OK]Comparing number of bands
Comparing DSWx bands...
[FAIL]     Band 1 - Diagnostic layer (DIAG)"
            * input 1 has value "50" in position (x: 25, y: 0) whereas input 2 has value "47" in the same position.
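The repeated "RuntimeWarning: overflow encountered in ubyte_scalars" comes from subtracting two uint8 pixel values at line 59 of dswx_comparison.py. It appears exactly in the cases where the expected value is smaller than the output value (0 vs 1, 99 vs 100, 89 vs 90): unsigned arithmetic wraps around, so 0 - 1 becomes 255 instead of -1. The mismatch is still detected, but the computed difference is wrong, and the script stops at the first differing pixel, so the log does not show how many pixels actually differ. Below is a minimal sketch of an overflow-safe, vectorized comparison that also counts differing pixels. It assumes both bands are already loaded as 2-D NumPy arrays and that x is the column and y the row; the helper name compare_bands and its tolerance parameter are hypothetical and not part of the existing script.

```python
import numpy as np


def compare_bands(image_1, image_2, tolerance=0):
    """Hypothetical helper: compare two single-band rasters.

    Returns (n_diff, first_diff): n_diff is the number of pixels whose
    absolute difference exceeds `tolerance`; first_diff is the (x, y)
    position (column, row) of the first such pixel in row-major order,
    or None if no pixel differs.
    """
    a = np.asarray(image_1)
    b = np.asarray(image_2)
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")

    # Promote to a signed type before subtracting so that uint8 values
    # such as 0 - 1 give -1 instead of wrapping around to 255.
    diff = np.abs(a.astype(np.int64) - b.astype(np.int64))
    mismatch = diff > tolerance

    n_diff = int(np.count_nonzero(mismatch))
    if n_diff == 0:
        return 0, None
    # np.argwhere returns (row, col) index pairs in row-major order.
    row, col = np.argwhere(mismatch)[0]
    return n_diff, (int(col), int(row))


if __name__ == "__main__":
    expected = np.array([[0, 99]], dtype=np.uint8)
    output = np.array([[1, 100]], dtype=np.uint8)
    print(expected - output)                 # [[255 255]]  (wrapped)
    print(compare_bands(expected, output))   # (2, (0, 0))
```

Reporting the total count of differing pixels (and, if desired, allowing a small tolerance) would make it easier to judge whether the Intel/AMD differences are limited to a handful of borderline pixels.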

When the same test and comparisons are run on an AMD-based EC2 instance (c6a.2xlarge), all tests pass cleanly. This suggests that the DSWX-SAR code is sensitive to floating-point precision/rounding differences between Intel and AMD processors, producing slightly different (possibly incorrect?) values on Intel machines. Similar behavior has also been observed when running the DSWx-S1 SAS.
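One plausible mechanism, not confirmed for DSWX-SAR, is that floating-point addition is not associative: if a numerical library takes a different code path on each CPU (for example, different SIMD kernels), intermediate sums can be rounded differently, and a pixel whose value sits right at a classification threshold can flip class. The toy example below illustrates this; the threshold of 0.6 and the classify rule are hypothetical and not taken from the DSWX-SAR code.

```python
def classify(value, threshold=0.6):
    """Hypothetical rule: label a pixel as water (1) if value > threshold."""
    return 1 if value > threshold else 0


# The same three terms summed in two different orders.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

print(left, right)                       # 0.6000000000000001 0.6
print(classify(left), classify(right))   # 1 0
```

Whether this is the actual cause would need to be confirmed, for example by comparing intermediate (pre-threshold) layers produced on the two instance types at the reported pixel positions.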

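This is a potential problem because we sometimes allocate both Intel and AMD instance types in the same auto-scaling worker pool in the OPERA SDS, so the same input could yield slightly different products depending on which instance type processes it.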

A set of sample DSWx-NI outputs generated on an Intel instance can be downloaded from s3://opera-dev-lts-fwd-collinss/acceptance_test/dswx_ni/interface_0.1/