lautenberger / elmfire

Eulerian Level set Model of FIRE spread
https://elmfire.io
Eclipse Public License 2.0

Fortran runtime error: Bad real number in item 1 of list input #34

Open troysalt opened 9 months ago

troysalt commented 9 months ago

Hello,

When running ELMFIRE 2023.06, I'm encountering an error shortly after the run starts.

[screenshot: elmfire-error]

The error points back to line 1700 of elmfire_io.f90. That line is reading the no data value from the input rasters. I figured that some of my input rasters must not have a no data value set, or that the value was 'n/a', hence "bad real number". I tried to fix this by updating all of my input rasters to ensure they have no data set to -9999 and are in the data type specified in the user guide.

If I traced things back correctly, it looks like the file with the error is wd.tif. Odd because it looks like ws.tif did not encounter this error. ws.tif and wd.tif were identical except wd.tif was converted from int16 to float32. If it helps, the source data is gridMET processed and converted to TIFF using xarray and rioxarray.

Thank you!

lautenberger commented 9 months ago

Hi Troy - thanks for submitting an issue. Yes, ELMFIRE can encounter problems reading rasters when NODATA is not set. It also expects wind speed and direction to be Float32 and not Int16.
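One way to check both properties before a run is to read them from gdalinfo, which prints each band's Type= and, when one is set, a NoData Value= line. The sketch below is only illustrative: report_rasters is a hypothetical helper name, and it looks only at the first band of each file.

```bash
#!/bin/bash
# report_rasters DIR
# Prints the first band's data type and NoData value for every GeoTIFF in DIR.
# (Hypothetical helper; relies only on the text output of gdalinfo.)
report_rasters () {
   local dir=$1
   local f info type nodata
   for f in "$dir"/*.tif; do
      [ -e "$f" ] || continue   # skip the unexpanded glob when DIR has no .tif
      info=$(gdalinfo "$f")
      # Band lines look like: "Band 1 Block=256x256 Type=Float32, ColorInterp=Gray"
      type=$(echo "$info" | sed -n 's/^Band [0-9][0-9]* .*Type=\([A-Za-z0-9]*\).*/\1/p' | head -n 1)
      # NoData lines look like: "  NoData Value=-9999"
      nodata=$(echo "$info" | sed -n 's/^ *NoData Value=//p' | head -n 1)
      echo "$(basename "$f") type=${type:-unknown} nodata=${nodata:-unset}"
   done
}
```

Calling report_rasters ./inputs prints one line per GeoTIFF. Any line showing nodata=unset, or a type other than Float32 for ws.tif and wd.tif, would be worth fixing first.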

Did making those changes get you around the issue or are you still stuck? If the latter, please attach or give a link to the complete input deck and I'll run in a debugger on my end to figure out what's going on. Thanks!

Chris

troysalt commented 9 months ago

Hi Chris - yes I did make those changes.

Here's a link to the input data: https://drive.google.com/file/d/18smJOk4_88SU3uX5_OjPbPTdd3bdv19k/view?usp=sharing

Thank you for your help!

lautenberger commented 9 months ago

I was able to reproduce the issue on my end. What happened is that bits of metadata embedded in the ws.tif file made their way into the header of the ws.bsq file that ELMFIRE ultimately reads in, and ELMFIRE was unable to parse that header.

There's an easy fix though: you just have to scrub the embedded metadata from the input GeoTiffs. There are a few ways to do that, but the approach I took was to rename the inputs directory to inputs_in and add this code block right below the OUTPUTS=./outputs line in 01-run.sh:

rm -f -r $INPUTS
mkdir $INPUTS
for f in ./inputs_in/*.tif; do
   gdal_calc.py -A $f --allBands=A --NoDataValue=-9999 --calc="A*1.0" --outfile=$INPUTS/`basename $f` >& /dev/null &
done
wait

That way the files that land in ./inputs have no metadata. ELMFIRE is then able to successfully read in the fuels/topo/weather rasters.

troysalt commented 9 months ago

That fixed it! However another issue arose.

[screenshot: elmfire-error2]

It looks like maybe one of the file names is empty on a gdal_merge command and "./outputs/.tif" gets passed. I traced this to line 2616 in elmfire_io.f90 (2023.06). Any ideas?

lautenberger commented 9 months ago

If you're generating lots of raster outputs it's probably better to do the conversion to GeoTiff outside of ELMFIRE, basically by issuing the same gdal_translate commands as in the window above but in parallel. This is what I do about 99% of the time that I use ELMFIRE.

Try setting CONVERT_TO_GEOTIFF = .FALSE. in the &OUTPUTS namelist group and after ELMFIRE finishes, run a script something like this:

#!/bin/bash

NPARALLEL=32
SRS=`gdalsrsinfo ./inputs/asp.tif  | grep "PROJ.4" | cut -d':' -f2 | xargs`

compress () {
   local f=$1
   local OT=$2
   local NODATA=$3
   local STUB=`echo $f | cut -d. -f1`

   gdal_translate -a_srs "$SRS" -a_nodata $NODATA -ot $OT -co "TILED=yes" -co "COMPRESS=DEFLATE" -co "ZLEVEL=9" \
                   -co "NUM_THREADS=2" $STUB.bil $STUB.tif && rm -f $STUB.bil $STUB.hdr
}

cd outputs
N=0
for f in *.bil; do
   compress "$f" Float32 0 &
   let "N=N+1"
   if [ "$N" = "$NPARALLEL" ]; then
      N=0
      wait
   fi
done
wait

exit 0

This will create a bunch of individual GeoTiffs that you can subsequently stack into a multiband raster using gdal_merge.py -separate ...
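The stacking step could be wrapped along these lines. This is a minimal sketch, not part of ELMFIRE: stack_rasters and the GDAL_MERGE override variable are hypothetical names, and the only GDAL options used are -separate and -o. The guard against an empty input list is there because the earlier gdal_merge failure in this thread involved an empty file name.

```bash
#!/bin/bash
# Command used for merging; override GDAL_MERGE if gdal_merge.py lives elsewhere.
GDAL_MERGE=${GDAL_MERGE:-gdal_merge.py}

# stack_rasters OUT.tif IN1.tif [IN2.tif ...]
# Stacks the inputs, in the order given, as separate bands of OUT.tif.
# (Hypothetical helper name.)
stack_rasters () {
   if [ "$#" -lt 2 ]; then
      echo "stack_rasters: need an output name and at least one input raster" >&2
      return 1
   fi
   local out=$1
   shift
   "$GDAL_MERGE" -separate -o "$out" "$@"
}
```

For example, stack_rasters stacked.tif first.tif second.tif (placeholder file names) writes a two-band raster whose band order follows the argument order.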

troysalt commented 9 months ago

Thanks, now I don't get that error, but alas, there is another. Directly following "Determining random ignition locations", I get "Program received signal SIGSEGV: Segmentation fault - invalid memory reference."

I suspected it might be something with the meteorology start/stop/skip settings in the &MONTE_CARLO group, but everything I've tried changing results in the same error. I have 30 meteorology bands (daily data), from which I intend to simulate 10 groups of 3 days of weather. I have an ignition mask raster with probabilities and reviewed the user guide to make sure everything is set correctly.

Here's the &MONTE_CARLO group:

&MONTE_CARLO
RANDOM_IGNITIONS               = .TRUE.
USE_IGNITION_MASK              = .TRUE.
RANDOM_IGNITIONS_TYPE          = 2
NUM_ENSEMBLE_MEMBERS           = 10000
METEOROLOGY_BAND_START         = 1
METEOROLOGY_BAND_SKIP_INTERVAL = 4
METEOROLOGY_BAND_STOP          = 28
/
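As a sanity check on these settings, the sketch below lists the bands that a start/skip/stop triple would select. It assumes the simplest reading, namely that ELMFIRE steps from METEOROLOGY_BAND_START to METEOROLOGY_BAND_STOP in increments of METEOROLOGY_BAND_SKIP_INTERVAL. That interpretation is an assumption and is not confirmed anywhere in this thread, and met_band_starts is a hypothetical helper name.

```bash
#!/bin/bash
# met_band_starts START SKIP STOP
# Prints each band index from START up to STOP in steps of SKIP, one per line.
# (Hypothetical helper; the stepping rule is an assumption about ELMFIRE.)
met_band_starts () {
   local band=$1 skip=$2 stop=$3
   if [ "$skip" -le 0 ]; then
      echo "met_band_starts: skip interval must be positive" >&2
      return 1
   fi
   while [ "$band" -le "$stop" ]; do
      echo "$band"
      band=$((band + skip))
   done
}
```

Under that assumption, met_band_starts 1 4 28 lists seven bands (1, 5, 9, 13, 17, 21, 25) rather than ten, so it may be worth comparing the result against the intended 10 groups of 3 days.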

lautenberger commented 9 months ago

I suspect the problem occurs when processing ./inputs/idg.tif. So let's try this:

  1. In &OUTPUTS, set CALCULATE_TIMES_BURNED = .TRUE. and delete the line CALCULATE_BURN_PROBABILITY = .TRUE. Although this is unrelated to your problem, it directs ELMFIRE to output raw burn counts, which is useful for debugging.
  2. In the ./inputs directory, replace idg.tif with adj.tif (meaning, back up idg.tif and then do a cp -f adj.tif idg.tif). This overwrites the ignition density grid with the spread rate adjustment factor raster, which is all 1's.

On my end, doing this allows the case to run to completion (at least with a smaller number of ensemble members). So if you're able to run successfully with the dummy ignition density raster from adj.tif, that suggests the problem is in idg.tif. If that's the case, make sure idg.tif is in the same projection and has the same extents, cellsize, and number of rows/columns as the fuel inputs. Then make sure its nodata value is set to -9999 and that there are no pixels < 0 in the raster (besides nodata), and try again with the "real" ignition density grid!
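Part of that comparison can be scripted. The sketch below compares the Size, Origin, and Pixel Size lines that gdalinfo prints for two rasters. It does not compare projections or look for negative pixels, so those still need a separate check, and grid_signature and same_grid are hypothetical helper names.

```bash
#!/bin/bash
# grid_signature FILE
# Prints the gdalinfo lines describing raster dimensions, origin, and cell size.
grid_signature () {
   gdalinfo "$1" | grep -E '^(Size is|Origin =|Pixel Size =)'
}

# same_grid A B
# Succeeds only if both rasters report identical, non-empty grid signatures.
# (Hypothetical helper; projection and pixel values are not checked.)
same_grid () {
   local a b
   a=$(grid_signature "$1") || a=""
   b=$(grid_signature "$2") || b=""
   [ -n "$a" ] && [ "$a" = "$b" ]
}
```

A call such as same_grid ./inputs/idg.tif ./inputs/fbfm40.tif && echo "grids match" would then flag a mismatch in rows/columns, extent origin, or cellsize before ELMFIRE is launched.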

johnsonas6 commented 9 months ago

To tack onto this - I receive the same error following "Determining random ignition locations" when attempting to do a test burn probability with these inputs:

&MONTE_CARLO
RANDOM_IGNITIONS                    = .TRUE.
USE_IGNITION_MASK                   = .FALSE.
PERCENT_OF_PIXELS_TO_IGNITE         = 10
NUM_ENSEMBLE_MEMBERS                = -1
ALLOW_MULTIPLE_IGNITIONS_AT_A_PIXEL = .FALSE.
NUM_METEOROLOGY_TIMES               = 8
/

This run uses the same 01-run.sh file and inputs as the 03-real-fuels tutorial.

lautenberger commented 9 months ago

OK, post your complete input deck and I'll take a look. If there's anything that you don't want posted here publicly you can email me a link at chris@cloudfire.ai.

johnsonas6 commented 9 months ago

Empty inputs should be filled in from 01-run.sh because the landscape data is pulled from cloudfire. Thanks for taking a look at this.

&INPUTS
FUELS_AND_TOPOGRAPHY_DIRECTORY = './inputs'
ASP_FILENAME                   = 'asp'
CBD_FILENAME                   = 'cbd'
CBH_FILENAME                   = 'cbh'
CC_FILENAME                    = 'cc'
CH_FILENAME                    = 'ch'
DEM_FILENAME                   = 'dem'
FBFM_FILENAME                  = 'fbfm40'
SLP_FILENAME                   = 'slp'
ADJ_FILENAME                   = 'adj'
PHI_FILENAME                   = 'phi'
DT_METEOROLOGY                 = 3600.0
WEATHER_DIRECTORY              = './inputs'
WS_FILENAME                    = 'ws'
WD_FILENAME                    = 'wd'
M1_FILENAME                    = 'm1'
M10_FILENAME                   = 'm10'
M100_FILENAME                  = 'm100'
USE_CONSTANT_LH                = .FALSE.
MLH_FILENAME                   = 'lh'
USE_CONSTANT_LW                = .FALSE.
MLW_FILENAME                   = 'lw'
/

&OUTPUTS
OUTPUTS_DIRECTORY      = './outputs'
DTDUMP                 =
DUMP_FLIN              = .TRUE.
DUMP_SPREAD_RATE       = .TRUE.
DUMP_TIME_OF_ARRIVAL   = .TRUE.
CONVERT_TO_GEOTIFF     = .FALSE.
CALCULATE_TIMES_BURNED = .TRUE.
/

&COMPUTATIONAL_DOMAIN
A_SRS                          = 'EPSG: 32610'
COMPUTATIONAL_DOMAIN_CELLSIZE  =
COMPUTATIONAL_DOMAIN_XLLCORNER =
COMPUTATIONAL_DOMAIN_YLLCORNER =
/

&TIME_CONTROL
SIMULATION_DT    = 1.0
TARGET_CFL       = 0.2
SIMULATION_TSTOP =
/

&MONTE_CARLO
NUM_METEOROLOGY_TIMES               = 8
RANDOM_IGNITIONS                    = .TRUE.
USE_IGNITION_MASK                   = .FALSE.
PERCENT_OF_PIXELS_TO_IGNITE         = 10
NUM_ENSEMBLE_MEMBERS                = -13
ALLOW_MULTIPLE_IGNITIONS_AT_A_PIXEL = .FALSE.
/

&SIMULATOR
NUM_IGNITIONS =
X_IGN(1)      =
Y_IGN(1)      =
T_IGN(1)      =
/

&MISCELLANEOUS
MISCELLANEOUS_INPUTS_DIRECTORY = './misc'
FUEL_MODEL_FILE                = 'fuel_models.csv'
PATH_TO_GDAL                   = '/usr/bin'
SCRATCH                        = './scratch'

johnsonas6 commented 9 months ago

Hi Chris,

I've realized that you may have meant all of my input files, in which case, here they are: https://drive.google.com/drive/folders/1bZWmxapb1BopR1CBKafQXWeyiYP8_uCF?usp=drive_link Again, I do want to stress that these were all pulled directly from cloudfire.

Thanks,

Andrew

troysalt commented 8 months ago

I was able to run ELMFIRE by following your test case. I've ensured that the raster properties of idg.tif are exactly the same as those of the fuel inputs. I'm able to run ELMFIRE with my original idg.tif by following step 1 and not step 2. The fire_size_stats.csv output shows that the ignition density grid was honored: it excluded ignitions where values were 0 and influenced placement where values were > 0. If I replace CALCULATE_TIMES_BURNED = .TRUE. with CALCULATE_BURN_PROBABILITY = .TRUE., ELMFIRE crashes with a segmentation fault again. When I run with CALCULATE_TIMES_BURNED = .TRUE., I also noticed that the flame_length, vs, and times_burned outputs do not contain any valid pixels. Would that suggest that the problem is somewhere other than the ignition mask?

lautenberger commented 8 months ago

@johnsonas6 let's move discussion of your issue to Issue #37 because it appears to be distinct from the issue that @troysalt reported.

Along those lines, @troysalt, would you mind posting a link to the input deck that you're working from? If the link https://drive.google.com/file/d/18smJOk4_88SU3uX5_OjPbPTdd3bdv19k/view?usp=sharing is still current, I'm no longer able to access it. Could you provide access to chris@cloudfire.ai? Thanks in advance.

troysalt commented 8 months ago

Hi Chris, I shared the inputs directly with you at the same link. I also sent, separately, a second version of the ignition mask idg.tif. This one was produced in the same way as the adj.tif you suggested I use as a test input for the ignition mask: I extracted the array from the ignition mask raster to NumPy, then used RasterIO to write that array with the same metadata as cc.tif, with the exception of the Float32 data type. I'm still getting a segmentation fault when CALCULATE_BURN_PROBABILITY = .TRUE. is set. Thanks.

lautenberger commented 8 months ago

Thank you, confirming successful receipt of both files. I will look at this tomorrow and get back to you ASAP.

lautenberger commented 8 months ago

Troy,

I was able to get this to run to completion with elmfire_2023.1015 by removing the metadata from the input rasters as we talked about previously, using idg.tif, and making the following changes in elmfire.data.in:

&OUTPUTS
DUMP_SPREAD_RATE             = .FALSE.
DUMP_FLAME_LENGTH            = .FALSE.
CONVERT_TO_GEOTIFF           = .FALSE.
CALCULATE_BURN_PROBABILITY   = .FALSE.
CALCULATE_TIMES_BURNED       = .TRUE.

The keyword CALCULATE_BURN_PROBABILITY is deprecated in favor of CALCULATE_TIMES_BURNED and will be removed in the future. The reason that you don't want to dump spread rate / flame length rasters is you'll end up with 70,000 spread rate rasters and 70,000 flame length rasters each > 20 MB in size. There is an option to generate histograms of conditional flame length that I could point you to if you're interested.
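To put those numbers in perspective, the arithmetic can be sketched in shell. The 20 MB figure is the lower bound quoted above, so the result is a floor. The helper name estimate_gb is hypothetical, and it uses decimal units (1 GB = 1000 MB).

```bash
#!/bin/bash
# estimate_gb COUNT MB_EACH
# Prints the total size, in whole decimal gigabytes, of COUNT files of MB_EACH megabytes.
# (Hypothetical helper; integer arithmetic only.)
estimate_gb () {
   local count=$1 mb_each=$2
   echo $(( count * mb_each / 1000 ))
}
```

Here estimate_gb 70000 20 gives 1400, i.e. at least about 1.4 TB for the spread rate rasters and the same again for flame length.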

Also, this type of calculation is normally run in parallel so instead of launching as:

$ELMFIRE_INSTALL_DIR/elmfire_$ELMFIRE_VER ./inputs/elmfire.data

You should execute elmfire like this, or you won't get valid times_burned outputs:

mpirun -np 16 $ELMFIRE_INSTALL_DIR/elmfire_$ELMFIRE_VER ./inputs/elmfire.data

Two more things:

&TIME_CONTROL
SIMULATION_TSTOP = 21600.
/

troysalt commented 7 months ago

Hi Chris,

I implemented all of the changes you suggested in this thread, including the metadata stripping, and am now running ELMFIRE 2023.1015. The run completes, but I'm still not getting a valid times_burned.tif output: the file is empty. In the terminal I see "Fortran runtime error: End of file" in elmfire_io.f90 at lines 1449 and 1746, which are in the subroutines READ_BSQ_HEADER() and READ_BSQ_RASTER(). Before the error there are also messages reading "Problem opening bsq xml header ./scratch/ws.bsq.aux.xml". I removed the cleanup of ./scratch and can see that the file was written and has 21 lines. Any ideas what the issue may be? I can email you my current input deck if you'd like. Thank you!