tammasloughran / ehfheatwaves

A script to calculate heatwaves from AWAP
29 stars 5 forks

Unexperienced python user / indexerror 0 #10

Closed anolive closed 4 years ago

anolive commented 5 years ago

Hi, I am a PhD student totally inexperienced with coding or python.

While searching for a method to calculate EHF values on a daily basis for netcdf files, I came across your module.

However, I have not been able to run it: I receive the error message below after entering the following command (Linux 18.04, Python 3.6).

python3 ehfheatwaves.py -x "climpact2.sampledata.gridded.1991-2010.nc" -n "climpact2.sampledata.gridded.1991-2010.nc" -m "mask.nc" --dailyonly

```
Traceback (most recent call last):
  File "ehfheatwaves.py", line 311, in <module>
    tmax = ncio.load_bp_data(options, timedata, variable='tmax', mask=mask)
  File "/home/ana/Python1/ehfheatwaves-master/ncio.py", line 174, in load_bp_data
    temp = tempnc.variables[varname][(options.bpstart<=bpdates.year)&(bpdates.year<=options.bpend)]
  File "netCDF4/_netCDF4.pyx", line 4328, in netCDF4._netCDF4.Variable.__getitem__
  File "/usr/local/lib/python3.6/dist-packages/netCDF4/utils.py", line 467, in _out_array_shape
    c = count[..., i].ravel()[0] # All elements should be identical.
IndexError: index 0 is out of bounds for axis 0 with size 0
```

Do you have any insight on how to solve this, or where to begin? I hope it is an easy answer for you, but if it is too complex to reply, thanks for your time anyway.

tammasloughran commented 5 years ago

Sure, I know what's happening.

The sample data file is 1991-2010. By default, the base period for the percentile calculations is 1961-1990. You need to specify what your base period is for the percentile threshold. For example, to use whatever data you have in your input files you should use the flag --base=1991-2010. However, 10 years is not a very good sample of the distribution of temperature; I recommend a period of 30 years. If your analysis period is not long enough, you can specify a separate set of files to be used for calculating the percentile thresholds via command line options.

The documentation talks about a future simulation and a historical base period because that's what I was working on at the time, but the future simulation may not necessarily be model data or in the future.

  --bpfx=FILE           Indicates a future simulation, specifying a tmax file
                        containing the historical base period to be used
  --bpfn=FILE           Indicates a future simulation, specifying a tmin file
                        containing the historical base period to be used
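
Putting this together, a sketch of a sample-data run with the base period matched to the file's own coverage might look like the following (same filenames as the command earlier in the thread; the `--base=1991-2010` syntax is taken from the comment above):

```shell
# Sketch: percentile base period set to the sample data's own 1991-2010 span
python3 ehfheatwaves.py \
    -x "climpact2.sampledata.gridded.1991-2010.nc" \
    -n "climpact2.sampledata.gridded.1991-2010.nc" \
    -m "mask.nc" \
    --base=1991-2010 --dailyonly
```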
anolive commented 5 years ago

Hi, Thank you for your so fast reply!

I used the flag and it worked just fine! I am actually not interested in 'climpact2.sampledata.gridded.1991-2010.nc' itself; I only used it to test the module because it is a faster file to run.

I am analyzing all of Europe using the EOBS dataset, and I have already computed the HW aspects with the R-based climpact2 (it took 12 days to compute them...), but I need daily outputs for additional research... Thanks again, I will run the module for EOBS and let you know how it goes!

anolive commented 5 years ago

Hi again, sorry to bother...

So I ran the mask script overnight (Europe's EOBS dataset) and all went OK. I had to change the variable name according to my eobs.nc file (it's 'tx' instead of 'tmax'):

```python
# Load file
ncfile = nc.Dataset('climpact_europe_eobsv19.nc', 'r')
data = ncfile.variables['tx'][:]
```

Now, to run the ehf module, I also changed the variable names in ncio.py:

```python
def load_bp_data(options, timedata, variable='tx', mask=None):
    """load_bp_data loads the tmax or tmin data for the baseperiod provided in options."""
    # Determine if we need tmax or tmin and whether the data are in multiple files.
    if variable=='tx':
        if options.bpfx: files = options.bpfx
        else: files = options.tmaxfile
        varname = options.tmaxvname
    elif variable=='tn':
        if options.bpfn: files = options.bpfn
        else: files = options.tminfile
        varname = options.tminvname
```

And also in ehfheatwaves.py:

```python
# Load the temperature data over the base period
if options.keeptave or options.keeptmax:
    tmax = ncio.load_bp_data(options, timedata, variable='tx', mask=mask)
if options.keeptave or options.keeptmin:
    tmin = ncio.load_bp_data(options, timedata, variable='tn', mask=mask)
if options.keeptave:
    tave_base = (tmax + tmin)/2.
```

Still, when I run the code, this is what I get

python3 ehfheatwaves.py -x "climpact_europe_eobsv19.nc" -n "climpact_europe_eobsv19.nc" -m "mask.nc" --dailyonly

```
Traceback (most recent call last):
  File "ehfheatwaves.py", line 311, in <module>
    tmax = ncio.load_bp_data(options, timedata, variable='tx', mask=mask)
  File "/home/ana/Python1/ehfheatwaves-master/ncio.py", line 174, in load_bp_data
    temp = tempnc.variables[varname][(options.bpstart<=bpdates.year)&(bpdates.year<=options.bpend)]
KeyError: 'tmax'
```

I am not sure if I made all the necessary changes, or too few, or incorrect ones...

The variable names, as read in Panoply, are:

short tx(time=25202, latitude=465, longitude=705);
  :standard_name = "air_temperature";
  :long_name = "maximum temperature";

short tn(time=25202, latitude=465, longitude=705);
  :standard_name = "air_temperature";
  :long_name = "minimum temperature";

Another issue is the units: EOBS is in Celsius instead of Kelvin.

Thanks again for your time. Ana

tammasloughran commented 5 years ago
  1. There shouldn't be a need to change the hard coded default variable names. Variable names can already be specified with command line options.

    --vnamex=STR          tmax variable name
    --vnamen=STR          tmin variable name
    --vnamem=STR          mask variable name
    --vnamet=STR          time variable name
  2. Because the heatwave indices are all relative measures, the units of the input data shouldn't matter. The output would simply have the same units as the input. You just need to be aware of that when reading output metadata.
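
As a quick sanity check of point 2, here is a small sketch (synthetic data, not the E-OBS files) showing that a percentile-based exceedance flags exactly the same days in Celsius and Kelvin:

```python
import numpy as np

rng = np.random.default_rng(42)
tmax_c = rng.normal(25.0, 5.0, size=365)   # synthetic daily tmax in Celsius
tmax_k = tmax_c + 273.15                   # the identical series in Kelvin

# Days exceeding the 90th percentile are the same in either unit,
# because adding a constant shifts the threshold by the same constant.
hot_c = tmax_c > np.percentile(tmax_c, 90)
hot_k = tmax_k > np.percentile(tmax_k, 90)
print((hot_c == hot_k).all())  # True
```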

anolive commented 5 years ago

You're right, it's running. I will see how it goes and let you know

anolive commented 5 years ago

Hi again. So the script ran for the past five days and finished today with the following message. I am not able to open the resulting netCDF file, although it is more than 40 GB in size! I will try alternative software to open it and inspect the results. What do you think of the message?

```
ana@ana:~/Python1/ehfheatwaves-master$ python3 ehfheatwaves.py -x "climpact_europe_eobsv19.nc" -n "climpact_europe_eobsv19.nc" -m "mask.nc" --vnamex=tx --vnamen=tn --vnamet=time --dailyonly
/home/ana/.local/lib/python3.6/site-packages/pandas/compat/_optional.py:106: UserWarning: Pandas requires version '1.2.1' or newer of 'bottleneck' (version '1.2.0' currently installed).
  warnings.warn(msg, UserWarning)
Traceback (most recent call last):
  File "ehfheatwaves.py", line 439, in <module>
    ncio.save_daily(EHF, event, ends, options, timedata, original_shape, mask, defn='EHF')
  File "/home/ana/Python1/ehfheatwaves-master/ncio.py", line 499, in save_daily
    oehf[:] = dummy_array.copy()
  File "netCDF4/_netCDF4.pyx", line 4893, in netCDF4._netCDF4.Variable.__setitem__
  File "netCDF4/_netCDF4.pyx", line 5173, in netCDF4._netCDF4.Variable._put
  File "netCDF4/_netCDF4.pyx", line 1857, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: HDF error
Segmentation fault (core dumped)
```

tammasloughran commented 5 years ago

It's trying to write data to file but failing for some reason. Without more information I can't say for sure. How large is your input file? Can you check that you have enough space on your hard disk to create large files? The daily output file will have 3 variables of the same size as the input file, so it will be larger than the input file by at least a third. You could also be running out of virtual/swap memory. How much virtual/swap memory do you have? You could try splitting your input file into smaller files and running ehfheatwaves.py on each of those (with the same base period). Be aware that the first 28 days of each output file would be missing.
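
A rough way to check the disk-space point before committing to a five-day run is to compare free space against roughly three times the input size (the factor of 3 follows the variable count mentioned above; the function name and default factor are just for illustration):

```python
import os
import shutil

def enough_space(input_path, out_dir='.', factor=3.0):
    """Rough check: the daily output holds ~3 variables, each the size of the input."""
    needed = os.path.getsize(input_path) * factor
    free = shutil.disk_usage(out_dir).free
    return free >= needed

# Example (hypothetical path):
# enough_space('climpact_europe_eobsv19.nc')
```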

The problem could also be a non-standard structure of the input data. Can you please show me the header of the input file? (`ncdump -h`)

tammasloughran commented 5 years ago

Also, the mask creation script I provided is only intended as an example of how to create a mask. I don't know enough about your data to say whether or not it works. I would double-check that it is actually creating a land-sea mask at all, rather than simply 1s everywhere.
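
One quick way to test for the all-ones case is to look at the land fraction of the mask array; a value near 1.0 would suggest nothing is being masked out. The loading step and variable name depend on your file, so only the check itself is sketched here, on a synthetic array:

```python
import numpy as np

def land_fraction(mask):
    """Mean of the mask values; a result near 1.0 suggests no sea was masked."""
    m = np.asarray(mask, dtype=float)
    return float(np.nanmean(m))

# Synthetic demo mask: left half land (1), right half sea (0)
demo = np.zeros((4, 6))
demo[:, :3] = 1.0
print(land_fraction(demo))  # 0.5
```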

anolive commented 5 years ago

Hi, thanks again for your help.

So, I checked the available space and indeed that might have been the problem, since it is not much larger than the 50 GB of the input file. I am solving this issue to ensure 150 GB of free space.

Anyway, this is my `free -m` output:

```
              total    used    free  shared  buff/cache  available
Mem:          15971    1741   12684     119        1545      13792
Swap:        537871       0  537871
```

The second issue was the format, so this is the ncdump output:

```
ncdump -h climpact_europe_eobsv19.nc
netcdf climpact_europe_eobsv19 {
dimensions:
	time = UNLIMITED ; // (25202 currently)
	longitude = 705 ;
	latitude = 465 ;
variables:
	double time(time) ;
		time:standard_name = "time" ;
		time:long_name = "Time in days" ;
		time:units = "days since 1950-01-01 00:00" ;
		time:calendar = "standard" ;
		time:axis = "T" ;
	double longitude(longitude) ;
		longitude:standard_name = "longitude" ;
		longitude:long_name = "Longitude values" ;
		longitude:units = "degrees_east" ;
		longitude:axis = "X" ;
	double latitude(latitude) ;
		latitude:standard_name = "latitude" ;
		latitude:long_name = "Latitude values" ;
		latitude:units = "degrees_north" ;
		latitude:axis = "Y" ;
	short tx(time, latitude, longitude) ;
		tx:standard_name = "air_temperature" ;
		tx:long_name = "maximum temperature" ;
		tx:units = "Celsius" ;
		tx:add_offset = 0.f ;
		tx:scale_factor = 0.01f ;
		tx:_FillValue = -9999s ;
		tx:missing_value = -9999s ;
	short tn(time, latitude, longitude) ;
		tn:standard_name = "air_temperature" ;
		tn:long_name = "minimum temperature" ;
		tn:units = "Celsius" ;
		tn:add_offset = 0.f ;
		tn:scale_factor = 0.01f ;
		tn:_FillValue = -9999s ;
		tn:missing_value = -9999s ;
	short rr(time, latitude, longitude) ;
		rr:standard_name = "thickness_of_rainfall_amount" ;
		rr:long_name = "rainfall" ;
		rr:units = "kg m-2 d-1" ;
		rr:add_offset = 0.f ;
		rr:scale_factor = 0.1f ;
		rr:_FillValue = -9999s ;
		rr:missing_value = -9999s ;

// global attributes:
		:CDI = "Climate Data Interface version 1.9.6 (http://mpimet.mpg.de/cdi)" ;
		:history = "Tue May 07 17:33:12 2019: cdo merge tx_ens_mean_0.1deg_reg_v19.0e.nc tn_ens_mean_0.1deg_reg_v19.0e.nc rr_kg_grid.nc clim_europe_eobsv19.nc\nMon Feb 18 12:08:50 2019: ncks -O -d time,0,25201 /data4/Else/EOBSv19.0e/Grid_0.1deg/tx/tx_ensmean_master_untilJan2019.nc /data4/Else/EOBSv19.0e/Grid_0.1deg/tx/tx_ensmean_master.nc" ;
		:Conventions = "CF-1.4" ;
		:E-OBS_version = "19.0e" ;
		:References = "http://surfobs.climate.copernicus.eu/dataaccess/access_eobs.php" ;
		:NCO = "netCDF Operators version 4.7.5 (Homepage = http://nco.sf.net, Code = http://github.com/nco/nco)" ;
		:CDO = "Climate Data Operators version 1.9.6 (http://mpimet.mpg.de/cdo)" ;
}
```
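
One thing worth noting in that header: tx and tn are stored as packed shorts with scale_factor = 0.01 and _FillValue = -9999. netCDF4-python applies the scaling and fill mask automatically by default, but the equivalent manual unpacking looks like this (the raw values here are made up for illustration):

```python
import numpy as np

raw = np.array([2500, -9999, 3012], dtype=np.int16)  # made-up packed shorts
scale_factor, add_offset, fill = 0.01, 0.0, -9999

# Fill values become NaN; everything else is unpacked to physical units
unpacked = np.where(raw == fill, np.nan,
                    raw.astype(np.float64) * scale_factor + add_offset)
print(unpacked)  # approximately 25.0, nan, 30.12 (degrees Celsius)
```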

Looks OK, right? Finally, the mask: I checked it in Panoply before running the ehfheatwaves script, and it was perfectly masking Europe's land surface versus the sea, so I don't think there is a problem there.

I also updated the bottleneck version to avoid that first warning.

So I am cleaning up disk space and will re-run the script for the whole of Europe. Hopefully I will have good news in 5 days...

Thanks, Ana