terraref / computing-pipeline

Pipeline to Extract Plant Phenotypes from Reference Data
BSD 3-Clause "New" or "Revised" License

Run hyperspectral extractor on ongoing Season 7 data #514

Closed. max-zilla closed this issue 5 years ago.

max-zilla commented 6 years ago

Get this running on incoming data for wider team usage.

max-zilla commented 6 years ago

Try clipping the files to plot level before processing, to reduce memory requirements.

max-zilla commented 6 years ago

ncks -d lon,-88.8,-87.7 -d lat,39.8,40.5 out/all.nc champaign.nc
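
For reference, the same hyperslab approach could be used to clip to an individual plot; a minimal sketch, with placeholder file names and example bounds (real plot boundaries would come from the site's plot definitions):

# Sketch: clip a merged cube to one plot's bounding box before further
# processing, to keep memory use down. IN/OUT and the lat/lon bounds are
# placeholders, not real plot coordinates.
IN=out/all.nc
OUT=plot_subset.nc
# With decimal values, ncks -d dim,min,max interprets the bounds as
# coordinate values rather than indices.
ncks -O -d lon,-111.9751,-111.9748 -d lat,33.0745,33.0747 "$IN" "$OUT"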

max-zilla commented 5 years ago

I will focus on trying to extract the _ind file Henry generated in 2017 and share the list of indices in the header for submission to BETYdb. (https://github.com/terraref/computing-pipeline/issues/399)

I've tried to run the 2018 data but encountered bugs, and I would like Vasit's team to try transferring it to AZ to see if it can be run there. The bug I saw:

nco_err_exit(): ERROR NCO will now exit with system call exit(EXIT_FAILURE)
hyperspectral_workflow.sh: ERROR Failed to translate raw data. Debug this:
ncks -O  --hdr_pad=10000 --no_tmp_fl --trr_wxy=955,1600,2587 --trr typ_in=NC_USHORT --trr typ_out=NC_USHORT --trr ntl_in=bil --trr ntl_out=bsq --trr_in=/home/extractor/sites/ua-mac/raw_data/VNIR/2017-04-16/2017-04-16__11-51-16-722/0eebf84b-8a0a-45e6-aa2a-c49c82fab5e9_raw /home/extractor/hyperspectral_dummy.nc /tmp/terraref_tmp_trn.nc.pid495.fl00.tmp
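
Before handing this off, a quick check worth doing is whether the dimensions passed via --trr_wxy (which I understand to be wavelengths, samples, lines) still match what the capture's ENVI header reports; a minimal sketch, assuming the usual companion _raw.hdr file exists next to the raw file:

# Sketch: compare the header's band/sample/line counts against the
# --trr_wxy=955,1600,2587 values used above. Header path mirrors the raw
# file named in the log.
HDR=/home/extractor/sites/ua-mac/raw_data/VNIR/2017-04-16/2017-04-16__11-51-16-722/0eebf84b-8a0a-45e6-aa2a-c49c82fab5e9_raw.hdr
grep -iE '^(bands|samples|lines)' "$HDR"
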
max-zilla commented 5 years ago

Patrick and Vasit's team are looking at this now.

max-zilla commented 5 years ago

Per advice from Charlie, I refactored the Dockerfile to install netCDF + HDF5 + NCO with conda:

# install conda
USER extractor
RUN cd ~ \
    &&  wget https://repo.continuum.io/archive/Anaconda2-5.3.1-Linux-x86_64.sh \
    && bash Anaconda2-5.3.1-Linux-x86_64.sh -b

# install conda-forge packages (-y so the build does not wait for confirmation)
RUN ~/anaconda2/bin/conda config --add channels conda-forge \
    && ~/anaconda2/bin/conda install -y hdf5 netcdf4 nco
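
As a sanity check that the conda-provided tools are what the extractor actually picks up, something like this can be run inside the built image (a sketch; package names follow the install line above):

# Sketch: confirm NCO runs and list the conda-managed HDF5/netCDF/NCO versions.
~/anaconda2/bin/ncks --version
~/anaconda2/bin/conda list | grep -E '^(hdf5|libnetcdf|netcdf4|nco) '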

I then mounted one of the 7.8 GB BIL files into the container on my laptop and tried the command:

~/anaconda2/bin/ncks -O --hdr_pad=10000 --no_tmp_fl --trr_wxy=955,1600,2587 --trr typ_in=NC_USHORT --trr typ_out=NC_USHORT --trr ntl_in=bil --trr ntl_out=bsq --trr_in=0eebf84b-8a0a-45e6-aa2a-c49c82fab5e9_raw ~/hyperspectral_dummy.nc terraref_foo.nc

...no errors, but the process was killed after running out of memory (I don't have 32 GB of RAM on my laptop). So I pushed this version of the image to Kubernetes and will try the same file on our large-memory machine. The Docker image is pulling now.
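
To pin down how much memory the translation step actually needs once it runs on the big machine, the same command can be wrapped in GNU time (a sketch; assumes GNU time is installed at /usr/bin/time):

# Sketch: record peak memory (maximum resident set size) of the raw-to-netCDF
# translation; GNU time writes its report to stderr, captured here in a log.
/usr/bin/time -v ~/anaconda2/bin/ncks -O --hdr_pad=10000 --no_tmp_fl \
  --trr_wxy=955,1600,2587 --trr typ_in=NC_USHORT --trr typ_out=NC_USHORT \
  --trr ntl_in=bil --trr ntl_out=bsq \
  --trr_in=0eebf84b-8a0a-45e6-aa2a-c49c82fab5e9_raw \
  ~/hyperspectral_dummy.nc terraref_foo.nc 2> ncks_time.log
grep 'Maximum resident set size' ncks_time.log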

max-zilla commented 5 years ago

Here are the versions conda installs, by the way:

hdf4-4.2.13
hdf5-1.10.4
libnetcdf-4.6.2
netcdf4-1.4.2
nco-4.7.9
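
To keep future rebuilds of the image from silently picking up different versions, the same packages could be pinned in the Dockerfile's conda step; a minimal sketch using the versions listed above:

# Sketch: pin the known-good combination when installing (versions from the
# list above; adjust if conda-forge retires these builds).
~/anaconda2/bin/conda install -y hdf5=1.10.4 libnetcdf=4.6.2 netcdf4=1.4.2 nco=4.7.9
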
max-zilla commented 5 years ago

Latest progress: an error on the environment logger .nc file:

python /home/extractor/hyperspectral_metadata.py dbg=yes fmt=4 ftn=no /home/extractor/sites/ua-mac/raw_data/VNIR/2017-04-16/2017-04-16__11-51-16-722/0eebf84b-8a0a-45e6-aa2a-c49c82fab5e9_raw /tmp/terraref_tmp_jsn.nc.pid602.fl00.tmp

HDF5-DIAG: Error detected in HDF5 (1.10.4) thread 140449669109696:
  #000: H5F.c line 509 in H5Fopen(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #001: H5Fint.c line 1400 in H5F__open(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #002: H5Fint.c line 1615 in H5F_open(): unable to lock the file
    major: File accessibilty
    minor: Unable to open file
  #003: H5FD.c line 1640 in H5FD_lock(): driver lock request failed
    major: Virtual File Layer
    minor: Can't update object
  #004: H5FDsec2.c line 941 in H5FD_sec2_lock(): unable to lock file, errno = 37, error message = 'No locks available'
    major: File accessibilty
    minor: Bad file ID accessed
ERROR: nco__open() unable to open file "/home/extractor/sites/ua-mac/Level_1/envlog_netcdf/2017-04-16/envlog_netcdf_L1_ua-mac_2017-04-16.nc"

ERROR NC_EHDFERR Error at HDF5 layer
HINT: NC_EHDFERR errors indicate that the HDF5-backend to netCDF is unable to perform the requested task. NCO can receive this devilishly inscrutable error for a variety of possible reasons including: 1) The run-time dynamic linker attempts to resolve calls from the netCDF library to the HDF library with an HDF5 libhdf5.a that is incompatible with the version used to build NCO and netCDF. 2) An incorrect netCDF4 library implementation of a procedure (e.g., nc_rename_var()) in terms of HDF function calls (e.g., HDF5Lmove()) manifests an error or inconsistent state within the HDF5 layer. This often occurs during renaming operations (https://github.com/Unidata/netcdf-c/issues/597). 3) Bad vibes.
nco_err_exit(): ERROR Short NCO-generated message (usually name of function that triggered error): nco__open()
nco_err_exit(): ERROR Error code is -101. Translation into English with nc_strerror(-101) is "NetCDF: HDF error"
nco_err_exit(): ERROR NCO will now exit with system call exit(EXIT_FAILURE)
./hyperspectral_workflow.sh: problem extracting env-log from /home/extractor/sites/ua-mac/Level_1/envlog_netcdf/2017-04-16/envlog_netcdf_L1_ua-mac_2017-04-16.nc

Haven't dug into why this is happening yet.

max-zilla commented 5 years ago

OK, good news and bad news. Good news first: the envlog problem is solved by setting export HDF5_USE_FILE_LOCKING=FALSE

This avoids the file-locking bug, and I was able to successfully reprocess VNIR - 2017-04-18__10-16-26-509. The sizes of the output files approximately match what Charlie generated.
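
To make that workaround stick for the extractor rather than relying on a one-off shell export, the variable could be set wherever the workflow is launched; a minimal sketch (the exact placement is an assumption):

# Sketch: disable HDF5 file locking before hyperspectral_workflow.sh runs,
# e.g. in the wrapper/entrypoint that invokes it.
export HDF5_USE_FILE_LOCKING=FALSE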

However, when trying to process a dataset from 2018 (VNIR - 2018-08-18__11-47-47-403), I encounter what I think is a bug related to the new camera:

assert len(wavelength) in (955, 272, 273), "ERROR: Failed to get wavlength information. Please check if you modified the *.hdr files"

The assertion that the wavelength count is one of 955, 272, or 273 fails; the actual count is 939.
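
A quick way to confirm that count straight from the capture, independent of the Python code, is to count the entries in the header's wavelength block; a rough sketch, assuming the usual ENVI-style "wavelength = { ... }" block and a placeholder path:

# Sketch: count wavelength entries in the ENVI header. CAPTURE-ID is a
# placeholder; the path layout mirrors the 2017 example above.
HDR=/home/extractor/sites/ua-mac/raw_data/VNIR/2018-08-18/2018-08-18__11-47-47-403/CAPTURE-ID_raw.hdr
sed -n '/wavelength *= *{/,/}/p' "$HDR" | grep -oE '[0-9]+(\.[0-9]+)?' | wc -l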

I modified this portion of the code in hyperspectral_metadata.py to avoid this bug:

        # Check if the wavelength is correctly collected
        assert len(wavelength) in (939, 955, 272, 273), "ERROR: Failed to get wavlength information. Please check if you modified the *.hdr files (length %s)" % len(wavelength)

        camera_opt = 'VNIR' if len(wavelength) in (939, 955) else 'SWIR'  # Choose appropriate camera by counting the number of wavelengths.

Note that I added 939 to the valid values. Unfortunately, this just yields a different downstream error:

nco_err_exit(): ERROR Error code is -57. Translation into English with nc_strerror(-57) is "NetCDF: Start+count exceeds dimension bound"
ERROR: nco_put_vara() failed to nc_put_vara() variable "cst_cnv_trg_nw"
hyperspectral_workflow.sh: problem copying theblob\n

It seems to be occurring here in hyperspectral_workflow.sh:

            #copy theblob.nc to att_out
            printf "ncks -A  ${drc_spt}/theblob.nc ${att_out}"
            ncks -A  "${drc_spt}/theblob.nc" "${att_out}" 
            [ "$?" -ne 0 ] && echo "$0: problem copying theblob\n" && exit 1    

ncks -A  /home/extractor/theblob.nc /tmp/terraref_tmp_att.nc.pid15.fl00.tmp

The ncks command is very basic, so I'm not sure how to fix this dimension-bound error yet.
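
One way to narrow it down would be to compare the dimension sizes of the two files involved in the append, since the new sensor's 939 bands versus the old 955 is a plausible suspect; a sketch using the paths from the log above:

# Sketch: print just the dimensions section of each file's header and compare.
ncdump -h /home/extractor/theblob.nc | sed -n '/^dimensions:/,/^variables:/p'
ncdump -h /tmp/terraref_tmp_att.nc.pid15.fl00.tmp | sed -n '/^dimensions:/,/^variables:/p'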

dlebauer commented 5 years ago

Awesome ... the fact that it works on the old camera is fantastic news, since it means we actually have the ability to process data from some of the earlier seasons (how far back?)! Also, can you put the output file somewhere (e.g. Globus) so I can take a look?

dlebauer commented 5 years ago

@max-zilla as we discussed last week, when you start running these, can you set the xps_img_flg so that the raw counts are included in the output files? The flag is set here: https://github.com/terraref/extractors-hyperspectral/blob/master/hyperspectral/hyperspectral_workflow.sh#L143

I'm not sure what the appropriate value is, but the logic it uses is at https://github.com/terraref/extractors-hyperspectral/blob/master/hyperspectral/hyperspectral_workflow.sh#L744

I think this will (as originally intended) make it easier to reprocess with the new calibration algorithms that @Paheding et al. are developing.
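
Something like this would show where the flag is defined and every place it's consumed before changing the default (a sketch; the path assumes a local checkout of extractors-hyperspectral):

# Sketch: find the definition and uses of xps_img_flg in the workflow script.
grep -n 'xps_img' extractors-hyperspectral/hyperspectral/hyperspectral_workflow.sh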

max-zilla commented 5 years ago

In the meantime, we can queue up data from 2016-2017 that is within the size constraints while we get S6+ prepared.

max-zilla commented 5 years ago

Running tests on S4 data with the --output_xps_img flag enabled to include raw data.
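
Once a test output lands, it's worth spot-checking that the raw counts actually made it into the file; a sketch with a placeholder path and a guessed variable-name pattern (the real variable name should be confirmed against the workflow script):

# Sketch: list variables/metadata and look for an exposure/raw-counts variable.
# OUT is a placeholder; 'xps|exposure' is a guess based on the flag name.
OUT=/path/to/one_test_output.nc
ncks -m "$OUT" | grep -Ei 'xps|exposure'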

max-zilla commented 5 years ago

The extractor Docker image is updated, and I'm running the script to queue these for 2017 now. I gave Vasit and Patrick an update on Monday.

max-zilla commented 5 years ago

This issue has been open for 6 months; I'm going to create a new issue, with more details, to cover updating for the newer VNIR camera.

max-zilla commented 5 years ago

https://github.com/terraref/computing-pipeline/issues/576