Try to implement clipping of the files down to the plot level before processing, to reduce memory requirements, e.g.:
ncks -d lon,-88.8,-87.7 -d lat,39.8,40.5 out/all.nc champaign.nc
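A minimal sketch of how that clipping step could be applied to a batch of files, assuming the merged outputs live under out/ and a clipped/ directory (both names hypothetical) holds the subsets; the lon/lat bounds are the same as above:
# Clip every merged NetCDF file to the plot-level bounding box before any
# further processing, so downstream steps only read the subset they need.
mkdir -p clipped
for f in out/*.nc; do
  ncks -O -d lon,-88.8,-87.7 -d lat,39.8,40.5 "$f" "clipped/$(basename "$f")"
done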
I will focus on trying to extract the _ind file Henry generated in 2017 and sharing the list of indices in its header for submission to BETYdb. (https://github.com/terraref/computing-pipeline/issues/399)
I've tried to run the 2018 data but encountered bugs, and would like Vasit's team to try transferring it to AZ to see if it can be run there. The bug I saw:
nco_err_exit(): ERROR NCO will now exit with system call exit(EXIT_FAILURE)
hyperspectral_workflow.sh: ERROR Failed to translate raw data. Debug this:
ncks -O --hdr_pad=10000 --no_tmp_fl --trr_wxy=955,1600,2587 --trr typ_in=NC_USHORT --trr typ_out=NC_USHORT --trr ntl_in=bil --trr ntl_out=bsq --trr_in=/home/extractor/sites/ua-mac/raw_data/VNIR/2017-04-16/2017-04-16__11-51-16-722/0eebf84b-8a0a-45e6-aa2a-c49c82fab5e9_raw /home/extractor/hyperspectral_dummy.nc /tmp/terraref_tmp_trn.nc.pid495.fl00.tmp
Patrick and Vasit's team are looking at this now.
Per advice from Charlie, I refactored the Dockerfile to install netCDF + HDF5 + NCO with conda:
# install conda
USER extractor
RUN cd ~ \
&& wget https://repo.continuum.io/archive/Anaconda2-5.3.1-Linux-x86_64.sh \
&& bash Anaconda2-5.3.1-Linux-x86_64.sh -b
# install conda-forge packages (-y keeps the build non-interactive)
RUN ~/anaconda2/bin/conda config --add channels conda-forge \
&& ~/anaconda2/bin/conda install -y hdf5 netcdf4 nco
I then mounted one of the 7.8 GB BIL files into the container on my laptop and tried the command:
~/anaconda2/bin/ncks -O --hdr_pad=10000 --no_tmp_fl --trr_wxy=955,1600,2587 --trr typ_in=NC_USHORT --trr typ_out=NC_USHORT --trr ntl_in=bil --trr ntl_out=bsq --trr_in=0eebf84b-8a0a-45e6-aa2a-c49c82fab5e9_raw ~/hyperspectral_dummy.nc terraref_foo.nc
...no errors, but the process was killed due to running out of memory (I don't have 32 GB of RAM on my laptop). So I pushed this version of the image to Kubernetes and will try a test on the same file on our large-memory machine. The Docker image is pulling now.
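For reference, the local test looked roughly like the following; the image tag and host path are placeholders, the conda prefix assumes the extractor user's home is /home/extractor, and the --memory cap is optional and only bounds how much of the host's RAM the test can consume:
# Mount the raw BIL file into the container and run the same translation step.
docker run --rm --memory=16g \
  -v /data/vnir/0eebf84b-8a0a-45e6-aa2a-c49c82fab5e9_raw:/data/raw:ro \
  terraref/extractors-hyperspectral:conda-test \
  /home/extractor/anaconda2/bin/ncks -O --hdr_pad=10000 --no_tmp_fl \
    --trr_wxy=955,1600,2587 --trr typ_in=NC_USHORT --trr typ_out=NC_USHORT \
    --trr ntl_in=bil --trr ntl_out=bsq --trr_in=/data/raw \
    /home/extractor/hyperspectral_dummy.nc /tmp/terraref_foo.nc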
Here are the versions conda installs, btw:
hdf4-4.2.13
hdf5-1.10.4
libnetcdf-4.6.2
netcdf4-1.4.2
nco-4.7.9
Latest progress: an error on the environment logger .nc file:
python /home/extractor/hyperspectral_metadata.py dbg=yes fmt=4 ftn=no /home/extractor/sites/ua-mac/raw_data/VNIR/2017-04-16/2017-04-16__11-51-16-722/0eebf84b-8a0a-45e6-aa2a-c49c82fab5e9_raw /tmp/terraref_tmp_jsn.nc.pid602.fl00.tmp
HDF5-DIAG: Error detected in HDF5 (1.10.4) thread 140449669109696:
#000: H5F.c line 509 in H5Fopen(): unable to open file
major: File accessibilty
minor: Unable to open file
#001: H5Fint.c line 1400 in H5F__open(): unable to open file
major: File accessibilty
minor: Unable to open file
#002: H5Fint.c line 1615 in H5F_open(): unable to lock the file
major: File accessibilty
minor: Unable to open file
#003: H5FD.c line 1640 in H5FD_lock(): driver lock request failed
major: Virtual File Layer
minor: Can't update object
#004: H5FDsec2.c line 941 in H5FD_sec2_lock(): unable to lock file, errno = 37, error message = 'No locks available'
major: File accessibilty
minor: Bad file ID accessed
ERROR: nco__open() unable to open file "/home/extractor/sites/ua-mac/Level_1/envlog_netcdf/2017-04-16/envlog_netcdf_L1_ua-mac_2017-04-16.nc"
ERROR NC_EHDFERR Error at HDF5 layer
HINT: NC_EHDFERR errors indicate that the HDF5-backend to netCDF is unable to perform the requested task. NCO can receive this devilishly inscrutable error for a variety of possible reasons including: 1) The run-time dynamic linker attempts to resolve calls from the netCDF library to the HDF library with an HDF5 libhdf5.a that is incompatible with the version used to build NCO and netCDF. 2) An incorrect netCDF4 library implementation of a procedure (e.g., nc_rename_var()) in terms of HDF function calls (e.g., HDF5Lmove()) manifests an error or inconsistent state within the HDF5 layer. This often occurs during renaming operations (https://github.com/Unidata/netcdf-c/issues/597). 3) Bad vibes.
nco_err_exit(): ERROR Short NCO-generated message (usually name of function that triggered error): nco__open()
nco_err_exit(): ERROR Error code is -101. Translation into English with nc_strerror(-101) is "NetCDF: HDF error"
nco_err_exit(): ERROR NCO will now exit with system call exit(EXIT_FAILURE)
./hyperspectral_workflow.sh: problem extracting env-log from /home/extractor/sites/ua-mac/Level_1/envlog_netcdf/2017-04-16/envlog_netcdf_L1_ua-mac_2017-04-16.nc
I haven't dug into why this is happening yet.
OK, good news and bad news. The good news: I solved the envlog problem:
export HDF5_USE_FILE_LOCKING=FALSE
This avoids the file-lock bug, and I was able to successfully reprocess VNIR - 2017-04-18__10-16-26-509. The sizes of the output files approximately match what Charlie generated.
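In case anyone else hits the same "No locks available" error (likely because the envlog file sits on a network mount), the workaround is just an environment variable, so it can be exported in the shell that launches the workflow or baked into the extractor image; a minimal sketch:
# Disable HDF5 file locking for this shell and any child processes
# (including hyperspectral_workflow.sh and the ncks calls it makes).
export HDF5_USE_FILE_LOCKING=FALSE
# Or set it for a single invocation without touching the environment:
HDF5_USE_FILE_LOCKING=FALSE ./hyperspectral_workflow.sh [same arguments as before]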
However, when trying to process a dataset from 2018 (VNIR - 2018-08-18__11-47-47-403), I encounter a bug that I think is related to the new camera:
assert len(wavelength) in (955, 272, 273), "ERROR: Failed to get wavlength information. Please check if you modified the *.hdr files"
The assertion that the wavelength count is one of 955, 272, or 273 fails; the actual count is 939.
I modified this portion of the code in hyperspectral_metadata.py to avoid this bug:
# Check if the wavelength is correctly collected
assert len(wavelength) in (939, 955, 272, 273), "ERROR: Failed to get wavlength information. Please check if you modified the *.hdr files (length %s)" % len(wavelength)
camera_opt = 'VNIR' if len(wavelength) in (939, 955) else 'SWIR' # Choose appropriate camera by counting the number of wavelengths.
Note how I added 939 to the valid values. Unfortunately, this just yields a different downstream error:
nco_err_exit(): ERROR Error code is -57. Translation into English with nc_strerror(-57) is "NetCDF: Start+count exceeds dimension bound"
ERROR: nco_put_vara() failed to nc_put_vara() variable "cst_cnv_trg_nw"
hyperspectral_workflow.sh: problem copying theblob\n
It seems to be occurring here in hyperspectral_workflow.sh:
#copy theblob.nc to att_out
printf "ncks -A ${drc_spt}/theblob.nc ${att_out}"
ncks -A "${drc_spt}/theblob.nc" "${att_out}"
[ "$?" -ne 0 ] && echo "$0: problem copying theblob\n" && exit 1
ncks -A /home/extractor/theblob.nc /tmp/terraref_tmp_att.nc.pid15.fl00.tmp
The ncks command is very basic, so I'm not sure how to fix this dimension-bound error yet.
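One guess at how to narrow it down (not verified): the "Start+count exceeds dimension bound" error looks like a dimension-size mismatch between theblob.nc (presumably built around the old camera's 955 bands) and the 939-band output it is being appended to, which should be visible by comparing the headers of the two files:
# Dump the headers of the static blob file and the freshly generated
# attribute file and compare their dimension sizes (e.g. the wavelength/band
# dimension); a 955 vs 939 mismatch would explain the out-of-bounds write.
ncdump -h /home/extractor/theblob.nc
ncdump -h /tmp/terraref_tmp_att.nc.pid15.fl00.tmp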
Awesome ... the fact that it works on the old camera is fantastic news, since it means we actually have the ability to process data from some of the earlier seasons (how far back?)! Also, can you put the output file somewhere (e.g., Globus) so I can take a look?
@max-zilla as we discussed last week, when you start running these, can you set the xps_img_flg so that the raw counts are included in the output files? The flag is set here: https://github.com/terraref/extractors-hyperspectral/blob/master/hyperspectral/hyperspectral_workflow.sh#L143
I'm not sure what the appropriate value is, but the logic it uses is at https://github.com/terraref/extractors-hyperspectral/blob/master/hyperspectral/hyperspectral_workflow.sh#L744
I think that this will (as originally intended) make it easier to reprocess with the new calibration algorithms that @Paheding et al. are developing.
In the meantime, we can queue up data from 2016-2017 that is within the size constraints while we get S6+ prepared.
Running tests on S4 data with the "--output_xps_img" flag enabled to include raw data.
The extractor Docker image is updated, and I'm running the script to queue these datasets for 2017 now. I gave Vasit and Patrick an update on Monday.
This issue has been open for 6 months; I'm going to create a new issue, with more details, to cover updating for the newer VNIR camera.
Get this running on incoming data for wider team usage.