Did the time stamp change from the YYYY-MM-DD HH:MM:SS specification at some point @smarshall-bmr?
@max-zilla and @craig-willis Should we fix the metadata instead of the code? I think the code should follow the standard. But efficiency of implementation and interoperability may favor generality.
@max-zilla I think your interpretation is correct. The first few batches of metadata (probably before June) used "time" as the keyword identifying the gantry time, and it was later changed to "Time." Now we need to add one more keyword, "timestamp."
@dlebauer So far we have three regular expression patterns to match the different time formats we have observed in the metadata (and we need to add one more this time). I think we do need a standard for naming those values, since we have already been hard-coding the variants. Other examples in the metadata include "Gantry Speed" vs. "Gantry Velocity" and "Gantry Position" vs. "Gantry position".
@max-zilla @dlebauer @FlyingWithJerome It is not a problem to add "timestamp" to the list of synonyms searched for the year/month/date. By now it is no surprise that upstream folks have or will change metadata names without testing their changes on the existing hyperspectral workflow. Sophisticated workflows like this always grow a bit crufty as names are tweaked to better reflect their intent. The best we can do is to mitigate the surprise factor by making the extractor operational so problems are noticed and fixed immediately.
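For illustration, here is a minimal sketch of that synonym-plus-format matching; the helper name, the synonym tuple, and the second date format are assumptions based on examples quoted elsewhere in this thread, not the actual hyperspectral_metadata.py code:

from datetime import datetime

# Assumed synonym list and formats; the real extractor uses its own regexes.
TIME_KEY_SYNONYMS = ("time", "Time", "timestamp")
TIME_FORMATS = ("%Y-%m-%d %H:%M:%S",   # the original YYYY-MM-DD HH:MM:SS spec
                "%m/%d/%Y %H:%M:%S")   # e.g. "12/06/2016 13:56:35"

def find_gantry_time(gantry_metadata):
    """Return the acquisition time found under whichever synonym is present."""
    for key in TIME_KEY_SYNONYMS:
        if key in gantry_metadata:
            raw = gantry_metadata[key]
            for fmt in TIME_FORMATS:
                try:
                    return datetime.strptime(raw, fmt)
                except ValueError:
                    continue
            raise ValueError("Unrecognized time format: %r" % raw)
    raise KeyError("No time/Time/timestamp entry in gantry metadata")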
Another option might be to have a process that runs to correct/standardize the metadata. This way we can centralize these types of transformations instead of requiring each extractor to handle them locally. We might consider something to detect these changes as well, if we can't rely on folks upstream to notify us.
@craig-willis we do have other metadata extractors here: https://github.com/terraref/extractors-metadata
So far we have one that pulls geo data from metadata and another that extracts metadata from netCDF files, but I could see another extractor being a "Metadata Cleaner" that has all our rules baked in to bring incoming metadata into a standardized format.
The tricky question, to me, is how we apply that transformation. I'm very wary of editing the JSON files themselves in the raw_data and would prefer to create a new 'clean' JSON file (so that we always have the original if necessary) but that's a lot of new files we're talking about. We could add the updated metadata to Clowder but that doesn't help anyone coming in via the Globus pipeline.
I guess I'm less wary of ADDITIVE things to the JSON files, i.e. only adding fields to some standard_fields sub-JSON object while leaving the originals wherever they are in the hierarchy, and having our extractors look for standard_fields as the first choice. Not crazy about that either, but just throwing out some ideas.
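To make the additive idea concrete, a rough sketch assuming the metadata is loaded as a Python dict; the nested key names are from memory of the gantry metadata layout and the standard_fields shape is hypothetical:

def add_standard_fields(metadata):
    """Return a copy of the metadata with a normalized sub-object added,
    leaving every original field exactly where it was (additive only)."""
    gantry = (metadata.get("lemnatec_measurement_metadata", {})
                      .get("gantry_system_variable_metadata", {}))
    standard = {}
    # Collapse the known time synonyms into one canonical key.
    for key in ("time", "Time", "timestamp"):
        if key in gantry:
            standard["time"] = gantry[key]
            break
    cleaned = dict(metadata)              # shallow copy; originals untouched
    cleaned["standard_fields"] = standard
    return cleaned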
I want to re-emphasize here that we should be focusing on Season 2. The metadata should have been standardized by https://github.com/terraref/reference-data/issues/25 (closed July 2016) and there should not have been any changes (other than adding fields) since the beginning of Season 2 (Aug 2016).
I would rather not worry about having an extra metadata transformation layer. It sounds like a lot of work that could be avoided if a) the gantry operator doesn't change the metadata and b) we write a test to check that all of the fields exist, and perhaps are populated. We may not be able to check every metadata file, but if we checked even one per sensor per day it would likely catch any violations of (a).
Are there any examples of changes to metadata fields since Aug 2016 (beginning of Season 2)? @smarshall-bmr ... are you aware of any tampering with our metadata generator?
If we still need a metadata fixer layer, it should only be done if steps a and b (not changing metadata and testing that metadata is not changed) are insufficient. If we need to fix the Season 0 and Season 1 metadata, we can do that later as a one-off.
A test that the metadata is in the correct format could be as simple as making sure that all of the fields we currently have exist (a logical grep). This could be added to the overall testing standards compliance issue https://github.com/terraref/computing-pipeline/issues/232; even though that issue is about the 'public-facing' metadata standards, this is a good use case.
PS here is the test for FLIR metadata: https://gist.github.com/dlebauer/52f934a28d5d8185ba1bd89644155c46
If my recommendation above is sensible and worth implementing, I can create the same test for the rest of the metadata files.
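For reference, a minimal sketch of such a check for a single metadata file; the required-field list here is just a placeholder, not the actual specification:

import json

# Placeholder list; a real check would enumerate every field in the
# current metadata specification for the sensor being tested.
REQUIRED_FIELDS = ["time", "position x [m]", "position y [m]"]

def missing_fields(metadata_path):
    """Return the required field names that do not appear anywhere in the file."""
    with open(metadata_path) as f:
        flattened = json.dumps(json.load(f))   # crude "logical grep" over the whole document
    return [field for field in REQUIRED_FIELDS if field not in flattened]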
@czender roughly how long does one run take on a 64 GB input file? I am troubleshooting the bulk run this morning and trying to run a test on this dataset from 05/01: https://terraref.ncsa.illinois.edu/clowder/datasets/58702ad54f0c0dbad1a81378
...64.8 GB raw file.
The output doesn't seem to have errors:
Terraref hyperspectral data workflow invoked with:
hyperspectral_workflow.sh -d 1 -i /projects/arpae/terraref/sites/ua-mac/raw_data/VNIR/2016-05-01/2016-05-01__09-33-08-075/c64c0602-79ad-48e5-8457-6bdb69205402_raw -o /projects/arpae/terraref/sites/ua-mac/Level_1/hyperspectral/2016-05-01/2016-05-01__09-33-08-075/c64c0602-79ad-48e5-8457-6bdb69205402_.nc
Hyperspectral workflow scripts in directory /gpfs/largeblockFS/projects/arpae/terraref/shared/extractors/extractors-hyperspectral/hyperspectral
NCO version "4.6.2-beta03" from directory /gpfs/smallblockFS/sw/nco-4.6.2-beta03/bin
Intermediate/temporary files written to directory /gpfs_scratch/arpae/imaging_spectrometer
Final output stored in directory /projects/arpae/terraref/sites/ua-mac/Level_1/hyperspectral/2016-05-01/2016-05-01__09-33-08-075
Input #00: /projects/arpae/terraref/sites/ua-mac/raw_data/VNIR/2016-05-01/2016-05-01__09-33-08-075/c64c0602-79ad-48e5-8457-6bdb69205402_raw
trn(in) : /projects/arpae/terraref/sites/ua-mac/raw_data/VNIR/2016-05-01/2016-05-01__09-33-08-075/c64c0602-79ad-48e5-8457-6bdb69205402_raw
trn(out) : /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_trn.nc.pid100194.fl00.tmp
ncks -O --hdr_pad=10000 --no_tmp_fl --trr_wxy=955,1600,21219 --trr typ_in=NC_USHORT --trr typ_out=NC_USHORT --trr ntl_in=bil --trr ntl_out=bsq --trr_in=/projects/arpae/terraref/sites/ua-mac/raw_data/VNIR/2016-05-01/2016-05-01__09-33-08-075/c64c0602-79ad-48e5-8457-6bdb69205402_raw /gpfs/largeblockFS/projects/arpae/terraref/shared/extractors/extractors-hyperspectral/hyperspectral/hyperspectral_dummy.nc /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_trn.nc.pid100194.fl00.tmp
att(in) : /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_trn.nc.pid100194.fl00.tmp
att(out) : /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_att.nc.pid100194.fl00.tmp
ncatted -O --gaa terraref_script=hyperspectral_workflow.sh --gaa terraref_hostname=cg-cmp25 --gaa terraref_version="4.6.2-beta03" -a "Conventions,global,o,c,CF-1.5" -a "Project,global,o,c,TERRAREF" --gaa history="Tue Jan 17 09:18:46 CST 2017: hyperspectral_workflow.sh -d 1 -i /projects/arpae/terraref/sites/ua-mac/raw_data/VNIR/2016-05-01/2016-05-01__09-33-08-075/c64c0602-79ad-48e5-8457-6bdb69205402_raw -o /projects/arpae/terraref/sites/ua-mac/Level_1/hyperspectral/2016-05-01/2016-05-01__09-33-08-075/c64c0602-79ad-48e5-8457-6bdb69205402_.nc" /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_trn.nc.pid100194.fl00.tmp /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_att.nc.pid100194.fl00.tmp
jsn(in) : /projects/arpae/terraref/sites/ua-mac/raw_data/VNIR/2016-05-01/2016-05-01__09-33-08-075/c64c0602-79ad-48e5-8457-6bdb69205402_raw
jsn(out) : /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_jsn.nc.pid100194
python /gpfs/largeblockFS/projects/arpae/terraref/shared/extractors/extractors-hyperspectral/hyperspectral/hyperspectral_metadata.py dbg=yes fmt=4 ftn=no /projects/arpae/terraref/sites/ua-mac/raw_data/VNIR/2016-05-01/2016-05-01__09-33-08-075/c64c0602-79ad-48e5-8457-6bdb69205402_raw /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_jsn.nc.pid100194.fl00.tmp
mrg(in) : /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_jsn.nc.pid100194.fl00.tmp
mrg(out) : /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_att.nc.pid100194.fl00.tmp
ncks -A /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_jsn.nc.pid100194.fl00.tmp /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_att.nc.pid100194.fl00.tmp
mrg(in) : /gpfs/largeblockFS/projects/arpae/terraref/shared/extractors/extractors-hyperspectral/hyperspectral/calibration_vnir_45ms.nc
mrg(out) : /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_att.nc.pid100194.fl00.tmp
ncks -A -C -v xps_img_wht,xps_img_drk /gpfs/largeblockFS/projects/arpae/terraref/shared/extractors/extractors-hyperspectral/hyperspectral/calibration_vnir_45ms.nc /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_att.nc.pid100194.fl00.tmp
clb(in) : /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_att.nc.pid100194.fl00.tmp
clb(out) : /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_clb.nc.pid100194.fl00.tmp
ncap2 -A --hdr_pad=10000 -s @drc_spt='"/gpfs/largeblockFS/projects/arpae/terraref/shared/extractors/extractors-hyperspectral/hyperspectral"' -S /gpfs/largeblockFS/projects/arpae/terraref/shared/extractors/extractors-hyperspectral/hyperspectral/hyperspectral_calibration.nco /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_att.nc.pid100194.fl00.tmp /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_att.nc.pid100194.fl00.tmp
Setting parser(filename)=/gpfs/largeblockFS/projects/arpae/terraref/shared/extractors/extractors-hyperspectral/hyperspectral/hyperspectral_calibration.nco
...but I'm trying to determine if I'm seeing a timeout issue in the pipeline workflow, or if I'm simply not waiting long enough (this ran for ~30 minutes before I terminated it). How long did you see for a successful 64 GB run?
Last time I checked it took 94 minutes, so let it run...
@jdmaloney mentioned this in terra room, but when moving from my sites/ua-mac/Level_1/hyperspectral_manualcheck directory to the proper /hyperspectral one, I no longer have write permissions for the extractor outputs. Can you add group write permissions to all the /Level_1 directories so my extractors can write to them? Thanks.
Didn't we decide the extractors should have write access, but not individuals?
That seems to be how it's configured, but if so I'll still need to work out a solution for hyperspectral, since that isn't on a VM with mounted permissions like the others - instead, it's running as a job on a Roger node, and since I'm the one starting it, I'm not allowed to write. If you (the dlebauer user) were to start it, I think that would be fine since you own the directories, but that isn't a good long-term solution.
If you send me a command line statement I can run it. Do you mean "chmod -R +w groupname sites/ua-mac/Level_1/"? (Easiest if you just tell me what to run.)
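For context, the usual shell form for this is "chmod -R g+w sites/ua-mac/Level_1/" (plus a chgrp if the group ownership also needs changing); below is a rough Python equivalent of that recursive group-write change, purely as illustration:

import os
import stat

def add_group_write(root):
    """Recursively add the group-write bit, like chmod -R g+w <root>."""
    for dirpath, dirnames, filenames in os.walk(root):
        for path in [dirpath] + [os.path.join(dirpath, f) for f in filenames]:
            os.chmod(path, os.stat(path).st_mode | stat.S_IWGRP)

add_group_write("sites/ua-mac/Level_1/")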
5 of these are running now. 1,300 datasets queued. Will monitor progress over the next day or two.
This has been progressing. A lot of outputs in April/May and it's starting on early September now.
539 datasets processed.
@czender @solmazhajmohammadi The location of the hyperspectral outputs:
On Roger:
/projects/arpae/terraref/sites/ua-mac/Level_1/hyperspectral
Via Globus:
ua-mac/Level_1/hyperspectral
Returning to the earlier discussion about "time" vs. "Time" vs. "timestamp" (https://github.com/terraref/computing-pipeline/issues/230#issue-200385894). It looks like the "time" (or "Time") entry is there when there is position information, and "timestamp" is there when "PLC control not available". The last entry with "Time" (capital T) is in May, and from that point forward it's "time". I'm wondering if the change to "timestamp" has to do with an alternate code path, not a change to the metadata.
@craig-willis Do you need any correction of time / timestamp / Time in the source code?
Or maybe @czender can answer.
Most important is that the names and formats don't change unless necessary, and if changes are necessary, that there is enough warning (a month) so downstream fixes can be made and tested.
@TinoDornbusch Could you provide an explanation of why these values changed?
I think the "Time" vs "time" change is older -- maybe an earlier correction to the metadata to use lowercase (the last case of "Time" I see is from 5/20/2016).
However, "time" versus "timestamp" seems to still be a problem. Why are there two different labels for the same value? For example:
VNIR/2016-12-06/2016-12-06__13-56-35-387/8f3b7fe9-7f04-44db-b52f-0c64e3cb1ee1_metadata.json "timestamp": "12/06/2016 13:56:35"
VNIR//2016-12-08/2016-12-08__15-30-44-025/a86e48ab-4954-4a11-bda6-1f9f7abbae9f_metadata.json "time": "12/08/2016 15:30:44"
@craig-willis @TinoDornbusch @dlebauer By now "time" / "timestamp" / "Time" are all recognized keywords and are guaranteed to be captured, so a little disorder in the metadata won't affect downstream processing too much for now. Having three keywords probably makes the metadata harder for the upstream people to manage, but hopefully we will not need any more keywords.
@max-zilla How many wallclock days did it take to process the 2016 data? We may want to re-process after we add some new features to the workflow, and correct some metadata issues. Also, do you know why the output filenames all end with an underscore? I've just added the workflow script invocation as global metadata so we can see if there are clues in that.
@FlyingWithJerome can you make sure the metadata generated by the extractor downstream is consistent (e.g. whether time, timestamp, or Time is passed in, just emit 'time')?
If you are working on the metadata now I have a few other suggestions for changes to the metadata, but I will address these in a separate issue. (my assumption is that we can update metadata later without re-running the entire pipeline ... is that correct?).
@FlyingWithJerome the output file does not contain a "time" variable. Instead it contains "frametime". However, the attributes say "The datestamp for each frame is stored in the *_frameindex.txt file which is archived in the time variable". This is inconsistent. Perhaps I wasn't clear but what I thought I asked is:
@czender and @FlyingWithJerome did you ever export nc metadata to json in the extractor? If so, shouldn't it show up here: https://terraref.ncsa.illinois.edu/clowder/files/587e8efe4f0cd67174dd1dcb?dataset=58702ad54f0c0dbad1a81378&space=57e42cd44f0cff4b58dd3eea
We never added a separate step to create a separate JSON file. We (@hmb1) added the capability to NCO to do so, however. Would you like a separate file (i.e., output_metadata.json), or a new attribute in the existing file that contains the JSON dump of the metadata?
@czender my understanding is that there is metadata at the dataset level and the file level. It would make sense to add this to the file-level metadata, since it applies to the output .nc files but not the input files. so you could create both the output_metadata.json and insert it into the Clowder metadata database (@max-zilla can suggest how to do this)
@dlebauer the link you posted was run through the netCDF extractor:
terra.netcdf Thu Jan 19 08:14:20 CST 2017 N/A START: Started processing
terra.netcdf Thu Jan 19 08:14:20 CST 2017 N/A PROCESSING: Downloading file.
...which extracts some metadata. However, even with the Roger filesystem mounted, it appears our VM with 4 GB of RAM chokes trying to handle the 195 GB file (I wasn't sure if this would be the case, depending on how the header is read). I will talk with @robkooper about possible workarounds for large .nc files, but in a pinch we could run the extractor as a Roger job the way we are running the hyperspectral workflow (and eventually, Solmaz's sensor fusion).
@czender the hyperspectral extraction just finished the first run overnight, it appears - I have 1,119 .nc files in the Level_1 directory, but since some of the recent fixes weren't deployed, the leftover datasets were probably missing a field or something. So call it roughly a week with 5 extractors running, and the odd ~8 hours of downtime here and there when the Roger job expired at 1am or something.
David is correct that we can very easily add metadata to the .nc file itself - I think his approach is good:
edit: I'm still catching up with emails from yesterday afternoon and I see a pull request has come and gone in the meantime :)
Now that there's a separate JSON file with the metadata, I'm not sure how else to help with this because I don't know how to post it to Clowder. BTW, the command to dump the metadata in JSON format is ncks -m -M --json in.nc. It should only require trivial amounts of RAM. If this is not your experience, let me know.
@czender just want to clarify:
Here is the salient portion of the JSON code:
logging.info('...extracting metadata in json format: %s' % metaFilePath)
with open(metaFilePath, 'w') as fmeta:
    subprocess.call(['ncks', '--jsn', '-m', '-M', inPath], stdout=fmeta)
if os.path.exists(metaFilePath):
    pyclowder.files.upload_to_dataset(connector, host, secret_key, resource['parent']['id'], metaFilePath)
...I think the only modification is to update the netCDF extractor so that the JSON file here is opened, parsed, and added to Clowder as metadata, in addition to what it already does (add the JSON file itself to Clowder).
Does that sound correct?
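A rough sketch of that modification, assuming pyclowder's metadata-upload call and the JSON-LD payload shape work the way I remember (both unverified here):

import json
import pyclowder.files

def attach_nc_metadata(connector, host, secret_key, file_id, metaFilePath):
    """Parse the ncks --jsn dump and attach it as Clowder metadata on the
    output .nc file, in addition to uploading the JSON file itself."""
    with open(metaFilePath) as fmeta:
        nc_metadata = json.load(fmeta)
    payload = {
        "@context": ["https://clowder.ncsa.illinois.edu/contexts/metadata.jsonld"],
        "file_id": file_id,
        "content": nc_metadata,
        "agent": {"@type": "cat:extractor",
                  "extractor_id": host + "api/extractors/terra.netcdf"},
    }
    pyclowder.files.upload_metadata(connector, host, secret_key, file_id, payload)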
@dlebauer above asked me to create a JSON file, so I added that creation step to the HS workflow. @max-zilla says that step was already implemented (though not working at the time) by the "netCDF extractor", so now there are two JSON files. I will turn off the JSON generation in the HS workflow. @max-zilla is the expert at putting things into Clowder, so I suggest he do it. Too many cooks :)
@czender @dlebauer I updated the netCDF metadata extractor so that, in addition to the CDL/XML/JSON files that are created and uploaded, the JSON file contents are added to the .nc file's entry in Clowder as metadata.
@czender are there sufficient updates to the workflow that we should re-run extractor on the 2016 data, or are more updates incoming?
There have been some worthwhile changes and more are coming. I recommend re-running 2016 in two weeks. This should give time to get the latitude/longitude coordinates in, and some more _FillValues.
@czender I ran the newest code on 04/15 of 2017. Outputs are in:
/sites/ua-mac/Level_1/hyperspectral/2017-04-15/
The VNIR ran OK from the look of things, but SWIR failed. However, it looks like there might be some broken links in the Clowder SWIR datasets that could be responsible - I'm going to diagnose today and try some re-runs.
SWIR outputs will go in the /Level_1/hyperspectral_swir directory.
@jdmaloney I traced this problem back to the bulk rebuild script we ran to create the SWIR datasets:
mburnet2@terra-clowder:/home/clowder/bulk_rebuild$ ls /home/clowder/sites/ua-mac/raw_data/SWIR/2016-12-08/2016-12-08__15-08-33-771/
15de159e-4446-46ad-b2d9-c0d0ad8563d8_frameIndex.txt 15de159e-4446-46ad-b2d9-c0d0ad8563d8_metadata.json 15de159e-4446-46ad-b2d9-c0d0ad8563d8_raw.hdr
15de159e-4446-46ad-b2d9-c0d0ad8563d8_image.jpg 15de159e-4446-46ad-b2d9-c0d0ad8563d8_raw 15de159e-4446-46ad-b2d9-c0d0ad8563d8_settings.txt
mburnet2@terra-clowder:/home/clowder/bulk_rebuild$ tail lists/raw_data/uamac_SWIR.list
/gpfs/largeblockFS/projects/arpae/terraref/sites/ua-mac/raw_data/SWIR/2016-12-08/2016-12-08__14-14-19-938/66359f20-8839-4158-a083-48dded7f41e5_2016_12_08_14_17_35image.jpg
/gpfs/largeblockFS/projects/arpae/terraref/sites/ua-mac/raw_data/SWIR/2016-12-08/2016-12-08__14-14-19-938/66359f20-8839-4158-a083-48dded7f41e5_metadata.json
/gpfs/largeblockFS/projects/arpae/terraref/sites/ua-mac/raw_data/SWIR/2016-12-08/2016-12-08__14-14-19-938/66359f20-8839-4158-a083-48dded7f41e5_2016_12_08_14_17_35settings.txt
/gpfs/largeblockFS/projects/arpae/terraref/sites/ua-mac/raw_data/SWIR/2016-12-08/2016-12-08__14-14-19-938/66359f20-8839-4158-a083-48dded7f41e5_2016_12_08_14_17_35raw.hdr
/gpfs/largeblockFS/projects/arpae/terraref/sites/ua-mac/raw_data/SWIR/2016-12-08/2016-12-08__15-08-33-771/15de159e-4446-46ad-b2d9-c0d0ad8563d8_2016_12_08_15_11_49settings.txt
/gpfs/largeblockFS/projects/arpae/terraref/sites/ua-mac/raw_data/SWIR/2016-12-08/2016-12-08__15-08-33-771/15de159e-4446-46ad-b2d9-c0d0ad8563d8_2016_12_08_15_11_49raw.hdr
/gpfs/largeblockFS/projects/arpae/terraref/sites/ua-mac/raw_data/SWIR/2016-12-08/2016-12-08__15-08-33-771/15de159e-4446-46ad-b2d9-c0d0ad8563d8_2016_12_08_15_11_49frameIndex.txt
/gpfs/largeblockFS/projects/arpae/terraref/sites/ua-mac/raw_data/SWIR/2016-12-08/2016-12-08__15-08-33-771/15de159e-4446-46ad-b2d9-c0d0ad8563d8_metadata.json
/gpfs/largeblockFS/projects/arpae/terraref/sites/ua-mac/raw_data/SWIR/2016-12-08/2016-12-08__15-08-33-771/15de159e-4446-46ad-b2d9-c0d0ad8563d8_2016_12_08_15_11_49raw
/gpfs/largeblockFS/projects/arpae/terraref/sites/ua-mac/raw_data/SWIR/2016-12-08/2016-12-08__15-08-33-771/15de159e-4446-46ad-b2d9-c0d0ad8563d8_2016_12_08_15_11_49image.jpg
Simpler example of one file:
FILENAME:
15de159e-4446-46ad-b2d9-c0d0ad8563d8_image.jpg
ENTRY IN LISTS FILE:
15de159e-4446-46ad-b2d9-c0d0ad8563d8_2016_12_08_15_11_49image.jpg
It's like the timestamp from the next dataset got appended onto the filenames of the previous somehow.
I'm a bit confused though, because the list I'm pasting from didn't go up to 04/15, yet datasets on 04/15 have this problem: https://terraref.ncsa.illinois.edu/clowder/datasets/58f2cec64f0c5bee63a4d655 (the Clowder entry points to the wrong path with the appended timestamp)
...But I couldn't find any place in my pipeline code that was obviously introducing this, and the data from yesterday which went through the pipeline is OK: https://terraref.ncsa.illinois.edu/clowder/datasets/5907cdfc4f0c20f1bf08b589
...I'm thinking perhaps we just purge the relatively small number of SWIR datasets from Clowder and recreate with a corrected lists file as above.
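For the corrected lists file, something along these lines should strip the stray timestamp infix from the old entries (a sketch against the examples above; the output filename is hypothetical and edge cases would need checking):

import re

# "..._2016_12_08_15_11_49image.jpg" -> "..._image.jpg"; entries without the
# extra timestamp (e.g. *_metadata.json) pass through unchanged.
STRAY_TIMESTAMP = re.compile(r"_\d{4}(?:_\d{2}){5}(?=[A-Za-z])")

with open("lists/raw_data/uamac_SWIR.list") as src, \
     open("lists/raw_data/uamac_SWIR_corrected.list", "w") as dst:
    for line in src:
        dst.write(STRAY_TIMESTAMP.sub("_", line))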
@max-zilla and @jdmaloney is this the same issue as recently addressed/solved in #281 ? @solmazhajmohammadi says the extra date was introduced by "headwall software" and is now fixed.
@czender yes, that's exactly what it is. I had not seen @dlebauer's rename script run on the SWIR data, but the Clowder upload happened when that data landed and thus still has pointers to the old filenames, so the extractor couldn't find them.
Thanks for pointing that out. I'll work with @robkooper and @jdmaloney to get the corrected filenames into the Clowder database and run it - as you saw above, it looks like the SWIR data at least processes correctly.
I tried running the field stitching script on 04/15, but the VRT creation process failed:
gdalbuildvrt -srcnodata "-99 -99 -99" -overwrite
-input_file_list /home/extractor/sites/ua-mac/Level_1/fullfield/2017-04-15/hyperspectral_fileList.txt
/home/extractor/sites/ua-mac/Level_1/fullfield/2017-04-15/hyperspectral_fullfield.VRT
0...10...20...30.
Warning 1: Unsupported netCDF datatype (8), treat as Float32.
Warning 1: dimension #2 (x) is not a Longitude/X dimension.
Warning 1: dimension #1 (y) is not a Latitude/Y dimension.
Warning 1: dimension #2 (x) is not a Longitude/X dimension.
Warning 1: dimension #1 (y) is not a Latitude/Y dimension.
Warning 1: Unsupported netCDF datatype (8), treat as Float32.
Warning 1: dimension #2 (x) is not a Longitude/X dimension.
Warning 1: dimension #1 (y) is not a Latitude/Y dimension.
Warning 1: dimension #2 (x) is not a Longitude/X dimension.
Warning 1: dimension #1 (y) is not a Latitude/Y dimension.
Warning 1: Unsupported netCDF datatype (8), treat as Float32.
Warning 1: dimension #2 (x) is not a Longitude/X dimension.
Warning 1: dimension #1 (y) is not a Latitude/Y dimension.
Warning 1: dimension #2 (x) is not a Longitude/X dimension.
Warning 1: dimension #1 (y) is not a Latitude/Y dimension.
Warning 1: Unsupported netCDF datatype (8), treat as Float32.
Warning 1: dimension #2 (x) is not a Longitude/X dimension.
Warning 1: dimension #1 (y) is not a Latitude/Y dimension.
...
More than 1000 errors or warnings have been reported. No more will be reported from now.
This is the result on the .nc files.
Trying to sort out the SWIR query fix today.
@max-zilla did you try gdal_translate nc-->tif?
http://www.gdal.org/frmt_various.html
GMT -- GMT Compatible netCDF
GDAL has limited support for reading and writing netCDF grid files. NetCDF files that are not recognised as grids (they lack variables called dimension, and z) will be silently ignored by this driver. This driver is primarily intended to provide a mechanism for grid interchange with the GMT package. The netCDF driver should be used for more general netCDF datasets.
The units information in the file will be ignored, but x_range, and y_range information will be read to get georeferenced extents of the raster. All netCDF data types should be supported for reading.
Newly created files (with a type of GMT) will always have units of "meters" for x, y and z but the x_range, y_range and z_range should be correct. Note that netCDF does not have an unsigned byte data type, so 8bit rasters will generally need to be converted to Int16 for export to GMT.
NetCDF support in GDAL is optional, and not compiled in by default.
NOTE: Implemented as gdal/frmts/netcdf/gmtdataset.cpp.
See Also: Unidata NetCDF Page
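A sketch of the suggested nc --> tif route via the GDAL Python bindings; the paths and the subdataset/variable name are placeholders, and the right reflectance variable would need to be picked from gdalinfo's subdataset listing:

from osgeo import gdal

# Placeholder paths and variable name; gdalinfo on the .nc file lists the
# available NETCDF: subdatasets to choose from.
src = 'NETCDF:"/path/to/output.nc":rfl_img'
gdal.Translate("/path/to/output.tif", src, format="GTiff")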
@czender I corrected the SWIR datasets and pulled the latest hyperspectral code, then submitted 04/15. Got errors:
2017-05-19 14:56:36,229 [Thread-70 ] INFO : pyclowder.connectors - dataset ID [58f21de24f0c5bee63a2c9fa] : START: Started processing
Traceback (most recent call last):
File "/gpfs/largeblockFS/projects/arpae/terraref/shared/extractors/extractors-hyperspectral/hyperspectral/hyperspectral_metadata.py", line 745, in <module>
main()
File "/gpfs/largeblockFS/projects/arpae/terraref/shared/extractors/extractors-hyperspectral/hyperspectral/hyperspectral_metadata.py", line 741, in main
testCase.writeToNetCDF(file_input, file_output, " ".join((file_input, file_output)), format, flatten, debug)
File "/gpfs/largeblockFS/projects/arpae/terraref/shared/extractors/extractors-hyperspectral/hyperspectral/hyperspectral_metadata.py", line 180, in writeToNetCDF
assert len(wavelength) in (955, 272), "ERROR: Failed to get wavlength informations. Please check if you modified the *.hdr files"
AssertionError: ERROR: Failed to get wavlength informations. Please check if you modified the *.hdr files
2017-05-19 14:56:39,066 [Thread-70 ] ERROR : root - script encountered an error
2017-05-19 14:56:39,067 [Thread-70 ] ERROR : root - no output file was produced
@max-zilla There are two possibilities.
This is our fault because we never updated the validator for the new SWIR configuration. Will submit PR today that fixes it.
@max-zilla we just merged the fix for SWIR data. please retry extractor...
@czender thanks for the quick turnaround, I will get that going tonight or tomorrow morning.
@czender I just reran for SWIR 04-15 and it looks like things were processed this time - I see some .nc files in the hyperspectral_swir Level_1 directory.
@czender I also pulled the latest code and am rerunning 04-15 for VNIR this morning. I had to update my extractor with an --overwrite flag as well, to force it to replace the previous versions.
@max-zilla please close the issue and create a new one for running the full season
@max-zilla reminder
Following discussions in #195....
As soon as Roger maintenance completes this morning, I will start processing.
@czender @FlyingWithJerome However, based on a few spot checks I anticipate some of the VNIR datasets will fail (safely). It seems the metadata formats have changed over time and we may need to expand the fields that are checked in the script.
For example, this VNIR dataset from 12/05 has the following metadata: https://terraref.ncsa.illinois.edu/clowder/datasets/58702ace4f0c0dbad1a81307 (Clowder will be offline until Roger maintenance is complete)
When I submitted this file for extraction, received this error:
...and the problematic variable in question is defined in the python file as:
In the metadata above, I suspect the yearMonthDate we'd want is now:
Ideally this hasn't changed, but if I understand correctly we'd need to check for a "timestamp" key in the data as well (?). The best broad approach at this point is probably for us to just run all the datasets, let those that fail, fail, and check our error messages to determine what needs to be adjusted later. Hopefully the majority will go through with no issues - I think we all expect that the earliest 2016 data could have rough edges, but this would help us identify issues with subsequent data as well. @robkooper this would be a valuable application of the new RabbitMQ error queues to preserve failed messages as well.
Maybe I'm misinterpreting things but that's my sense from the metadata+error.