Did the time stamp change from the YYYY-MM-DD HH:MM:SS specification at some point @smarshall-bmr?
@max-zilla and @craig-willis Should we fix the metadata instead of the code? I think the code should follow the standard. But efficiency of implementation and interoperability may favor generality.
@max-zilla I think your interpretation is correct. The first few batches of metadata (probably before June) used "time" as the keyword identifying the gantry time, and it was later changed to "Time." Now we need to add one more keyword, "timestamp."
@dlebauer So far we have three regular expression patterns to match the different time formats we have observed in the metadata (and we need to add one more this time). I think we do need a standard for naming those values, since we have already been hard-coding the variants. Other examples in the metadata include "Gantry Speed" vs. "Gantry Velocity" and "Gantry Position" vs. "Gantry position".
@max-zilla @dlebauer @FlyingWithJerome It is not a problem to add "timestamp" to the list of synonyms searched for the year/month/date. By now it is no surprise that upstream folks have or will change metadata names without testing their changes on the existing hyperspectral workflow. Sophisticated workflows like this always grow a bit crufty as names are tweaked to better reflect their intent. The best we can do is to mitigate the surprise factor by making the extractor operational so problems are noticed and fixed immediately.
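For illustration, here is a minimal sketch of that synonym-plus-format matching; the helper name, the synonym tuple, and the second date format are assumptions based on examples quoted elsewhere in this thread, not the actual hyperspectral_metadata.py code:

from datetime import datetime

# Assumed synonym list and formats; the real extractor uses its own regexes.
TIME_KEY_SYNONYMS = ("time", "Time", "timestamp")
TIME_FORMATS = ("%Y-%m-%d %H:%M:%S",   # the original YYYY-MM-DD HH:MM:SS spec
                "%m/%d/%Y %H:%M:%S")   # e.g. "12/06/2016 13:56:35"

def find_gantry_time(gantry_metadata):
    """Return the acquisition time found under whichever synonym is present."""
    for key in TIME_KEY_SYNONYMS:
        if key in gantry_metadata:
            raw = gantry_metadata[key]
            for fmt in TIME_FORMATS:
                try:
                    return datetime.strptime(raw, fmt)
                except ValueError:
                    continue
            raise ValueError("Unrecognized time format: %r" % raw)
    raise KeyError("No time/Time/timestamp entry in gantry metadata")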
Another option might be to have a process that runs to correct/standardize the metadata. This way we can centralize these types of transformations instead of requiring each extractor to handle them locally. We might consider something to detect these changes as well, if we can't rely on folks upstream to notify us.
@craig-willis we do have other metadata extractors here: https://github.com/terraref/extractors-metadata
So far we have one that pulls geo data from metadata and another that extracts metadata from netCDF files, but I could see another extractor being a "Metadata Cleaner" that has all our rules baked in to bring incoming metadata into a standardized format.
The tricky question, to me, is how we apply that transformation. I'm very wary of editing the JSON files themselves in the raw_data and would prefer to create a new 'clean' JSON file (so that we always have the original if necessary) but that's a lot of new files we're talking about. We could add the updated metadata to Clowder but that doesn't help anyone coming in via the Globus pipeline.
I guess I'm less wary of ADDITIVE things to the JSON files, i.e. only adding fields to some standard_fields sub-JSON object while leaving the originals wherever they are in the hierarchy, and having our extractors look for standard_fields as the first choice. Not crazy about that either, but just throwing out some ideas.
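To make the additive idea concrete, a rough sketch assuming the metadata is loaded as a Python dict; the nested key names are from memory of the gantry metadata layout and the standard_fields shape is hypothetical:

def add_standard_fields(metadata):
    """Return a copy of the metadata with a normalized sub-object added,
    leaving every original field exactly where it was (additive only)."""
    gantry = (metadata.get("lemnatec_measurement_metadata", {})
                      .get("gantry_system_variable_metadata", {}))
    standard = {}
    # Collapse the known time synonyms into one canonical key.
    for key in ("time", "Time", "timestamp"):
        if key in gantry:
            standard["time"] = gantry[key]
            break
    cleaned = dict(metadata)              # shallow copy; originals untouched
    cleaned["standard_fields"] = standard
    return cleaned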
I want to re-emphasize here that we should be focusing on Season 2. The metadata should have been standardized by https://github.com/terraref/reference-data/issues/25 (closed July 2016) and there should not have been any changes (other than adding fields) since the beginning of Season 2 (Aug 2016).
I would rather not worry about having an extra metadata transformation layer. It sounds like a lot of work that could be avoided if a) the gantry operator doesn't change the metadata and b) we write a test to check that all of the fields exist, and perhaps are populated. We may not be able to check every metadata file, but if we checked even one per sensor per day it would likely catch any violations of (a).
Are there any examples of changes to metadata fields since Aug 2016 (beginning of Season 2)? @smarshall-bmr ... are you aware of any tampering with our metadata generator?
If we still need a metadata fixer layer, it should only be done if steps a and b (not changing metadata and testing that metadata is not changed) are insufficient. If we need to fix the Season 0 and Season 1 metadata, we can do that later as a one-off.
A test that the metadata is in the correct format could be as simple as making sure that all of the fields we currently have exist (a logical grep). This could be added to the overall testing standards compliance issue https://github.com/terraref/computing-pipeline/issues/232; even though that issue is about the 'public-facing' metadata standards, this is a good use case.
PS here is the test for FLIR metadata: https://gist.github.com/dlebauer/52f934a28d5d8185ba1bd89644155c46
If my recommendation above is sensible and worth implementing, I can create the same test for the rest of the metadata files.
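For reference, a minimal sketch of such a check for a single metadata file; the required-field list here is just a placeholder, not the actual specification:

import json

# Placeholder list; a real check would enumerate every field in the
# current metadata specification for the sensor being tested.
REQUIRED_FIELDS = ["time", "position x [m]", "position y [m]"]

def missing_fields(metadata_path):
    """Return the required field names that do not appear anywhere in the file."""
    with open(metadata_path) as f:
        flattened = json.dumps(json.load(f))   # crude "logical grep" over the whole document
    return [field for field in REQUIRED_FIELDS if field not in flattened]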
@czender roughly how long does one run take on a 64 GB input file? I am troubleshooting the bulk run this morning and trying to run a test on this dataset from 05/01: https://terraref.ncsa.illinois.edu/clowder/datasets/58702ad54f0c0dbad1a81378
...64.8 GB raw file.
The output doesn't seem to have errors:
Terraref hyperspectral data workflow invoked with:
hyperspectral_workflow.sh -d 1 -i /projects/arpae/terraref/sites/ua-mac/raw_data/VNIR/2016-05-01/2016-05-01__09-33-08-075/c64c0602-79ad-48e5-8457-6bdb69205402_raw -o /projects/arpae/terraref/sites/ua-mac/Level_1/hyperspectral/2016-05-01/2016-05-01__09-33-08-075/c64c0602-79ad-48e5-8457-6bdb69205402_.nc
Hyperspectral workflow scripts in directory /gpfs/largeblockFS/projects/arpae/terraref/shared/extractors/extractors-hyperspectral/hyperspectral
NCO version "4.6.2-beta03" from directory /gpfs/smallblockFS/sw/nco-4.6.2-beta03/bin
Intermediate/temporary files written to directory /gpfs_scratch/arpae/imaging_spectrometer
Final output stored in directory /projects/arpae/terraref/sites/ua-mac/Level_1/hyperspectral/2016-05-01/2016-05-01__09-33-08-075
Input #00: /projects/arpae/terraref/sites/ua-mac/raw_data/VNIR/2016-05-01/2016-05-01__09-33-08-075/c64c0602-79ad-48e5-8457-6bdb69205402_raw
trn(in) : /projects/arpae/terraref/sites/ua-mac/raw_data/VNIR/2016-05-01/2016-05-01__09-33-08-075/c64c0602-79ad-48e5-8457-6bdb69205402_raw
trn(out) : /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_trn.nc.pid100194.fl00.tmp
ncks -O --hdr_pad=10000 --no_tmp_fl --trr_wxy=955,1600,21219 --trr typ_in=NC_USHORT --trr typ_out=NC_USHORT --trr ntl_in=bil --trr ntl_out=bsq --trr_in=/projects/arpae/terraref/sites/ua-mac/raw_data/VNIR/2016-05-01/2016-05-01__09-33-08-075/c64c0602-79ad-48e5-8457-6bdb69205402_raw /gpfs/largeblockFS/projects/arpae/terraref/shared/extractors/extractors-hyperspectral/hyperspectral/hyperspectral_dummy.nc /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_trn.nc.pid100194.fl00.tmp
att(in) : /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_trn.nc.pid100194.fl00.tmp
att(out) : /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_att.nc.pid100194.fl00.tmp
ncatted -O --gaa terraref_script=hyperspectral_workflow.sh --gaa terraref_hostname=cg-cmp25 --gaa terraref_version="4.6.2-beta03" -a "Conventions,global,o,c,CF-1.5" -a "Project,global,o,c,TERRAREF" --gaa history="Tue Jan 17 09:18:46 CST 2017: hyperspectral_workflow.sh -d 1 -i /projects/arpae/terraref/sites/ua-mac/raw_data/VNIR/2016-05-01/2016-05-01__09-33-08-075/c64c0602-79ad-48e5-8457-6bdb69205402_raw -o /projects/arpae/terraref/sites/ua-mac/Level_1/hyperspectral/2016-05-01/2016-05-01__09-33-08-075/c64c0602-79ad-48e5-8457-6bdb69205402_.nc" /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_trn.nc.pid100194.fl00.tmp /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_att.nc.pid100194.fl00.tmp
jsn(in) : /projects/arpae/terraref/sites/ua-mac/raw_data/VNIR/2016-05-01/2016-05-01__09-33-08-075/c64c0602-79ad-48e5-8457-6bdb69205402_raw
jsn(out) : /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_jsn.nc.pid100194
python /gpfs/largeblockFS/projects/arpae/terraref/shared/extractors/extractors-hyperspectral/hyperspectral/hyperspectral_metadata.py dbg=yes fmt=4 ftn=no /projects/arpae/terraref/sites/ua-mac/raw_data/VNIR/2016-05-01/2016-05-01__09-33-08-075/c64c0602-79ad-48e5-8457-6bdb69205402_raw /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_jsn.nc.pid100194.fl00.tmp
mrg(in) : /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_jsn.nc.pid100194.fl00.tmp
mrg(out) : /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_att.nc.pid100194.fl00.tmp
ncks -A /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_jsn.nc.pid100194.fl00.tmp /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_att.nc.pid100194.fl00.tmp
mrg(in) : /gpfs/largeblockFS/projects/arpae/terraref/shared/extractors/extractors-hyperspectral/hyperspectral/calibration_vnir_45ms.nc
mrg(out) : /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_att.nc.pid100194.fl00.tmp
ncks -A -C -v xps_img_wht,xps_img_drk /gpfs/largeblockFS/projects/arpae/terraref/shared/extractors/extractors-hyperspectral/hyperspectral/calibration_vnir_45ms.nc /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_att.nc.pid100194.fl00.tmp
clb(in) : /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_att.nc.pid100194.fl00.tmp
clb(out) : /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_clb.nc.pid100194.fl00.tmp
ncap2 -A --hdr_pad=10000 -s @drc_spt='"/gpfs/largeblockFS/projects/arpae/terraref/shared/extractors/extractors-hyperspectral/hyperspectral"' -S /gpfs/largeblockFS/projects/arpae/terraref/shared/extractors/extractors-hyperspectral/hyperspectral/hyperspectral_calibration.nco /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_att.nc.pid100194.fl00.tmp /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_att.nc.pid100194.fl00.tmp
Setting parser(filename)=/gpfs/largeblockFS/projects/arpae/terraref/shared/extractors/extractors-hyperspectral/hyperspectral/hyperspectral_calibration.nco
...but I'm trying to determine if I'm seeing a timeout issue in the pipeline workflow, or if I'm simply not waiting long enough (this ran for ~30 minutes before I terminated it). How long did you see for a successful 64 GB run?
Last time I checked it took 94 minutes, so let it run...
@jdmaloney mentioned this in terra room, but when moving from my sites/ua-mac/Level_1/hyperspectral_manualcheck directory to the proper /hyperspectral one, I no longer have write permissions for the extractor outputs. Can you add group write permissions to all the /Level_1 directories so my extractors can write to them? Thanks.
Didn't we decide the extractors should have write access, but not individuals?
That seems to be how it's configured, but if so I'll still need to work out a solution for hyperspectral, since that isn't on a VM with mounted permissions like the others - instead, it's running as a job on a Roger node, and since I'm the one starting it, I'm not allowed to write. If you (the dlebauer user) were to start it, I think that would be fine since you own the directories, but that isn't a good long-term solution.
If you send me a command line statement I can run it. Do you mean "chmod -R +w groupname sites/ua-mac/Level_1/"? (Easiest if you just tell me what to run.)
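For context, the usual shell form for this is "chmod -R g+w sites/ua-mac/Level_1/" (plus a chgrp if the group ownership also needs changing); below is a rough Python equivalent of that recursive group-write change, purely as illustration:

import os
import stat

def add_group_write(root):
    """Recursively add the group-write bit, like chmod -R g+w <root>."""
    for dirpath, dirnames, filenames in os.walk(root):
        for path in [dirpath] + [os.path.join(dirpath, f) for f in filenames]:
            os.chmod(path, os.stat(path).st_mode | stat.S_IWGRP)

add_group_write("sites/ua-mac/Level_1/")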
5 of these are running now. 1,300 datasets queued. Will monitor progress over the next day or two.
This has been progressing. A lot of outputs in April/May and it's starting on early September now.
539 datasets processed.
@czender @solmazhajmohammadi The location of the hyperspectral outputs:
On Roger:
/projects/arpae/terraref/sites/ua-mac/Level_1/hyperspectral
Via Globus:
ua-mac/Level_1/hyperspectral
Returning to the earlier discussion about "time" vs. "Time" vs. "timestamp" (https://github.com/terraref/computing-pipeline/issues/230#issue-200385894). It looks like the "time" (or "Time") entry is there when there is position information, and "timestamp" is there when "PLC control not available". The last entry with "Time" (capital T) is in May, and from that point forward it's "time". I'm wondering if the change to "timestamp" has to do with an alternate code path, not a change to the metadata.
@craig-willis Do you need any correction of time / timestamp / Time in the source code?
Or maybe @czender can answer.
Most important is that the names and formats don't change unless necessary, and if changes are necessary, that there is enough warning (a month) so downstream fixes can be made and tested.
@TinoDornbusch Could you provide an explanation of why these values changed?
I think the "Time" vs "time" change is older -- maybe an earlier correction to the metadata to use lowercase (the last case of "Time" I see is from 5/20/2016).
However, "time" versus "timestamp" seems to still be a problem. Why are there two different labels for the same value? For example:
VNIR/2016-12-06/2016-12-06__13-56-35-387/8f3b7fe9-7f04-44db-b52f-0c64e3cb1ee1_metadata.json "timestamp": "12/06/2016 13:56:35"
VNIR//2016-12-08/2016-12-08__15-30-44-025/a86e48ab-4954-4a11-bda6-1f9f7abbae9f_metadata.json "time": "12/08/2016 15:30:44"
@craig-willis @TinoDornbusch @dlebauer By now "time" / "timestamp" / "Time" are all recognized keywords and are guaranteed to be captured, so a little disorder in the metadata won't affect downstream processing too much for now. Having three keywords probably makes the metadata harder for the upstream people to manage, but hopefully we will not need any more keywords.
@max-zilla How many wallclock days did it take to process the 2016 data? We may want to re-process after we add some new features to the workflow, and correct some metadata issues. Also, do you know why the output filenames all end with an underscore? I've just added the workflow script invocation as global metadata so we can see if there are clues in that.
@FlyingWithJerome can you make sure the metadata generated by the extractor downstream is consistent (e.g. whether time, timestamp, or Time is passed in, just emit 'time')?
If you are working on the metadata now I have a few other suggestions for changes to the metadata, but I will address these in a separate issue. (my assumption is that we can update metadata later without re-running the entire pipeline ... is that correct?).
@FlyingWithJerome the output file does not contain a "time" variable. Instead it contains "frametime". However, the attributes say "The datestamp for each frame is stored in the *_frameindex.txt file which is archived in the time variable". This is inconsistent. Perhaps I wasn't clear but what I thought I asked is:
@czender and @FlyingWithJerome did you ever export nc metadata to json in the extractor? If so, shouldn't it show up here: https://terraref.ncsa.illinois.edu/clowder/files/587e8efe4f0cd67174dd1dcb?dataset=58702ad54f0c0dbad1a81378&space=57e42cd44f0cff4b58dd3eea
We never added a separate step to create a separate JSON file. We (@hmb1) added the capability to NCO to do so, however. Would you like a separate file (i.e., output_metadata.json), or a new attribute in the existing file that contains the JSON dump of the metadata?
@czender my understanding is that there is metadata at the dataset level and the file level. It would make sense to add this to the file-level metadata, since it applies to the output .nc files but not the input files. so you could create both the output_metadata.json and insert it into the Clowder metadata database (@max-zilla can suggest how to do this)
@dlebauer the link you posted was run through the netCDF extractor:
terra.netcdf Thu Jan 19 08:14:20 CST 2017 N/A START: Started processing
terra.netcdf Thu Jan 19 08:14:20 CST 2017 N/A PROCESSING: Downloading file.
...which extracts some metadata. However, even with the Roger filesystem mounted, it appears our VM with 4 GB of RAM chokes trying to handle the 195 GB file (I wasn't sure if this would be the case, depending on how the header is read). I will talk with @robkooper about possible workarounds for large .nc files, but in a pinch we could run the extractor as a Roger job the way we are running the hyperspectral workflow (and eventually, Solmaz's sensor fusion).
@czender the hyperspectral extraction just finished the first run overnight, it appears - I have 1,119 .nc files in the Level_1 directory, but since some of the recent fixes weren't deployed, the leftover datasets were probably missing a field or something. So call it roughly a week with 5 extractors running, and the odd ~8 hours of downtime here and there when the Roger job expired at 1am or something.
David is correct that we can very easily add metadata to the .nc file itself - I think his approach is good:
edit: I'm still catching up with emails from yesterday afternoon and I see a pull request has come and gone in the meantime :)
Now that there's a separate JSON file with the metadata, I'm not sure how else to help with this because I don't know how to post it to Clowder. BTW, the command to dump the metadata in JSON format is ncks -m -M --json in.nc. It should only require trivial amounts of RAM. If this is not your experience, let me know.
@czender just want to clarify:
Here is the salient portion of the JSON code:
logging.info('...extracting metadata in json format: %s' % metaFilePath)
with open(metaFilePath, 'w') as fmeta:
    subprocess.call(['ncks', '--jsn', '-m', '-M', inPath], stdout=fmeta)
if os.path.exists(metaFilePath):
    pyclowder.files.upload_to_dataset(connector, host, secret_key, resource['parent']['id'], metaFilePath)
...I think the only modification is to update the netCDF extractor so that the JSON file here is opened, parsed, and added to Clowder as metadata, in addition to what it already does (add the JSON file itself to Clowder).
Does that sound correct?
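A rough sketch of that modification, assuming pyclowder's metadata-upload call and the JSON-LD payload shape work the way I remember (both unverified here):

import json
import pyclowder.files

def attach_nc_metadata(connector, host, secret_key, file_id, metaFilePath):
    """Parse the ncks --jsn dump and attach it as Clowder metadata on the
    output .nc file, in addition to uploading the JSON file itself."""
    with open(metaFilePath) as fmeta:
        nc_metadata = json.load(fmeta)
    payload = {
        "@context": ["https://clowder.ncsa.illinois.edu/contexts/metadata.jsonld"],
        "file_id": file_id,
        "content": nc_metadata,
        "agent": {"@type": "cat:extractor",
                  "extractor_id": host + "api/extractors/terra.netcdf"},
    }
    pyclowder.files.upload_metadata(connector, host, secret_key, file_id, payload)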
@dlebauer above asked me to create a JSON file, so I added that creation step to the HS workflow. @max-zilla says that step was already implemented (though not working at the time) by the "netCDF extractor", so now there are two JSON files. I will turn off the JSON generation in the HS workflow. @max-zilla is the expert at putting things into Clowder, so I suggest he do it. Too many cooks :)
@czender @dlebauer I updated the netCDF metadata extractor so that, in addition to the CDL/XML/JSON files that are created and uploaded, the JSON file contents are added to the .nc file's entry in Clowder as metadata.
@czender are there sufficient updates to the workflow that we should re-run extractor on the 2016 data, or are more updates incoming?
There have been some worthwhile changes and more are coming. I recommend re-running 2016 in two weeks. This should give time to get the latitude/longitude coordinates in, and some more _FillValues.
@czender I ran the newest code on 04/15 of 2017. Outputs are in:
/sites/ua-mac/Level_1/hyperspectral/2017-04-15/
The VNIR ran OK from the look of things, but SWIR failed. However, it looks like there might be some broken links in the Clowder SWIR datasets that could be responsible - I'm going to diagnose today and try some re-runs.
SWIR outputs will go in the /Level_1/hyperspectral_swir directory.
@jdmaloney I traced this problem back to the bulk rebuild script we ran to create the SWIR datasets:
mburnet2@terra-clowder:/home/clowder/bulk_rebuild$ ls /home/clowder/sites/ua-mac/raw_data/SWIR/2016-12-08/2016-12-08__15-08-33-771/
15de159e-4446-46ad-b2d9-c0d0ad8563d8_frameIndex.txt 15de159e-4446-46ad-b2d9-c0d0ad8563d8_metadata.json 15de159e-4446-46ad-b2d9-c0d0ad8563d8_raw.hdr
15de159e-4446-46ad-b2d9-c0d0ad8563d8_image.jpg 15de159e-4446-46ad-b2d9-c0d0ad8563d8_raw 15de159e-4446-46ad-b2d9-c0d0ad8563d8_settings.txt
mburnet2@terra-clowder:/home/clowder/bulk_rebuild$ tail lists/raw_data/uamac_SWIR.list
/gpfs/largeblockFS/projects/arpae/terraref/sites/ua-mac/raw_data/SWIR/2016-12-08/2016-12-08__14-14-19-938/66359f20-8839-4158-a083-48dded7f41e5_2016_12_08_14_17_35image.jpg
/gpfs/largeblockFS/projects/arpae/terraref/sites/ua-mac/raw_data/SWIR/2016-12-08/2016-12-08__14-14-19-938/66359f20-8839-4158-a083-48dded7f41e5_metadata.json
/gpfs/largeblockFS/projects/arpae/terraref/sites/ua-mac/raw_data/SWIR/2016-12-08/2016-12-08__14-14-19-938/66359f20-8839-4158-a083-48dded7f41e5_2016_12_08_14_17_35settings.txt
/gpfs/largeblockFS/projects/arpae/terraref/sites/ua-mac/raw_data/SWIR/2016-12-08/2016-12-08__14-14-19-938/66359f20-8839-4158-a083-48dded7f41e5_2016_12_08_14_17_35raw.hdr
/gpfs/largeblockFS/projects/arpae/terraref/sites/ua-mac/raw_data/SWIR/2016-12-08/2016-12-08__15-08-33-771/15de159e-4446-46ad-b2d9-c0d0ad8563d8_2016_12_08_15_11_49settings.txt
/gpfs/largeblockFS/projects/arpae/terraref/sites/ua-mac/raw_data/SWIR/2016-12-08/2016-12-08__15-08-33-771/15de159e-4446-46ad-b2d9-c0d0ad8563d8_2016_12_08_15_11_49raw.hdr
/gpfs/largeblockFS/projects/arpae/terraref/sites/ua-mac/raw_data/SWIR/2016-12-08/2016-12-08__15-08-33-771/15de159e-4446-46ad-b2d9-c0d0ad8563d8_2016_12_08_15_11_49frameIndex.txt
/gpfs/largeblockFS/projects/arpae/terraref/sites/ua-mac/raw_data/SWIR/2016-12-08/2016-12-08__15-08-33-771/15de159e-4446-46ad-b2d9-c0d0ad8563d8_metadata.json
/gpfs/largeblockFS/projects/arpae/terraref/sites/ua-mac/raw_data/SWIR/2016-12-08/2016-12-08__15-08-33-771/15de159e-4446-46ad-b2d9-c0d0ad8563d8_2016_12_08_15_11_49raw
/gpfs/largeblockFS/projects/arpae/terraref/sites/ua-mac/raw_data/SWIR/2016-12-08/2016-12-08__15-08-33-771/15de159e-4446-46ad-b2d9-c0d0ad8563d8_2016_12_08_15_11_49image.jpg
Simpler example of one file:
FILENAME:
15de159e-4446-46ad-b2d9-c0d0ad8563d8_image.jpg
ENTRY IN LISTS FILE:
15de159e-4446-46ad-b2d9-c0d0ad8563d8_2016_12_08_15_11_49image.jpg
It's like the timestamp from the next dataset got appended onto the filenames of the previous somehow.
I'm a bit confused though, because the list I'm pasting from didn't go up to 04/15, yet datasets on 04/15 have this problem: https://terraref.ncsa.illinois.edu/clowder/datasets/58f2cec64f0c5bee63a4d655 (the Clowder entry points to the wrong path with the appended timestamp)
...But I couldn't find any place in my pipeline code that was obviously introducing this, and the data from yesterday which went through the pipeline is OK: https://terraref.ncsa.illinois.edu/clowder/datasets/5907cdfc4f0c20f1bf08b589
...I'm thinking perhaps we just purge the relatively small number of SWIR datasets from Clowder and recreate with a corrected lists file as above.
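For the corrected lists file, something along these lines should strip the stray timestamp infix from the old entries (a sketch against the examples above; the output filename is hypothetical and edge cases would need checking):

import re

# "..._2016_12_08_15_11_49image.jpg" -> "..._image.jpg"; entries without the
# extra timestamp (e.g. *_metadata.json) pass through unchanged.
STRAY_TIMESTAMP = re.compile(r"_\d{4}(?:_\d{2}){5}(?=[A-Za-z])")

with open("lists/raw_data/uamac_SWIR.list") as src, \
     open("lists/raw_data/uamac_SWIR_corrected.list", "w") as dst:
    for line in src:
        dst.write(STRAY_TIMESTAMP.sub("_", line))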
@max-zilla and @jdmaloney is this the same issue as recently addressed/solved in #281 ? @solmazhajmohammadi says the extra date was introduced by "headwall software" and is now fixed.
@czender yes, that's exactly what it is. I had not seen @dlebauer's rename script run on the SWIR data, but the Clowder upload happened when that data landed and thus still has pointers to the old filenames, so the extractor couldn't find them.
Thanks for pointing that out. I'll work with @robkooper and @jdmaloney to get the corrected filenames into the Clowder database and run it - as you saw above, it looks like the SWIR data at least processes correctly.
I tried running the field stitching script on 04/15, but the VRT creation process failed:
gdalbuildvrt -srcnodata "-99 -99 -99" -overwrite
-input_file_list /home/extractor/sites/ua-mac/Level_1/fullfield/2017-04-15/hyperspectral_fileList.txt
/home/extractor/sites/ua-mac/Level_1/fullfield/2017-04-15/hyperspectral_fullfield.VRT
0...10...20...30.
Warning 1: Unsupported netCDF datatype (8), treat as Float32.
Warning 1: dimension #2 (x) is not a Longitude/X dimension.
Warning 1: dimension #1 (y) is not a Latitude/Y dimension.
Warning 1: dimension #2 (x) is not a Longitude/X dimension.
Warning 1: dimension #1 (y) is not a Latitude/Y dimension.
Warning 1: Unsupported netCDF datatype (8), treat as Float32.
Warning 1: dimension #2 (x) is not a Longitude/X dimension.
Warning 1: dimension #1 (y) is not a Latitude/Y dimension.
Warning 1: dimension #2 (x) is not a Longitude/X dimension.
Warning 1: dimension #1 (y) is not a Latitude/Y dimension.
Warning 1: Unsupported netCDF datatype (8), treat as Float32.
Warning 1: dimension #2 (x) is not a Longitude/X dimension.
Warning 1: dimension #1 (y) is not a Latitude/Y dimension.
Warning 1: dimension #2 (x) is not a Longitude/X dimension.
Warning 1: dimension #1 (y) is not a Latitude/Y dimension.
Warning 1: Unsupported netCDF datatype (8), treat as Float32.
Warning 1: dimension #2 (x) is not a Longitude/X dimension.
Warning 1: dimension #1 (y) is not a Latitude/Y dimension.
...
More than 1000 errors or warnings have been reported. No more will be reported from now.
This is the result on the .nc files.
Trying to sort out the SWIR query fix today.
@max-zilla did you try gdal_translate nc-->tif?
http://www.gdal.org/frmt_various.html
GMT -- GMT Compatible netCDF
GDAL has limited support for reading and writing netCDF grid files. NetCDF files that are not recognised as grids (they lack variables called dimension, and z) will be silently ignored by this driver. This driver is primarily intended to provide a mechanism for grid interchange with the GMT package. The netCDF driver should be used for more general netCDF datasets.
The units information in the file will be ignored, but x_range, and y_range information will be read to get georeferenced extents of the raster. All netCDF data types should be supported for reading.
Newly created files (with a type of GMT) will always have units of "meters" for x, y and z but the x_range, y_range and z_range should be correct. Note that netCDF does not have an unsigned byte data type, so 8bit rasters will generally need to be converted to Int16 for export to GMT.
NetCDF support in GDAL is optional, and not compiled in by default.
NOTE: Implemented as gdal/frmts/netcdf/gmtdataset.cpp.
See Also: Unidata NetCDF Page
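A sketch of the suggested nc --> tif route via the GDAL Python bindings; the paths and the subdataset/variable name are placeholders, and the right reflectance variable would need to be picked from gdalinfo's subdataset listing:

from osgeo import gdal

# Placeholder paths and variable name; gdalinfo on the .nc file lists the
# available NETCDF: subdatasets to choose from.
src = 'NETCDF:"/path/to/output.nc":rfl_img'
gdal.Translate("/path/to/output.tif", src, format="GTiff")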
@czender I corrected the SWIR datasets and pulled the latest hyperspectral code, then submitted 04/15. Got errors:
2017-05-19 14:56:36,229 [Thread-70 ] INFO : pyclowder.connectors - dataset ID [58f21de24f0c5bee63a2c9fa] : START: Started processing
Traceback (most recent call last):
File "/gpfs/largeblockFS/projects/arpae/terraref/shared/extractors/extractors-hyperspectral/hyperspectral/hyperspectral_metadata.py", line 745, in <module>
main()
File "/gpfs/largeblockFS/projects/arpae/terraref/shared/extractors/extractors-hyperspectral/hyperspectral/hyperspectral_metadata.py", line 741, in main
testCase.writeToNetCDF(file_input, file_output, " ".join((file_input, file_output)), format, flatten, debug)
File "/gpfs/largeblockFS/projects/arpae/terraref/shared/extractors/extractors-hyperspectral/hyperspectral/hyperspectral_metadata.py", line 180, in writeToNetCDF
assert len(wavelength) in (955, 272), "ERROR: Failed to get wavlength informations. Please check if you modified the *.hdr files"
AssertionError: ERROR: Failed to get wavlength informations. Please check if you modified the *.hdr files
2017-05-19 14:56:39,066 [Thread-70 ] ERROR : root - script encountered an error
2017-05-19 14:56:39,067 [Thread-70 ] ERROR : root - no output file was produced
@max-zilla There are two possibilities.
This is our fault because we never updated the validator for the new SWIR configuration. Will submit PR today that fixes it.
@max-zilla we just merged the fix for SWIR data. please retry extractor...
@czender thanks for the quick turnaround, I will get that going tonight or tomorrow morning.
@czender I just reran for SWIR 04-15 and it looks like things were processed this time - I see some .nc files in the hyperspectral_swir Level_1 directory.
@czender I also pulled the latest code and am rerunning 04-15 for VNIR this morning. I had to update my extractor with an --overwrite flag as well, to force it to replace the previous versions.
@max-zilla please close the issue and create a new one for running the full season
@max-zilla reminder
Following discussions in #195....
As soon as Roger maintenance completes this morning, I will start processing.
@czender @FlyingWithJerome However, based on a few spot checks I anticipate some of the VNIR datasets will fail (safely). It seems the metadata formats have changed over time and we may need to expand the fields that are checked in the script.
For example, this VNIR dataset from 12/05 has the following metadata: https://terraref.ncsa.illinois.edu/clowder/datasets/58702ace4f0c0dbad1a81307 (Clowder will be offline until Roger maintenance is complete)
When I submitted this file for extraction, received this error:
...and the problematic variable in question is defined in the python file as:
In the metadata above, I suspect the yearMonthDate we'd want is now:
Ideally this hasn't changed, but if I understand correctly we'd need to check for a "timestamp" key in the data as well (?). The best broad approach at this point is probably for us to just run all the datasets, let those that fail, fail, and check our error messages to determine what needs to be adjusted later. Hopefully the majority will go through with no issues - I think we all expect that the earliest 2016 data could have rough edges, but this would help us identify issues with subsequent data as well. @robkooper this would be a valuable application of the new RabbitMQ error queues to preserve failed messages as well.
Maybe I'm misinterpreting things but that's my sense from the metadata+error.