terraref / computing-pipeline

Pipeline to Extract Plant Phenotypes from Reference Data
BSD 3-Clause "New" or "Revised" License

Launching batch extractors on Roger #195

Closed max-zilla closed 7 years ago

max-zilla commented 7 years ago

@robkooper @yanliu-chn @jterstriep follow up from meeting today. Goal is to get all TERRA data from the summer processed through all extractors.

Yan will update the nco module on Roger to the new version.

Max will get dependencies installed on Roger and write a launcher.sh for each extractor that will load the virtualenv, modules, etc. and run the extractor. This will start one instance. Rob can help if needed.

Jeff will write a qsub wrapper to call the launcher.sh and start many instances of the extractors.

We can discuss here.

yanliu-chn commented 7 years ago

@jterstriep here is the HPC launcher script for plantcv extractor, for your reference: https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-plantcv/browse

max-zilla commented 7 years ago

@jterstriep I have installed dependencies for demosaic extractor on Roger head node as a first extractor.

(@robkooper @jdmaloney @yanliu-chn FYI)

...not sure if having the files in my home directory will cause permissions problems.

The extractor started running but failed quickly with a "file name too long" error:

OUT FILE NAME:
/projects/arpae/terraref/sites/ua-mac/Level_1/demosaic/2016-10-30/2016-10-30__10-48-08-731/33a5f544-681c-42b6-adb1-fd09ad8c4c99_right.tif
ERROR 1: TIFFOpen:????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????f罻: File name too long
OK

The JPGs were generated properly, but this file path is apparently too long for GDAL. On my VM elsewhere, I didn't encounter that error but filenames were slightly shorter:

OK ON VM:
/home/extractor/sites/ua-mac/Level_1/demosaic/2016-10-30/2016-10-30__10-36-05-536/88867503-f576-47de-926e-dee347007c5c_left.tif

TOO LONG ON ROGER:
/projects/arpae/terraref/sites/ua-mac/Level_1/demosaic/2016-10-30/2016-10-30__10-35-35-433/a41f1694-3e6f-4ae9-b379-bb6d41151005_left.tif

I'm wondering if we'll need to use some kind of symlink to pass GDAL a shorter filepath for the output TIFs. The JPG paths are theoretically the same length and cause no errors, but it looks like those are piped through numpy rather than GDAL for processing. I'm going to experiment a bit more.
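One possible workaround along those lines (a sketch, not the fix that was actually applied; `create_fn` is a hypothetical stand-in for whatever GDAL write call the extractor makes): let GDAL write to a short temporary path, then move the finished file to the long destination.

```python
import os
import shutil
import tempfile

def write_via_short_path(create_fn, final_path):
    """Let create_fn (e.g. a GDAL driver write) target a short temp path,
    then move the finished file to the long destination path."""
    fd, short_path = tempfile.mkstemp(suffix=".tif")
    os.close(fd)
    try:
        create_fn(short_path)                # GDAL only ever sees the short path
        shutil.move(short_path, final_path)  # plain rename/copy handles the long name
    finally:
        if os.path.exists(short_path):       # clean up if create_fn or move failed
            os.remove(short_path)
```

The same idea would apply to any other GDAL-based extractor that hits this limit.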

max-zilla commented 7 years ago

@jterstriep @robkooper @yanliu-chn @jdmaloney @dlebauer

OK, updated the code to address the "file name too long" issue:

This might be required for other extractors that use GDAL, but most scripts do not have a problem with long file paths.

@jterstriep with that in mind, I think we are ready to test this. We want to run the launcher.sh script as David LeBauer (dlebauer) so the output files will be owned by him, which is how the existing extractor deployments work.

To start an instance we should just need:

cd /home/mburnet2/extractors/extractors-stereo-rgb/demosaic
./batch_launcher.py

This will load required modules, activate python environment, and start listening to RabbitMQ.

Let me know if the files being in my home directory are a problem, but I think you should have execute permissions.

max-zilla commented 7 years ago

I'll get the other extractors prepped in the meantime (git clone on Roger, install additional dependencies), but we can use this first one as a test case. Note that I have an instance of this extractor running on a separate VM at the moment - I will leave it running for now.

max-zilla commented 7 years ago

@jterstriep also ready:

cd /home/mburnet2/extractors/extractors-multispectral/flir
./batch_launcher.py

...and...

cd /home/mburnet2/extractors/extractors-environmental/environmentlogger
./batch_launcher.py

max-zilla commented 7 years ago

@jterstriep I just moved the hyperspectral extractor latest code out there as well:

cd /home/mburnet2/extractors/extractors-hyperspectral/hyperspectral
./batch_launcher.py

...I haven't really been able to test this fully due to the memory requirements, but it seemed to get at least that far in my local testing. Charlie reports here that he was able to get it running successfully: https://github.com/terraref/computing-pipeline/issues/194 ...not sure if it will require a different instantiation approach than the others?

max-zilla commented 7 years ago

the PLY -> LAS converter for 3d scanner data should also be ready:

cd /home/mburnet2/extractors/extractors-3dscanner/ply2las
./batch_launcher.py

jterstriep commented 7 years ago

@max-zilla I can't see anything in your home directory. Can you place the code in the project space?

max-zilla commented 7 years ago

@jterstriep I'll try to do so after the terra call from 10-11.

I think if I understand python virtualenv correctly, I should be able to just move the pyenv directory outside my home and still have it functional (we'll have to update the batch_launcher scripts to point to the correct place).

I'll put things here:

/projects/arpae/terraref/shared/extractors
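One caveat worth noting (a sketch, assuming pyenv was created with virtualenv): the environment's `bin/activate` hardcodes its original location in `VIRTUAL_ENV`, so a copied environment keeps pointing at the old path unless those references are updated or the environment is recreated. The snippet below simulates an activate script to show the hardcoded path; the paths are illustrative.

```python
import os
import tempfile

def virtualenv_home(activate_path):
    """Return the VIRTUAL_ENV path hardcoded in a bin/activate script."""
    with open(activate_path) as f:
        for line in f:
            if line.startswith("VIRTUAL_ENV="):
                return line.split("=", 1)[1].strip().strip('"')
    return None

# Simulate an activate script from an environment created under /home/mburnet2.
fake_bin = tempfile.mkdtemp()
activate = os.path.join(fake_bin, "activate")
with open(activate, "w") as f:
    f.write('VIRTUAL_ENV="/home/mburnet2/extractors/pyenv"\nexport VIRTUAL_ENV\n')

# Even after copying the environment to /projects/..., activate would
# still point back at the original home-directory location:
print(virtualenv_home(activate))
```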

max-zilla commented 7 years ago

@jterstriep OK, just copied the files over; since it's /shared, I hope you can access it.

I'll update other scripts soon but I quickly updated one in-place for you:

/projects/arpae/terraref/shared/extractors/extractors-3dscanner/ply2las

batch_launcher.sh has been edited to point to the /projects location instead of /home/mburnet2.

max-zilla commented 7 years ago

@jterstriep OK, here's something you can use for testing.

When you submit the file for extraction, it will reprocess and generate a .nc file corresponding to the json file. This is a bit different from the proper pipeline: the test file I uploaded was not already on Roger (for most files we copy to Roger and then just give Clowder the pointer), and there's already an .nc file for this file on Roger from the proper pipeline, so the extractor won't be able to write its .nc file to that output location. But if it gets that far, I think we're OK.

Basically you can:

1. Start the extractor.
2. Go to the file in Clowder, logging in as Maricopa Site.
3. Click 'Submit file for extraction' towards the bottom under Extractions.
4. Submit to terra.environmentlogger with no parameters.

That should kick off the extractor. You know it ran successfully if an .nc file shows up in the dataset when you refresh it.

jterstriep commented 7 years ago

I couldn't get terra.ply2las.py to run. I finally tracked the problem down to a bad virtualenv at /projects/arpae/terraref/shared/extractors/pyenv/bin/activate. Once activated, python still points to the ROGER module python rather than the wrapper in the virtualenv, so terra.ply2las.py doesn't find pyclowder.

Should I fix this? Is anything else using /projects/arpae/terraref/shared/extractors/pyenv at this point?
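A quick way to check whether the activated environment is actually being used (a generic diagnostic sketch, not specific to the Roger setup): inside a working virtualenv, `sys.prefix` differs from the base interpreter's prefix.

```python
import sys

def in_virtualenv():
    """True when the running interpreter belongs to a venv/virtualenv."""
    # virtualenv sets sys.real_prefix; the stdlib venv module makes
    # sys.base_prefix differ from sys.prefix.
    base = getattr(sys, "real_prefix", None) or getattr(sys, "base_prefix", sys.prefix)
    return sys.prefix != base

print("interpreter:", sys.executable)
print("inside a virtualenv:", in_virtualenv())
```

If this prints the module python's path and `False` after sourcing activate, the environment's wrapper scripts are broken as described.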

max-zilla commented 7 years ago

@jterstriep I also realized during testing yesterday that the extractor requires two additional modules to be loaded (GCC and proj4) or it will fail on the actual extraction - I updated batch_launcher.sh and ran git pull on the one you're using.

You can fix that, sure - nothing else is using that environment. I had copied it from /home/mburnet2/extractors/pyenv rather than recreating it, which in retrospect probably screwed up some paths. Thanks.

jterstriep commented 7 years ago

@max-zilla OK, I'll rebuild it.

We might want to revisit this. I'm beginning to think batch_launcher.sh for each extractor might not be the best approach.

max-zilla commented 7 years ago

@jterstriep I uploaded a sample PLY file to Google Drive for you: https://drive.google.com/file/d/0B42lleyRvKnlaFlWRXQyMzhCZ1U/view?usp=sharing

Verified at http://rabbitmq.ncsa.illinois.edu:15672/#/queues/clowder/terra.ply2las that the ply2las queue is empty with 0 consumers - uploading that PLY file to a new dataset as the Maricopa Site user in Clowder should trigger a message that your extractor will execute.

One caveat is that the extractor will need to write the resulting output to the /projects/arpae/terraref/sites/ua-mac/Level_1/scanner3DTop directory, so the user running the extractor must have permissions there. @jdmaloney gave my account permission for that to run the two extractions yesterday, but @robkooper mentioned we could also run the extractor as David LeBauer (dlebauer), who owns all those directories.

The extractor will have run successfully if an LAS file appears in the dataset. If you instead get python errors saying you don't have permission for the Level_1 output directory, the extractor still basically ran successfully; at that point it's just a permissions issue.

max-zilla commented 7 years ago

@jterstriep any updates on this?

the terra.demosaic queue is slowly filling up since there's only 1 extractor currently deployed on a VM - it's up to 257,000 messages. This is the one at:

cd /projects/arpae/terraref/shared/extractors/extractors-stereo-rgb/demosaic
./batch_launcher.py

(most of those messages are datasets that will be checked and determined to be irrelevant, but they still need to be checked)

...this might be a good case to test with.

max-zilla commented 7 years ago

@robkooper @jterstriep

Rob, Jeff has pushed some edits to the PLY->LAS extractor, based on a look at how we had the extractors set up on Roger, to make the code better suited to the HPC environment.

These are edits to a pyClowder 1 extractor. We went over the changes together and several of them fit the direction we're heading for pyClowder 2, so I thought it would make sense to share them and discuss whether to update pyClowder 2 itself to include aspects of them.

One change is already in pyClowder 2 - getting rid of config.py and making those parameters into command line args. The other change is adding a setup.py script that installs the python script + dependencies as a proper module.

Links:

I think we can get the best of both worlds here by updating the extractors to pyClowder 2 (I was going to work on that anyway) and adding a setup.py for each from Jeff's template.

Over the next day or so I'm going to run the other Mongo queries to generate UUID lists for queuing extractions manually, and I'll test manual extraction queuing a bit on datasets to see whether the possible issue I mentioned last week exists (where the RMQ message sent there does not include the datasetID we need).

max-zilla commented 7 years ago

Needed to create pull request for Clowder to support doing this for files: https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/clowder/pull-requests/1085/overview

The dataset extractors (aka the majority of them) should be fine.

POST https://terraref.ncsa.illinois.edu/clowder/api/datasets/123456/extractions
{"extractor": "extractor.name"}

Now to generate the UUID lists.
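That endpoint can be scripted over whole UUID lists. A minimal sketch using only the standard library (the `?key=` API-key parameter and the loop at the end are assumptions; adjust for the actual deployment):

```python
import json
from urllib.request import Request, urlopen

def build_extraction_request(base_url, dataset_id, extractor, api_key):
    """Build the POST that resubmits one dataset to a named extractor."""
    url = "%s/api/datasets/%s/extractions?key=%s" % (base_url, dataset_id, api_key)
    body = json.dumps({"extractor": extractor}).encode("utf-8")
    return Request(url, data=body,
                   headers={"Content-Type": "application/json"})

# Hypothetical driver loop over a UUID list:
# for uuid in uuid_list:
#     urlopen(build_extraction_request(
#         "https://terraref.ncsa.illinois.edu/clowder", uuid, "terra.demosaic", KEY))
```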

max-zilla commented 7 years ago

Creating sub-issues for the final drive to completion.

#212

#213

#214

max-zilla commented 7 years ago

@jterstriep couple items

https://github.com/terraref/computing-pipeline/issues/213

The extractor can be launched using:

/projects/arpae/terraref/shared/extractors/extractors-hyperspectral/hyperspectral/batch_launcher.sh

max-zilla commented 7 years ago

@jterstriep some progress - qsub is starting & triggering hyperspectral extractor correctly, but I found a new wrinkle.

Most of Zender's code is in a file called hyperspectral_workflow.sh in the same directory as the python script. The extractor python script uses subprocess to call:

returncode = subprocess.call(["bash", "hyperspectral_workflow.sh", "-d", "1", "-i", target_files['raw']['path'], "-o", outFilePath])

...now this works fine when calling python directly, but in the batch job context I need to change something:

bash: hyperspectral_workflow.sh: No such file or directory
2016-12-15 14:09:18,335 [Connector-0    ] ERROR   : root - script encountered an error
2016-12-15 14:09:18,335 [Connector-0    ] ERROR   : root - no output file was produced

Not a surprise that wherever the job is executing doesn't know what the heck this shell script refers to. Question is, what do you think the best way to handle it is?

  1. make --script a cmd line arg where you pass in a fully qualified path: --script /projects/arpae/terraref/shared/extractors/extractors-hyperspectral/hyperspectral/hyperspectral_workflow.sh

  2. use qsub -d to define a working directory (https://wikis.nyu.edu/display/NYUHPC/Tutorial+-+Submitting+a+job+using+qsub#Tutorial-Submittingajobusingqsub-work_dir) e.g.

    qsub -l nodes=1 -d /projects/arpae/terraref/shared/extractors/extractors-hyperspectral/hyperspectral/ batch_launcher.sh

    I tried this initially, but got an email that makes me think I need /gpfs/largeblockFS/ or something similar?:

    
    ROGER Job ID: 50054.cg-gpu01 
    Job Name: batch_launcher.sh 
    Exec host list suppressed in email 
    An error has occurred processing your job, see below. 
    Post job file processing error; job 50054.cg-gpu01 on host cg-cmp16

Unable to copy file 50054.cg-gpu01.OU to /gpfs/largeblockFS/projects/arpae/terraref/shared/extractors/extractors-hyperspectral/hyperspectral/batch_launcher.sh.o50054, error 1 error from copy /bin/cp: cannot stat `50054.cg-gpu01.OU': No such file or directory end error output



3. Some other solution better than these?
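A variant of option 1 that avoids new command-line plumbing (a sketch; it assumes hyperspectral_workflow.sh always sits next to the extractor script, as described above): resolve the script relative to the extractor's own `__file__` rather than the batch job's working directory.

```python
import os

def workflow_path(extractor_file, script="hyperspectral_workflow.sh"):
    """Absolute path to the workflow script, resolved next to the extractor
    module; pass __file__ from the extractor so cwd no longer matters."""
    return os.path.join(os.path.dirname(os.path.abspath(extractor_file)), script)

# In the extractor the subprocess call would become something like:
# returncode = subprocess.call(["bash", workflow_path(__file__), "-d", "1",
#                               "-i", target_files['raw']['path'], "-o", outFilePath])
```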

robkooper commented 7 years ago

I like option 1 since that does not change the working directory.

yanliu-chn commented 7 years ago

If you qsub a script, it copies to a tmp location anyway.

The location of the workflow shell script should be part of the extractor config. That means you can define an entry in config.py in the extractor. If the location changes from one deployment to another, you can refer to an env var in config.py, while the env var is set up in $HOME/.bashrc.

Thanks, Yan

jterstriep commented 7 years ago

I suggest option 3.

Implement proper setup.py files so that when the extractor is installed both the python script and the shell script get copied to the bin directory (in this case, /gpfs/largeblockFS/projects/arpae/terraref/shared/extractors) and the hyperspectral_workflow.sh script will be found using $PATH. Plus all the other benefits I discussed this morning.

jterstriep commented 7 years ago

@yanliu-chn, config.py is no more!

yanliu-chn commented 7 years ago

The /gpfs/largeblockFS/ stuff is somewhat irrelevant: PBS tries to copy the stdout/stderr to where it should be (by default, the dir where the job was submitted), but somehow it tried to do that after the job was done and the temp stderr/stdout had been deleted.

max-zilla commented 7 years ago

Thanks all. For this afternoon I'll just quickly add a command line arg, and if it works I'll later use your 3dscanner setup.py as a template to update as you suggest, @jterstriep.

yanliu-chn commented 7 years ago

OK. Then we can assume there is always an env var $EXTRACTOR_WORKFLOW_BINDIR. Then in the python extractor code, send $EXTRACTOR_WORKFLOW_BINDIR/ex.sh to bash.
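A sketch of that suggestion (the variable name is as proposed here; the cwd fallback is an illustrative assumption):

```python
import os

def workflow_script(name="hyperspectral_workflow.sh"):
    """Resolve the workflow script through EXTRACTOR_WORKFLOW_BINDIR,
    set per-deployment in $HOME/.bashrc; fall back to the cwd."""
    bindir = os.environ.get("EXTRACTOR_WORKFLOW_BINDIR", os.getcwd())
    return os.path.join(bindir, name)

os.environ["EXTRACTOR_WORKFLOW_BINDIR"] = "/projects/arpae/terraref/shared/extractors"
print(workflow_script())
# -> /projects/arpae/terraref/shared/extractors/hyperspectral_workflow.sh
```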


max-zilla commented 7 years ago

@czender I just ran the hyperspectral extractor on Roger with this dataset: https://terraref.ncsa.illinois.edu/clowder/datasets/587024414f0c0dbad1a78b83

I duplicated this dataset on clowder-dev to make sure the extractor would work even if files weren't in a specific folder structure on ROGER (i.e. in an arbitrary Clowder instance), so I updated my code to move any /tmp files into one directory so your _workflow script can be called.

I got an error on the test, although it seems like the processing was done correctly. Here's some output:

testFrameTimeHasCorrectCalendarAttr (__main__.HyperspectralWorkflowTest) ... ok
testFrameTimeHasCorrectUnitsAttr (__main__.HyperspectralWorkflowTest) ... ok
testFrameTimeHasCorrectValue (__main__.HyperspectralWorkflowTest) ... ok
testHistoryIsCorrectlyRecorded (__main__.HyperspectralWorkflowTest) ... ok
testTheNumberOfDimensionsInRootLevelIsCorrect (__main__.HyperspectralWorkflowTest) ... expected failure
testTheWavelengthDimensionsHaveCorrectValues (__main__.HyperspectralWorkflowTest) ... ok
testTheXDimensionsHaveCorrectValues (__main__.HyperspectralWorkflowTest) ... ok
testTheYDimensionsMatchesTimeDimension (__main__.HyperspectralWorkflowTest) ... ok
testWavelengthArrayHasCorrectData (__main__.HyperspectralWorkflowTest) ... ok
testWavelengthArrayHasEnoughData (__main__.HyperspectralWorkflowTest) ... ok
testXHasEnoughAttributes (__main__.HyperspectralWorkflowTest) ... ok
testYHasEnoughAttributes (__main__.HyperspectralWorkflowTest) ... ok
testYHaveCorrectValuesAndAttributes (__main__.HyperspectralWorkflowTest) ... FAIL

======================================================================
FAIL: testYHaveCorrectValuesAndAttributes (__main__.HyperspectralWorkflowTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/gpfs/largeblockFS/projects/arpae/terraref/shared/extractors/extractors-hyperspectral/hyperspectral/hyperspectral_test.py", line 227, in testYHaveCorrectValuesAndAttributes
    self.assertEqual(len(self.y), 169,  msg="The height of the image should always be 169 pxl")
AssertionError: The height of the image should always be 169 pxl

----------------------------------------------------------------------
Ran 13 tests in 0.039s

FAILED (failures=1, expected failures=1)
2017-01-10 13:11:25,375 [Connector-0    ] INFO    : root - uploading /projects/arpae/terraref/sites/ua-mac/Level_1/hyperspectral_manualcheck/2016-12-11/2016-12-11__14-17-02-040/52079867-350a-48a6-9ef3-2f6049cc7b0f_.nc

And your script output:

Terraref hyperspectral data workflow invoked with:
hyperspectral_workflow.sh -d 1 -i /tmp/tmpwhg88m/52079867-350a-48a6-9ef3-2f6049cc7b0f_raw -o /projects/arpae/terraref/sites/ua-mac/Level_1/hyperspectral_manualcheck/2016-12-11/2016-12-11__14-17-02-040/52079867-350a-48a6-9ef3-2f6049cc7b0f_.nc
Hyperspectral workflow scripts in directory /gpfs/largeblockFS/projects/arpae/terraref/shared/extractors/extractors-hyperspectral/hyperspectral
NCO version "4.6.2-beta03" from directory /gpfs/smallblockFS/sw/nco-4.6.2-beta03/bin
Intermediate/temporary files written to directory /gpfs_scratch/arpae/imaging_spectrometer
Final output stored in directory /projects/arpae/terraref/sites/ua-mac/Level_1/hyperspectral_manualcheck/2016-12-11/2016-12-11__14-17-02-040
Input #00: /tmp/tmpwhg88m/52079867-350a-48a6-9ef3-2f6049cc7b0f_raw
trn(in)  : /tmp/tmpwhg88m/52079867-350a-48a6-9ef3-2f6049cc7b0f_raw
trn(out) : /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_trn.nc.pid113572.fl00.tmp
ncks -O --hdr_pad=10000 --no_tmp_fl --trr_wxy=955,1600,135 --trr typ_in=NC_USHORT --trr typ_out=NC_USHORT --trr ntl_in=bil --trr ntl_out=bsq --trr_in=/tmp/tmpwhg88m/52079867-350a-48a6-9ef3-2f6049cc7b0f_raw /gpfs/largeblockFS/projects/arpae/terraref/shared/extractors/extractors-hyperspectral/hyperspectral/hyperspectral_dummy.nc /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_trn.nc.pid113572.fl00.tmp
att(in)  : /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_trn.nc.pid113572.fl00.tmp
att(out) : /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_att.nc.pid113572.fl00.tmp
ncatted -O --gaa terraref_script=hyperspectral_workflow.sh --gaa terraref_hostname=cg-cmp15 --gaa terraref_version="4.6.2-beta03" -a "Conventions,global,o,c,CF-1.5" -a "Project,global,o,c,TERRAREF" --gaa history="Tue Jan 10 13:10:43 CST 2017: hyperspectral_workflow.sh -d 1 -i /tmp/tmpwhg88m/52079867-350a-48a6-9ef3-2f6049cc7b0f_raw -o /projects/arpae/terraref/sites/ua-mac/Level_1/hyperspectral_manualcheck/2016-12-11/2016-12-11__14-17-02-040/52079867-350a-48a6-9ef3-2f6049cc7b0f_.nc" /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_trn.nc.pid113572.fl00.tmp /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_att.nc.pid113572.fl00.tmp
jsn(in)  : /tmp/tmpwhg88m/52079867-350a-48a6-9ef3-2f6049cc7b0f_raw
jsn(out) : /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_jsn.nc.pid113572
python /gpfs/largeblockFS/projects/arpae/terraref/shared/extractors/extractors-hyperspectral/hyperspectral/hyperspectral_metadata.py dbg=yes fmt=4 ftn=no /tmp/tmpwhg88m/52079867-350a-48a6-9ef3-2f6049cc7b0f_raw /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_jsn.nc.pid113572.fl00.tmp
mrg(in)  : /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_jsn.nc.pid113572.fl00.tmp
mrg(out) : /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_att.nc.pid113572.fl00.tmp
ncks -A /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_jsn.nc.pid113572.fl00.tmp /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_att.nc.pid113572.fl00.tmp
mrg(in)  : /gpfs/largeblockFS/projects/arpae/terraref/shared/extractors/extractors-hyperspectral/hyperspectral/calibration_vnir_25ms.nc
mrg(out) : /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_att.nc.pid113572.fl00.tmp
ncks -A -C -v xps_img_wht,xps_img_drk /gpfs/largeblockFS/projects/arpae/terraref/shared/extractors/extractors-hyperspectral/hyperspectral/calibration_vnir_25ms.nc /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_att.nc.pid113572.fl00.tmp
clb(in)  : /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_att.nc.pid113572.fl00.tmp
clb(out) : /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_clb.nc.pid113572.fl00.tmp
ncap2 -A --hdr_pad=10000 -s @drc_spt='"/gpfs/largeblockFS/projects/arpae/terraref/shared/extractors/extractors-hyperspectral/hyperspectral"' -S /gpfs/largeblockFS/projects/arpae/terraref/shared/extractors/extractors-hyperspectral/hyperspectral/hyperspectral_calibration.nco /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_att.nc.pid113572.fl00.tmp /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_att.nc.pid113572.fl00.tmp
Setting parser(filename)=/gpfs/largeblockFS/projects/arpae/terraref/shared/extractors/extractors-hyperspectral/hyperspectral/hyperspectral_calibration.nco
/bin/mv -f /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_att.nc.pid113572.fl00.tmp /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_clb.nc.pid113572.fl00.tmp
rip(in)  : /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_clb.nc.pid113572.fl00.tmp
rip(out) : /projects/arpae/terraref/sites/ua-mac/Level_1/hyperspectral_manualcheck/2016-12-11/2016-12-11__14-17-02-040/52079867-350a-48a6-9ef3-2f6049cc7b0f_.nc
/bin/mv -f /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_clb.nc.pid113572.fl00.tmp /projects/arpae/terraref/sites/ua-mac/Level_1/hyperspectral_manualcheck/2016-12-11/2016-12-11__14-17-02-040/52079867-350a-48a6-9ef3-2f6049cc7b0f_.nc
Cleaning-up intermediate files...
QA/QC check found with 1 or more unexpected FAILURES
Quick views of last processed data file and its original image (if any):
ncview  /projects/arpae/terraref/sites/ua-mac/Level_1/hyperspectral_manualcheck/2016-12-11/2016-12-11__14-17-02-040/52079867-350a-48a6-9ef3-2f6049cc7b0f_.nc &
panoply /projects/arpae/terraref/sites/ua-mac/Level_1/hyperspectral_manualcheck/2016-12-11/2016-12-11__14-17-02-040/52079867-350a-48a6-9ef3-2f6049cc7b0f_.nc &
open /tmp/tmpwhg88m/52079867-350a-48a6-9ef3-2f6049cc7b0f_image.jpg
Elapsed time 0m42s

I'll check to make sure my code here: https://github.com/terraref/extractors-hyperspectral/tree/master/hyperspectral ...matches any updates you've made in computing-pipeline, but in the meantime, does this error mean anything to you? The _raw file was only 400 MB, so I'd like to get your thoughts before trying on e.g. a 102 GB _raw file. The pixel height error seems like it might be fairly mild?

But that aside, I prioritized SWIR and VNIR data in the Clowder rebuild and they've all been recreated as datasets, so after this is ironed out I'll run the extractor on all of them at long last.

czender commented 7 years ago

Hi Max, thanks for testing this. The failure appears to be a false positive emitted by the QAQC program. "The height of the image should always be 169 pxl" is something that @FlyingWithJerome apparently checks for in QAQC, but there is really no reason why an image should have 169 scanlines; we expect y = height = scanlines to vary. Jerome, please disable or alter this particular test if that is indeed the problem. FYI, I expect more false positives and small glitches now that we are running the workflow on more images, but they should be easy to solve.

@max-zilla the workflow is pretty flexible about directory structures. In particular, one can instruct it to write temporary files to any directory with -T $tmp_dir. This overrides both the default on roger, which is /gpfs_scratch/arpae/imaging_spectrometer, and the fallback default of $TMPDIR.
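The precedence Charlie describes could be sketched like this (a minimal illustration; the variable names are assumed, not taken from `hyperspectral_workflow.sh`):

```shell
# Temporary-directory precedence (sketch):
#   1. a directory passed via -T wins,
#   2. else the roger default /gpfs_scratch/arpae/imaging_spectrometer,
#   3. else the $TMPDIR fallback (here defaulting to /tmp).
tmp_usr=""                                            # would be set by -T; empty here
tmp_dfl="/gpfs_scratch/arpae/imaging_spectrometer"    # roger default
tmp_dir="${tmp_usr:-${tmp_dfl:-${TMPDIR:-/tmp}}}"
echo "$tmp_dir"
```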

max-zilla commented 7 years ago

@czender @FlyingWithJerome thanks Charlie. The directory problem in this case: if one runs this extractor on a remote server that can't access the files locally, it has to download them from Clowder first. Clowder creates a particular directory structure to avoid filename collisions, and that structure wasn't compatible with the workflow script (it wouldn't find every file in one folder, for example), so I made sure to handle that in my code.

With that in mind, I think the next step will be for me to queue and run a handful of variously sized input files and then share the outputs with you so we can make sure they look good on ~10-20 datasets before we run the rest.
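The flattening step described above might look something like the following (a hypothetical sketch; `flatten_downloads` and both directory names are illustrative, not from the extractor code):

```shell
# Clowder downloads each file into its own collision-avoiding subdirectory;
# symlink everything into one flat directory the workflow script can scan.
flatten_downloads () {
    local clowder_dl_dir=$1 flat_dir=$2
    mkdir -p "$flat_dir"
    # one symlink per downloaded file, all side by side in $flat_dir
    find "$clowder_dl_dir" -type f -exec ln -s {} "$flat_dir"/ \;
}
```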

czender commented 7 years ago

"The _raw file was only 400 MB, so I'd like to get your thoughts before trying on e.g. a 102 GB _raw file." @max-zilla @solmazhajmohammadi @dlebauer @hmb1 We've discussed before what the maximum expected raw filesize would be, and my recollection is that Lemnatec said 64 GB is the size of a raw file produced by a full field-width scan. The workflow works on 64 GB raw files (I have tested it many times); it will fail miserably on anything significantly larger. I don't know where the disconnect occurred, but somehow the maximum filesize is now 50% larger than the workflow was designed to handle. Until and unless the workflow is re-engineered, run it only on files smaller than ~65 GB :)

Since there currently appears to be no maximum-filesize check, @hmb1 please alter the workflow to exit gracefully when the raw input file size, i.e., the size of $in_fl in hyperspectral_workflow.sh, exceeds 65 GB. A few lines of Bash should do it.
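The guard Charlie requests could be a few lines like these ($in_fl and the 65 GB threshold come from the thread; the function name and surrounding details are assumed, not the actual fix):

```shell
# Exit gracefully when the raw input file exceeds the ~65 GB the workflow
# was designed to handle.
chk_fl_sz () {
    local in_fl=$1
    local max_sz=$((65 * 1024 ** 3))   # 65 GB in bytes
    local fl_sz
    # GNU stat (roger) first, BSD stat as a fallback
    fl_sz=$(stat -c %s "$in_fl" 2>/dev/null || stat -f %z "$in_fl")
    if [ "$fl_sz" -gt "$max_sz" ]; then
        echo "ERROR: $in_fl is $fl_sz B, larger than the 65 GB maximum; exiting" >&2
        return 1
    fi
}
```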

max-zilla commented 7 years ago

That is fine with me - the large file was a SWIR file from early May so it could very well be deprecated at this point: https://terraref.ncsa.illinois.edu/clowder/datasets/58713c954f0cc129fb5b88c7

dlebauer commented 7 years ago

Getting the extractor to work on data from May is a very low priority ... so sticking with the 64GB max size is reasonable.

solmazhajmohammadi commented 7 years ago

@czender For a single-row scan, the file size should not exceed the 64 GB limit (when we are running at the minimum speed of 0.02 m/s). It might just have been a test to see whether the system could handle a larger file size, or the system was running in free-run mode (over the whole field). Anyhow, @max-zilla, if there is a file larger than 64 GB, please let me know the date and time and I can check it.

max-zilla commented 7 years ago

@czender @FlyingWithJerome OK, an update.

Summary: the extractor triggered successfully on a VNIR file; I'm still getting an error on a SWIR file.

VNIR: https://terraref.ncsa.illinois.edu/clowder/datasets/587024414f0c0dbad1a78b83 — the output .nc file is on Roger at

/projects/arpae/terraref/sites/ua-mac/Level_1/hyperspectral_manualcheck/2016-12-11/2016-12-11__14-17-02-040/

The output is 1.3 GB from a 400 MB input.

SWIR: https://terraref.ncsa.illinois.edu/clowder/datasets/58713f394f0cc129fb5c2436 — I tried triggering on this dataset from 12/08, but got this error:

Terraref hyperspectral data workflow invoked with:
hyperspectral_workflow.sh -d 1 -i /projects/arpae/terraref/sites/ua-mac/raw_data/SWIR/2016-12-08/2016-12-08__15-27-55-795/5254f64d-279c-47e5-843f-232f6777d0ef_2016_12_08_15_31_11raw -o /projects/arpae/terraref/sites/ua-mac/Level_1/hyperspectral_manualcheck/2016-12-08/2016-12-08__15-27-55-795/5254f64d-279c-47e5-843f-232f6777d0ef_2016_12_08_15_31_11.nc
Hyperspectral workflow scripts in directory /gpfs/largeblockFS/projects/arpae/terraref/shared/extractors/extractors-hyperspectral/hyperspectral
NCO version "4.6.2-beta03" from directory /gpfs/smallblockFS/sw/nco-4.6.2-beta03/bin
Intermediate/temporary files written to directory /gpfs_scratch/arpae/imaging_spectrometer
Final output stored in directory /projects/arpae/terraref/sites/ua-mac/Level_1/hyperspectral_manualcheck/2016-12-08/2016-12-08__15-27-55-795
Input #00: /projects/arpae/terraref/sites/ua-mac/raw_data/SWIR/2016-12-08/2016-12-08__15-27-55-795/5254f64d-279c-47e5-843f-232f6777d0ef_2016_12_08_15_31_11raw
trn(in)  : /projects/arpae/terraref/sites/ua-mac/raw_data/SWIR/2016-12-08/2016-12-08__15-27-55-795/5254f64d-279c-47e5-843f-232f6777d0ef_2016_12_08_15_31_11raw
trn(out) : /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_trn.nc.pid9776.fl00.tmp
ERROR: Unable to identify camera type (SWIR or VNIR?)
HINT: hyperspectral_workflow.sh requires header file  to report either 272 (SWIR) or 955 (VNIR) or wavelengths. Actual number reported = wvl_nbr = .

/projects/arpae/terraref/shared/extractors/extractors-hyperspectral/hyperspectral/hyperspectral_workflow.sh: line 520: [: : integer expression expected
/projects/arpae/terraref/shared/extractors/extractors-hyperspectral/hyperspectral/hyperspectral_workflow.sh: line 522: [: : integer expression expected

It says "unable to identify camera type". I'll let you decide whether your script needs to change, but I can easily tell which camera it is from the dataset name/path, so if it's easy to turn that into an argument I provide when calling the script, that's fine too.

If the VNIR seems fine to you, I'll get started processing the rest!
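Inferring the camera from the dataset path, as suggested above, could be as simple as this (a hypothetical sketch; the function name is illustrative, not from the workflow script):

```shell
# Infer SWIR vs VNIR from the raw_data path, since the sensor name
# appears as a path component (e.g. .../raw_data/SWIR/2016-12-08/...).
cam_typ_from_path () {
    case "$1" in
        */SWIR/*) echo "SWIR" ;;
        */VNIR/*) echo "VNIR" ;;
        *)        echo "ERROR: cannot infer camera type from $1" >&2; return 1 ;;
    esac
}
```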

czender commented 7 years ago

@max-zilla the VNIR results look good.

I will fix the camera-type check so it assigns wvl_nbr == 272 to the SWIR camera once I figure out how to create a branch and a pull request. For now, what puzzles me is that the text output pasted above does not contain any printed variables; in other words, it does not contain the printed value of $wvl_nbr. So two things appear to need fixing. But SWIR is not operational, right? So please proceed with VNIR.

max-zilla commented 7 years ago

Going to close this issue now that the components are working at a basic level, with further adjustments going into https://github.com/terraref/computing-pipeline/issues/230