terraref / computing-pipeline

Pipeline to Extract Plant Phenotypes from Reference Data
BSD 3-Clause "New" or "Revised" License

Deploy PlantCV extractor on ROGER production #147

Closed max-zilla closed 8 years ago

max-zilla commented 8 years ago

Hi @yanliu-chn,

I'd like to deploy PlantCV extractor on the production instance of Clowder for TERRA.

I created a new instance on ROGER Kilo and started to pull the git repo and install PlantCV dependencies when I remembered you have done this before on Nebula. Did you use docker to do this? Do you have instructions or script you can send to me? Or if you think it would be easier/faster can you deploy on Kilo?

We only need to customize the config.py file slightly before running.

yanliu-chn commented 8 years ago

hi @max-zilla, I followed https://opensource.ncsa.illinois.edu/bitbucket/projects/BD/repos/dockerfiles/browse/clowder/plantcv/Dockerfile to set up the dependencies, with slight modifications to the extractor GitHub location/branch and the correct pyclowder branch.

is it faster to copy the plantcv2 extractor VM in Nebula and import it to ROGER Kilo?

max-zilla commented 8 years ago

@yanliu-chn copying a working VM is probably easier than trying to build another from scratch. In the Dockerfile you linked, when we run it we'd need to override two environment variables:

RABBITMQ_URI = "amqp://guest:guest@rabbitmq.ncsa.illinois.edu:5672/clowder"
RABBITMQ_EXCHANGE = "terra"
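These overrides could be supplied at container run time rather than baked into the image; a sketch, where the image name is a placeholder rather than anything established in this thread:

```shell
docker run -d \
    -e RABBITMQ_URI="amqp://guest:guest@rabbitmq.ncsa.illinois.edu:5672/clowder" \
    -e RABBITMQ_EXCHANGE="terra" \
    plantcv-extractor-image
```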

I haven't gone through the export/import process with a VM on Nebula or ROGER. How is this done? I can see plantcv-extractor-v2 on Nebula but not sure what actions to take to do this safely.

max-zilla commented 8 years ago

@robkooper meanwhile I have a VM on Kilo with the terra demosaic extractor dependencies prepared that I think is ready to run. I wasn't going to use docker here, just run the extractor directly. Would you run the extractor in a screen session, as a service of some kind, or something else?

yanliu-chn commented 8 years ago

@max-zilla @robkooper the Dockerfile needs to be updated to reflect the following changes:

The RABBITMQ_URI setting needs to be set manually because it contains secret credentials. We currently use the clowder-dev virtual host.

This is the current config.py:

# =============================================================================
#
# In order for this extractor to run according to your preferences,
# the following parameters need to be set.
#
# Some parameters can be left with the default values provided here - in that
# case it is important to verify that the default value is appropriate to
# your system. It is especially important to verify that paths to files and
# software applications are valid in your system.
#
# =============================================================================
import os
# name to show in rabbitmq queue list
#extractorName = os.getenv('RABBITMQ_QUEUE', "terraPlantCV")
extractorName = os.getenv('RABBITMQ_QUEUE', "terra")
# URL to be used for connecting to rabbitmq
#rabbitmqURL = os.getenv('RABBITMQ_URI', "amqp://guest:guest@127.0.0.1/%2f")
rabbitmqURL = os.getenv('RABBITMQ_URI', "amqp://clowder:XXXXXXXXX@rabbitmq.ncsa.illinois.edu/clowder-dev")
# name of rabbitmq exchange
#rabbitmqExchange = os.getenv('RABBITMQ_EXCHANGE', "clowder")
rabbitmqExchange = os.getenv('RABBITMQ_EXCHANGE', "terra")
# type of files to process
messageType = "*.dataset.file.added"
# trust certificates, set this to false for self signed certificates
sslVerify = os.getenv('RABBITMQ_SSLVERIFY', False)
# Comma delimited list of endpoints and keys for registering extractor information
registrationEndpoints = os.getenv('REGISTRATION_ENDPOINTS', "http://localhost:9000/clowder/api/extractors?key=key1, http://host2:9000/api/extractors?key=key2")
# Path to script that contains PlantCV modules to import
scriptPath = "../PlantcvClowderIndoorAnalysis.py"
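One caveat worth noting about the sslVerify line above: os.getenv returns a string whenever the variable is set, so RABBITMQ_SSLVERIFY="false" would still be truthy in Python. A defensive sketch (env_flag is a hypothetical helper, not part of the config in this thread):

```python
import os

def env_flag(name, default=False):
    """Parse an environment variable as a boolean flag.

    os.getenv returns a string when the variable is set, so comparing
    against known truthy spellings avoids treating "false" as True.
    """
    val = os.getenv(name)
    if val is None:
        return default
    return val.strip().lower() in ("1", "true", "yes")

# e.g. sslVerify = env_flag('RABBITMQ_SSLVERIFY')
```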

yanliu-chn commented 8 years ago

@max-zilla also, I think @robkooper knows how to move the instance image from Nebula to Kilo

max-zilla commented 8 years ago

@yanliu-chn in that Dockerfile, instead of:

pip install git+https://opensource.ncsa.illinois.edu/stash/scm/cats/pyclowder.git

You should use:

pip install git+https://opensource.ncsa.illinois.edu/stash/scm/cats/pyclowder.git@bugfix/CATS-554-add-pyclowder-support-for-dataset

This will install the correct PyClowder branch.

As for the correct extractor branch, that now lives in this repo.

Line 6:  EXTRACTOR_HOME=/home/clowder/computing-pipeline/scripts/plantcv/extractor

Line 54: git clone https://github.com/terraref/computing-pipeline.git

Not sure if this impacts your start.sh script too.

yanliu-chn commented 8 years ago

@max-zilla thanks for the update. it doesn't affect start.sh. I will find a time to update the Dockerfile and test.

max-zilla commented 8 years ago

@yanliu-chn can you also add the Python requests library to the dependencies the Dockerfile installs? I'm updating the extractor slightly to upload results to BETYdb: https://github.com/terraref/computing-pipeline/issues/33
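For context, the upload is essentially an HTTP POST of trait data to a BETYdb endpoint. A hypothetical sketch of the request construction, with placeholder URL and key (the extractor itself uses the requests library; stdlib urllib is used here only to keep the sketch dependency-free):

```python
import urllib.request

def build_traits_post(base_url, api_key, csv_bytes):
    """Build a POST request pushing a traits CSV to a BETYdb-style API.

    base_url and api_key are placeholders, not values from this thread.
    """
    return urllib.request.Request(
        "%s/api/beta/traits.csv?key=%s" % (base_url, api_key),
        data=csv_bytes,
        headers={"Content-Type": "text/csv"},
        method="POST",
    )

# To actually send: urllib.request.urlopen(build_traits_post(url, key, data))
```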

max-zilla commented 8 years ago

@yanliu-chn I have completed my updates to the PlantCV extractor. If you don't have time to update the Dockerfile, please let me know and I can try to do so.

yanliu-chn commented 8 years ago

@max-zilla That'd be great. I have been swamped recently.

max-zilla commented 8 years ago

@yanliu-chn your Dockerfile is here: https://opensource.ncsa.illinois.edu/bitbucket/projects/BD/repos/dockerfiles/browse/clowder/plantcv/Dockerfile (in the BD repo)

The Dockerfile I originally created lives with the extractor itself in the compute-pipeline repo: https://github.com/terraref/computing-pipeline/blob/master/scripts/plantcv/extractor/Dockerfile

Your copy is clearly the one to use. My question, probably for @robkooper, is whether we want the Dockerfile for this extractor to live in the BD repo or in this one. I haven't looked through the dockerfiles repo, but it looks like that's the one to use going forward?

edit: Just saw this in the README: ** THIS REPOSITORY IS DEPRECATED, PLEASE SEE DOCKER FILES IN EACH REPOSITORY **

...so I'll move your dockerfile into this repository.

yanliu-chn commented 8 years ago

please feel free to merge the two into terraref repo. thanks!

max-zilla commented 8 years ago

@yanliu-chn the only piece I'm not sure how to update is this:

# Set plantcv env var
/bin/sed -i -e "s#exRootPath =.*#exRootPath = '${CLOWDER_HOME}/extractors-plantcv'#" config.py

(https://opensource.ncsa.illinois.edu/bitbucket/projects/BD/repos/dockerfiles/browse/clowder/plantcv/start.sh line 24)

Do you remember what this is looking for? The directory: https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-plantcv/browse ...doesn't really match what we have in computing-pipeline repo.

yanliu-chn commented 8 years ago

right. This points to the directory under which the extractor can find bin/extract.sh in the old PlantCV setup. I don't think exRootPath is used in the new extractor Fengling developed; if so, we can drop it from config.py.

max-zilla commented 8 years ago

@yanliu-chn @nfahlgren I have moved the PlantCV extractor code to a new repository under this project: https://github.com/terraref/extractors-lemnatec-indoor

This is after discussion with @robkooper and @dlebauer. Reasons include:

Noah, I moved your indoor analysis script here. The Uploader/Globus_uploader scripts will remain in computing-pipeline/scripts/plantCV for now, because those are truly related to the pipeline and technically distinct from the extractor.

Yan, I believe I faithfully ported your Dockerfile and added the necessary updates, but may ask for your/Fengling's help in the next day or two to test.

yanliu-chn commented 8 years ago

Thanks, @max-zilla !

max-zilla commented 8 years ago

@yanliu-chn @robkooper sounds like it shouldn't be too difficult to move the VM from Nebula to ROGER now that both are on Kilo.

We should just migrate this VM, and perform any necessary updates based on changes to plantCV since the test VM was set up (there may not be any).

max-zilla commented 8 years ago

I've just downloaded an 8 GB image of the Nebula PlantCV VM and I'm in the process of uploading it to ROGER's image list. For those interested @yanliu-chn @robkooper, the basic process should be generic for any OpenStack Kilo instance.

# install the OpenStack command-line client
pip install python-openstackclient
# find the IMAGE_ID of the VM image to export
openstack image list
# download the image to a local raw file
glance --os-username USERNAME --os-tenant-id TENANT_ID \
    --os-project-name TERRA --os-password PASSWD \
    image-download --file plantcv.raw IMAGE_ID

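The upload side on the destination cloud would then be roughly the mirror image of the download. A sketch; the image name and format flags are assumptions, not values confirmed in this thread:

```shell
glance --os-username USERNAME --os-tenant-id TENANT_ID \
    --os-project-name TERRA --os-password PASSWD \
    image-create --name plantcv --file plantcv.raw \
    --disk-format raw --container-format bare
```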
max-zilla commented 8 years ago

OK @jdmaloney I think the VM is ready. Can you prepare exports for VM 141.142.170.130:

reads: /sites/danforth/raw_data
writes: /sites/danforth/Level_1/plantcv

(there's a /danforth/derived_data but we can delete that and use Level_1 instead)

Right now both of these directories are obviously empty, but once they're mounted I can try to get some data moved in there.

max-zilla commented 8 years ago

Extractor has been deployed. The BETYdb push flag is disabled until @gsrohde and I confirm the BETY instance to use.

Will see about getting the older data Noah had pushed to the dev instance loaded on production so it can trigger on them. Sample output: https://terraref.ncsa.illinois.edu/clowder/datasets/57ec256f4f0c067b0da81887

dlebauer commented 8 years ago

The instance to use is terraref.ncsa.illinois.edu/bety, aka bety6.


max-zilla commented 8 years ago

@gsrohde I believe that's the instance you mentioned yesterday. Can you please verify the necessary data has been loaded in that instance like we discussed here: https://github.com/terraref/computing-pipeline/issues/33 ?

Then I can add that URL, get an API key and make sure the module is functioning as expected.

gsrohde commented 8 years ago

@max-zilla Was there a particular comment in #33 where we discussed this? I think from David's comment https://github.com/terraref/computing-pipeline/issues/33#issuecomment-230545333 that all the metadata is there.

I think that the API rolls everything back if there is a row it can't insert. I have to double-check this. In other words, if you tried to do the API call without all the metadata being there, no harm would be done.

I still have to deploy some last few changes to the insertion API having to do with the traits issue we discussed yesterday. I plan to do a release today (unlikely), tomorrow (possibly) or early next week, after which we should be good to go.

max-zilla commented 8 years ago

@gsrohde looks like you're correct, the discussion I saw was relating to a test instance:

You are of course free (and encouraged) to post to any test copy. (You will have to put the relevant metadata in place first for the post to succeed.)
...
So shall I do this? Perhaps set it up on pecandev? All that's really needed is to dump the rows that will be referred to in the CSV file(s).
...
@max-zilla I think I've copied enough data over to pecandev so you can use it to test. The URL is http://pecandev.igb.illinois.edu/beta/api/beta/traits.csv.

But it looks like we're set. I'm going to close this issue and create another smaller one for the Bety push component.

edit: instead of making a new issue, I'll reference it in https://github.com/terraref/computing-pipeline/issues/172.