terraref / computing-pipeline

Pipeline to Extract Plant Phenotypes from Reference Data
BSD 3-Clause "New" or "Revised" License

Complete PlantCV extractor #119

Closed by max-zilla 7 years ago

max-zilla commented 8 years ago

This is a continuation of this discussion: https://github.com/terraref/computing-pipeline/pull/108

Current plan:

This diagram is intended to illustrate the plan: [image]

max-zilla commented 8 years ago

@fwang29 and I tested out her extractor code today, my branch is here: https://github.com/terraref/computing-pipeline/tree/plantcv-clowder-extractor

Should be ready to try deploying at this point.

This requires my PyClowder branch that includes dataset message handling. https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/pyclowder/browse?at=refs%2Fheads%2Fbugfix%2FCATS-554-add-pyclowder-support-for-dataset

max-zilla commented 7 years ago

@fwang29 has a version of this that includes @nfahlgren's average traits.

This morning I am looking at the small number of changes we made to PlantcvIndoorAnalysis. Noah, I'm going to propose a change to your script based on these. Primarily, it will move some command-line-specific components (such as fetching files from Clowder with GET and POSTing results) outside process_sv_images(), so we get generic processing functions that can be called from both the extractor and your command-line script. After that, if the outputs look good, I think v1 of the extractor is ready.

The idea is that we end up with a script that works for both cases but is also essentially the same as what you wrote, so you can update the actual base logic as necessary.
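A minimal sketch of the separation being proposed: the core processing takes local paths and returns results, while Clowder I/O is injected by each entry point. All names here (process_sv_images aside, which is mentioned above) are illustrative, not the actual PlantcvIndoorAnalysis API.

```python
def process_sv_images(vis_path, nir_path):
    """Pure processing: takes local file paths, returns a traits dict.
    No HTTP calls, so both the CLI script and the extractor can reuse it."""
    traits = {}
    # ... PlantCV side-view analysis would happen here ...
    return traits

def run_from_cli(vis_path, nir_path):
    # Command-line entry point: files are already on disk.
    return process_sv_images(vis_path, nir_path)

def run_from_extractor(download, upload, vis_id, nir_id):
    # Extractor entry point: Clowder GET/POST are passed in as callables,
    # keeping the core logic identical across both paths.
    vis_path = download(vis_id)
    nir_path = download(nir_id)
    traits = process_sv_images(vis_path, nir_path)
    upload(vis_id, traits)
    return traits
```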

nfahlgren commented 7 years ago

@max-zilla sounds good to me.

max-zilla commented 7 years ago

@nfahlgren @yanliu-chn OK, just pushed a big-ish update. @fwang29 and I just tested this both by:

Both paths generate the same outputs:

If you want to schedule a 20-minute call to go over the changes, I'm happy to do so. None of the logic in your code has changed:

The only question is: previously the extractor added .TAB files to the dataset for each NIR/VIS image pair. Following your script, Fengling now just writes that content to the file metadata of the respective images, and the only new file is average_traits.csv. Is this acceptable, or should the TAB files be kept?
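A minimal sketch of what attaching traits as file-level metadata (rather than uploading a .TAB file) could look like, assuming a Clowder host and API key; the endpoint path and payload shape are illustrative of Clowder's REST metadata API, not taken from the extractor:

```python
import json
import urllib.request

def metadata_url(host, file_id, key):
    # Clowder-style endpoint for file-level metadata (illustrative).
    return "%s/api/files/%s/metadata?key=%s" % (host.rstrip("/"), file_id, key)

def attach_traits_metadata(host, key, file_id, traits):
    """POST a traits dict as JSON metadata on an existing Clowder file,
    instead of adding a separate .TAB file to the dataset."""
    req = urllib.request.Request(
        metadata_url(host, file_id, key),
        data=json.dumps(traits).encode("utf-8"),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return resp.status
```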

Otherwise testing seems fine. Noah, let me know if you want to discuss; otherwise, if you're willing to confirm it still works the same from the command line, I think this can be ready as version 1.

The main idea here was to make it easy for you, Noah, to update the actual PlantCV logic as needed in your script; the extractor pulls all relevant logic from it rather than copying code or replicating behavior.

Actual extractor code was Fengling's work, I just reorganized things a bit.

nfahlgren commented 7 years ago

@max-zilla @yanliu-chn @fwang29 Looks good, thanks for getting this up and running. The only little thing I see is line 103 in the extractor, where the list looks hard-coded for 10 images. What if this line were something like:

```python
for i in [x for x in range(0, len(file_objs)) if x % 2 == 0]:
```
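An equivalent, slightly simpler form uses range's step argument to visit every even index directly. The sample file_objs list below is hypothetical; it assumes the extractor's list alternates VIS and NIR entries, as the pairing logic implies:

```python
# Hypothetical alternating VIS/NIR list, as the extractor's pairing assumes.
file_objs = ["vis0", "nir0", "vis1", "nir1", "vis2", "nir2"]

# range(start, stop, step) with step=2 replaces the filtering comprehension.
pairs = []
for i in range(0, len(file_objs), 2):
    pairs.append((file_objs[i], file_objs[i + 1]))
```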
max-zilla commented 7 years ago

@nfahlgren I like that, an elegant solution.

That reminded me of one last thing I wanted to do: improve the check_message() logic in the extractor. This is the piece that flags a "file added to dataset" message as relevant or not. I just pushed a small update.
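A hypothetical sketch of what such a check_message() gate might look like; the filename conventions and the "wait until a full snapshot is present" rule are illustrative, not the actual pushed update:

```python
def check_message(filenames):
    """Decide whether a 'file added to dataset' event is worth processing.
    Only fire once the dataset holds at least one VIS and one NIR side-view
    image plus the snapshot info file; otherwise skip and wait for uploads."""
    names = [f.lower() for f in filenames]
    has_vis = any("vis_sv" in n for n in names)
    has_nir = any("nir_sv" in n for n in names)
    has_info = any(n.endswith("snapshotinfo.csv") for n in names)
    return has_vis and has_nir and has_info
```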

I might ask @robkooper to skim this if he has time, but otherwise I think we're ready.

max-zilla commented 7 years ago

Also created a pull request: https://github.com/terraref/computing-pipeline/pull/131

yanliu-chn commented 7 years ago

Sorry for the delay; I was on two consecutive, intense business trips.

I'm trying to deploy the extractor on Nebula. It looks like the newest PlantCV extractor is in the master branch. Two more questions:

Thanks!

nfahlgren commented 7 years ago

@yanliu-chn I think any OpenCV version in the 2.4.x series will work. I know that the 3.x series currently will not.
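Given that constraint, the extractor could fail fast on an unsupported OpenCV at startup. A small sketch (the helper name is hypothetical; the 2.4-only requirement comes from the comment above):

```python
def opencv_supported(version_string):
    """Return True only for OpenCV 2.4.x, which this PlantCV code requires."""
    parts = version_string.split(".")
    return parts[:2] == ["2", "4"]

# At extractor startup one might then do (requires cv2 to be installed):
# import cv2
# if not opencv_supported(cv2.__version__):
#     raise RuntimeError("OpenCV 2.4.x required, found " + cv2.__version__)
```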

yanliu-chn commented 7 years ago

@nfahlgren thanks!

max-zilla commented 7 years ago

@yanliu-chn yes that is the correct PyClowder branch to use.

yanliu-chn commented 7 years ago

@max-zilla @fwang29 did you test the extractor using @nfahlgren's upload script? Could you provide a test snapshot of raw data with the JSON metadata and SnapshotInfo.csv in it? Noah once told me the location of one, but I couldn't find it in the GitHub issues.

fwang29 commented 7 years ago

Yes. And this link should have what you need: https://drive.google.com/file/d/0B-_VMDWHLIgCemwtUl9QTnN2OW8/view


yanliu-chn commented 7 years ago

I have deployed the extractor on Nebula. Refer to #68.

max-zilla commented 7 years ago

I am going to close this issue. The BETYdb piece remains: https://github.com/terraref/computing-pipeline/issues/33

And getting Globus pipeline from Danforth in place: https://github.com/terraref/computing-pipeline/issues/136