terraref / computing-pipeline

Pipeline to Extract Plant Phenotypes from Reference Data
BSD 3-Clause "New" or "Revised" License

Solution for creation of new data products that depend on multiple inputs #248

Closed: craig-willis closed this issue 7 years ago

craig-willis commented 7 years ago

As discussed in the Spring 2017 planning meeting, the current Clowder extractor architecture does not support creating data products that rely on multiple sensor inputs, specifically the sensor fusion output and Roman's machine learning models.

We've discussed a few options:

Completion Criteria

ghost commented 7 years ago

@dlebauer, should the sensor fusion meeting be separate from the machine learning model meeting?

Who is doing sensor fusion besides Solmaz?

ghost commented 7 years ago

David, Max, Craig, Rob and Jeff will meet to talk about this.

dlebauer commented 7 years ago

@craig-willis @max-zilla could you please summarize the conclusions from the last meeting and identify next steps?

max-zilla commented 7 years ago

@dlebauer we have not had a formal meeting on this specific topic yet.

The approach I am anticipating is the second one: an external process running on Roger (a cron task).

...but only for complex modules that require, e.g., an entire day of data or data from several sensors. In those cases, squeezing that functionality into the existing framework would mean working against Clowder's capabilities. That does not make me want to turn ALL extractors into cron jobs like this, though. On-demand triggering for high-volume sensors like stereoTop is more efficient than 20 different threads of bulk processing triggered on the clock throughout the day, and it preserves users' ability to run or re-run extractors on demand without needing someone "on the inside" to alter job schedules and whatnot.
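To make the batch side of this hybrid concrete, here is a minimal sketch of what a daily cron-style job for a multi-sensor module might do: gather one day's files, group them by sensor, and run the module only when every required sensor has data. Everything here is hypothetical (`FileRecord`, `group_by_sensor`, `run_daily_batch`, and the example sensor name `thermal`); it does not use Clowder or any real TERRA-REF code, and how file records would actually be listed on Roger is left open.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, Iterable, List, Optional


@dataclass(frozen=True)
class FileRecord:
    # Hypothetical metadata for one uploaded file.
    sensor: str
    day: date
    path: str


def group_by_sensor(records: Iterable[FileRecord], day: date) -> Dict[str, List[str]]:
    """Collect the paths captured on `day`, keyed by sensor name."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for rec in records:
        if rec.day == day:
            grouped[rec.sensor].append(rec.path)
    return dict(grouped)


def run_daily_batch(
    records: Iterable[FileRecord],
    day: date,
    required_sensors: List[str],
    module: Callable[[Dict[str, List[str]]], str],
) -> Optional[str]:
    """Run a multi-sensor module once for `day`, as a scheduled job might.

    Returns the module's result, or None if any required sensor has no
    files for `day` (a scheduler could simply retry on its next run).
    """
    grouped = group_by_sensor(records, day)
    missing = [s for s in required_sensors if not grouped.get(s)]
    if missing:
        return None
    # Pass only the required sensors' files, in a stable order.
    inputs = {s: sorted(grouped[s]) for s in required_sensors}
    return module(inputs)
```

Single-sensor, high-volume extractors such as stereoTop would stay on Clowder's on-demand triggering; only modules like the one passed as `module` here would be scheduled this way.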

Still worth a discussion with @craig-willis, @jterstriep, and @robkooper about the best way to approach this.

Current candidates for this "Cron Pipeline" flavor:

Based on discussions with @solmazhajmohammadi, I'm inclined to do the plot-level clipping first, which should make sensor fusion easier. That work is happening in issue #265, and I'm going to push us to get at least one sample day clipped into plots for each sensor as a starting target.

max-zilla commented 7 years ago

I am going to write a new rule-checker extractor that delegates incoming files to the appropriate extractors based on the rules defined for each one. The delegates can be normal extractors, these more complex scripts, etc., and the rules can be more comprehensive than those in a typical Clowder extraction pipeline. https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-rulechecker/browse
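The delegation idea can be sketched as a list of rules, each pairing a predicate on file metadata with a delegate to call. This is a hypothetical illustration, not the code in the extractors-rulechecker repository: `Rule`, `RuleChecker`, `on_file_added`, and the metadata keys `sensor` and `name` are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Rule:
    """Hypothetical rule: a predicate on file metadata plus the delegate to call."""
    name: str
    matches: Callable[[Dict[str, str]], bool]
    delegate: Callable[[Dict[str, str]], None]


@dataclass
class RuleChecker:
    rules: List[Rule] = field(default_factory=list)

    def register(self, rule: Rule) -> None:
        self.rules.append(rule)

    def on_file_added(self, metadata: Dict[str, str]) -> List[str]:
        """Evaluate every rule against an incoming file.

        Every matching rule fires (fan-out, not first-match), so one file
        can feed several delegates. Returns the names of the rules fired.
        """
        fired = []
        for rule in self.rules:
            if rule.matches(metadata):
                rule.delegate(metadata)
                fired.append(rule.name)
        return fired
```

A delegate here is just a callable, so it could stand in for queuing a normal extractor or launching one of the complex scripts; choosing fan-out over first-match is a design assumption that lets, for example, a single stereoTop file feed both a per-file extractor and a multi-sensor job.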

max-zilla commented 7 years ago

This extractor is written and basically ready. Once our stitching & clipping script from #265 is ready, we will use this extractor to trigger those scripts on full days of data.
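Triggering on "full days of data" needs a rule that keeps state across incoming files. A minimal sketch, assuming that a day counts as complete once an expected number of files has arrived for a given sensor (the real completeness test is not specified here): `FullDayTrigger`, `expected_files`, and the delegate signature are hypothetical names for illustration only.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple


class FullDayTrigger:
    """Hypothetical stateful rule: fire once per (sensor, day) when complete.

    "Complete" is assumed to mean that `expected_files` files have arrived.
    """

    def __init__(self, expected_files: int,
                 delegate: Callable[[str, str, List[str]], None]) -> None:
        self.expected_files = expected_files
        self.delegate = delegate
        self._pending: Dict[Tuple[str, str], List[str]] = defaultdict(list)
        self._fired: Set[Tuple[str, str]] = set()

    def on_file_added(self, sensor: str, day: str, path: str) -> bool:
        """Record one file; return True if this file completed its day."""
        key = (sensor, day)
        if key in self._fired:
            return False  # day already processed; ignore late arrivals
        self._pending[key].append(path)
        if len(self._pending[key]) >= self.expected_files:
            self._fired.add(key)
            # Hand the whole day's file list to the delegate exactly once.
            self.delegate(sensor, day, self._pending.pop(key))
            return True
        return False
```

Keeping a set of already-fired days makes the trigger fire at most once per day even if extra files arrive later; whether late files should instead cause a re-run is a separate policy question.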