terraref / computing-pipeline

Pipeline to Extract Plant Phenotypes from Reference Data
BSD 3-Clause "New" or "Revised" License

Create full field stitched mosaic #85

Closed dlebauer closed 7 years ago

dlebauer commented 8 years ago

For purposes of QAQC, create a rough and imperfectly aligned mosaic of the full field.

dlebauer commented 8 years ago

Is this solved by #96?

dlebauer commented 8 years ago

Next step: add to pipeline. Comments from @max-zilla

@abby621 @pless @robkooper it looks like it should be pretty straightforward to turn this into a Clowder extractor using the PyClowder library: https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/pyclowder/browse Take a look under sample-extractors/wordcount.py for a simple extractor that counts the words in a text document. It has 2 primary pieces (ignoring Docker stuff):

- config.py (`extractorName` and `messageType` are of interest)
- wordcount.py (the code itself, which here would be a modification of your demosaic.py script)

Generally speaking, extractors listen for certain files or datasets to trigger scripts that process them. In your script's case it looks like 3 files are required: 123_metadata.json, 123_left.bin, 123_right.bin. So we'd want to listen on a dataset - i.e. in config.py, `messageType = "*.dataset"`.

The py script should use PyClowder to simplify things. If you import it import pyclowder.extractors as extractors you only need to implement a couple functions:

check_message() will be called when a "new file!" notification message is received by your extractor, before downloading the dataset. If it returns True, the process_file() function will be called; if False, nothing more will happen. So here, you'd want to check that all 3 of those files are present in the dataset.

process_file() will then be called after verifying that the dataset should be processed. Here you'd basically call your get_image_shape() and process_image() methods.

Many of the examples regarding extractors talk about running on an individual file as opposed to a dataset, so the instructions might not be perfect, but this provides a starting point when it's time to implement this.
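The two callbacks described above might look like the sketch below. The callback signatures and the `"filelist"` parameter key are assumptions for illustration, not the exact pyclowder API - check sample-extractors/wordcount.py in the pyclowder repo for the real conventions.

```python
# Minimal sketch of the dataset-extractor callbacks described above.
# The "filelist" key and the callback signatures are assumptions;
# the real API lives in pyclowder's sample extractors.

REQUIRED_SUFFIXES = ("_metadata.json", "_left.bin", "_right.bin")

def check_message(parameters):
    """Return True only if all three expected stereoTop files are present."""
    filenames = parameters.get("filelist", [])
    return all(
        any(name.endswith(suffix) for name in filenames)
        for suffix in REQUIRED_SUFFIXES
    )

def process_file(parameters):
    """Called once check_message() returns True; this is where the
    demosaic logic (get_image_shape() / process_image()) would run."""
    pass
```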

dlebauer commented 8 years ago

@abby621 does this make sense? You can reach @max-zilla here or on the terraref computing pipeline chat https://gitter.im/terraref/computing-pipeline if you have questions or need help

abby621 commented 8 years ago

This makes sense to me, and it should be straightforward to convert the demosaic script to a Clowder extractor. That demosaic script, however, seems more related to #64, if I'm not mistaken? It currently converts the binary files to JPEGs, and it shouldn't be too difficult to change that to incorporate the field-of-view and GPS information from the JSON metadata and save the files out as GeoTIFFs plus JPEG thumbnails, if that's the preferred file format.
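For the GeoTIFF step, the essential piece is turning the field-of-view and GPS metadata into a GDAL-style geotransform. A minimal sketch, with illustrative inputs (not the actual gantry metadata schema):

```python
def geotransform_from_metadata(ulx, uly, width_m, height_m, cols, rows):
    """Build a GDAL-style affine geotransform for a north-up image:
    (upper-left x, pixel width, 0, upper-left y, 0, -pixel height).
    Inputs are the upper-left corner position, the ground footprint
    implied by the field of view, and the image dimensions in pixels."""
    return (ulx, width_m / cols, 0.0, uly, 0.0, -height_m / rows)
```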

As far as the task of a full-field stitched mosaic goes, we have a functional rough mosaicking script that currently uses a couple of "magic" numbers to adjust for the stereo cameras not capturing at exactly the same time. Has that issue been resolved?

max-zilla commented 8 years ago

@abby621 I'm actually going to add a few things to PyClowder that might make this easier for you - I've created a branch here: https://opensource.ncsa.illinois.edu/jira/browse/CATS-554

Our extractor can listen on *.dataset.# (the add_file bit isn't necessary) and I'll make a small utility function in pyclowder that you'll use to fetch the files in the dataset if it's a stereoTop dataset - you shouldn't need to unzip them.

max-zilla commented 8 years ago

@abby621 OK, so I ended up writing some new stuff and hope to make a pull request tomorrow.

On processing the dataset, you'll receive a list of files (NOT a zip file) that you can feed directly into your process. I expect this will be pretty straightforward now.


max-zilla commented 8 years ago

While this is not a full extractor yet, I created an example stereoTop dataset on our development instance of Clowder and ran the demosaic script on it so we can show as an example this week: http://141.142.209.122/clowder/datasets/574d9b4fe4b0efbe2dc4bb88

You can see the output images alongside the original .bin files.

dlebauer commented 8 years ago

@max-zilla is the demosaic script in this repository?

max-zilla commented 8 years ago

@dlebauer looks like you never merged @abby621's pull request for it: https://github.com/terraref/computing-pipeline/pull/96

That's where I got it from.

dlebauer commented 8 years ago

@max-zilla merged #96 ... who is responsible for adding this to the pipeline as an extractor?

abby621 commented 8 years ago

@dlebauer @max-zilla I believe that's on our lab. We prioritized the one-off full-field stitched mosaic implemented with map tiles for last week's meeting. I did, however, have a call with Max before that and have directions on how to move forward on the extractor.

ghost commented 8 years ago

@abby621 can you please let us know how this is going and if you need anything from the NCSA team?

pless commented 8 years ago

@rachelshekar @abby621 @dlebauer We have a new person who will take this project on. But creating the full-field stitched mosaic requires that the extractor wait until the images for a full scan are complete. Currently, Clowder's approach of listening for the creation of a file does not easily translate to listening for a scan to complete. Perhaps there are other "listener" protocols? Alternatively, would it be possible to create a "scan complete" file that includes metadata about a scan (as opposed to about a particular image)? This could be a text file with the folder names captured within one scan, or at least the start and end times of the scan.

An additional concern: the current approach to creating the full-field visualization mosaic creates a web-based map and a large set of tiles (at different resolutions - exactly like Google Maps tiles). The reason for this is that the fully stitched file at full resolution would be absurdly large, and this Google Maps approach provides the most convenient way to make the data easy to view. Potential problems are:

  1. It is computationally intensive to run: the visualization shown at the June 2 meeting took ~10 hours on a desktop computer (this includes converting bin --> JPG --> GeoTIFF, then integrating the GeoTIFF files and creating the multi-resolution tiles).
  2. The multi-resolution tiles make it possible to build the web view for visualizing the large dataset (but collectively they are still a lot of data).
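For a sense of scale: this kind of tile pyramid follows the standard Web-Mercator ("slippy map") scheme, and the lat/lon-to-tile-index math below shows why the data volume grows so fast - each zoom step quadruples the tile count. This is a generic sketch of the standard formula, not the project's actual tiling code.

```python
import math

def deg2tile(lat_deg, lon_deg, zoom):
    """Standard Web-Mercator tile index for a lat/lon at a zoom level.
    The x/y indices name the tile in the z/x/y pyramid directory; each
    increment of zoom quadruples the number of tiles covering the field."""
    n = 2 ** zoom
    xtile = int((lon_deg + 180.0) / 360.0 * n)
    ytile = int((1.0 - math.asinh(math.tan(math.radians(lat_deg))) / math.pi) / 2.0 * n)
    return xtile, ytile
```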

Summary:

  1. We need some approach for clowder to trigger after a scan is complete,
  2. This is computationally intensive and data-storage intensive. If that is going to be a problem going forward, we should discuss/explore modified approaches.

robkooper commented 8 years ago

Probably the better way to do this is have a user trigger this when they want it. So we can have the code ready to run but trigger a run of the extractor only if a user has shown interest in this. At that point we can disable the button until the process is complete.

Once the extractor is done it can associate the result as a preview with the dataset. Is it possible to zip up the result into a large zip file? This way we have a single zip file that contains the pyramid. All that is needed at that point is an endpoint in clowder that can be used to visualize this pyramid (maybe have it be application/x-image-pyramid as mimetype).
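Zipping up the pyramid is straightforward as long as the z/x/y relative paths are preserved inside the archive, so a viewer endpoint can index into it. A minimal sketch (function name and layout are illustrative):

```python
import os
import zipfile

def zip_pyramid(tiles_dir, out_zip):
    """Bundle a tile-pyramid directory into a single zip, keeping the
    z/x/y.png relative paths intact. ZIP_STORED is used because the
    tiles are already-compressed JPEG/PNG images, so deflating them
    again would cost CPU for little size benefit."""
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_STORED) as zf:
        for root, _dirs, files in os.walk(tiles_dir):
            for name in files:
                path = os.path.join(root, name)
                zf.write(path, os.path.relpath(path, tiles_dir))
```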

ZongyangLi commented 8 years ago

Since we have created the demosaic extractor to generate GeoTIFFs that contain geographic information, it should be straightforward to create an extractor to generate the full stitched map. The primary script is available, but a few things are still uncertain:

  1. As @pless discussed before, we need some approach to trigger after a scan is complete. For an initial version of the extractor, we input a "tif_list.txt" (containing the absolute path of each GeoTIFF) to tell the extractor which GeoTIFFs should be added to the full map, and this "tif_list.txt" input is not generated automatically. Is there a recommended way of doing this, @robkooper?
  2. This extractor will generate a folder containing a large number of organized tiles. Shall we upload all of these files to the Clowder service as a zip package?
  3. For the code to be reviewed, should I create a pull request or upload it somewhere else? @dlebauer Sorry for not being familiar with GitHub.

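For item 1, generating the "tif_list.txt" automatically could be as simple as walking one day's output directory. A sketch, assuming the day's GeoTIFFs all live under one directory tree (the actual layout on ROGER may differ):

```python
import os

def write_tif_list(day_dir, out_path="tif_list.txt"):
    """Write the absolute path of every GeoTIFF under day_dir, one per
    line, matching the format the stitching extractor expects."""
    with open(out_path, "w") as out:
        for root, _dirs, files in os.walk(day_dir):
            for name in sorted(files):
                if name.lower().endswith((".tif", ".tiff")):
                    out.write(os.path.join(os.path.abspath(root), name) + "\n")
```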
max-zilla commented 8 years ago

@pless and @ZongyangLi have a script that will take a list of GeoTIFFs and generate a full-field stitched mosaic.

Now the question becomes how to trigger this in Clowder, and on what files. @robkooper had originally mentioned running this extractor manually, but Pless also thought it would be nice to have a low-res full-field mosaic per day automatically.

Clowder would somehow need to support extractors that run across the many datasets representing all the images for a particular day.

dlebauer commented 8 years ago

I vote for 'extractor triggers on a sensor x date collection'. The minimum number of images should be 1 - it is necessary to differentiate between 0 and 1 (and n) for the purposes of QAQC.

I vote against exceptions 'outside of Clowder's traditional architecture' because this will be a common use case, and being able to group any arbitrary collection of data that an extractor requires seems logical. At the same time, the algorithm itself (as opposed to the extractor) should only need to know where the inputs are, not how Clowder organizes them.

robkooper commented 8 years ago

We just want to make sure we don't trigger it every time a new file is added, since we would then do a lot of unnecessary computation. So maybe every night at midnight we trigger it for that day.

pless commented 7 years ago

Status update, Sept. 29, 2016.

Some context on the current status of this issue:

This issue considers the problem of creating a full-field mosaic. We are choosing to create the full-field mosaic as a "Google Maps"-style visualization so that we don't end up making a 100 GB high-resolution full-field image. The end-to-end process has 3 parts:

  1. Convert .bin files into GeoTIFFs and JPEGs; this is an extractor that @ZongyangLi wrote and @max-zilla modified/integrated, and it is currently running.
  2. A second "extractor" that converts GeoTIFFs into the "tiles" that make up the Google Maps version. This extractor is unusual because it needs to operate on a whole collection of files (all the GeoTIFFs) instead of one file at a time; this is a feature that @max-zilla is taking on his plate to add to Clowder.
  3. A web page that implements the Google Maps interface and reads the tiles as needed to show the visualization.

Things that I think remain on this are:

a. Deploying the extractor for step 2, which I think is already on Max's plate, and
b. Working to create the web page that would show one possible field map or, even better, a time series of the field map. This is something that we will take on here at Wash. U.

dlebauer commented 7 years ago

@pless would NASA's WorldView provide an alternative solution? It looks easy to deploy, and it provides a web-based interface to data that automatically does the tiling and also allows users to interactively view, select, and download data products. I gather that this would allow us to generate one big full field dataset that can be subset based on user needs. It is open source and the code is here: https://github.com/nasa-gibs/worldview

For example, you can view (and share, download, and programmatically access!) a time series of global land surface temperature

@yanliu-chn pointed out that the back-end mapping server Global Imagery Browse Service (GIBS) is the key component.

robkooper commented 7 years ago

@pless This seems a lot like zoomable images. We currently have this task as https://opensource.ncsa.illinois.edu/jira/browse/CATS-671 for clowder and I have some ideas on how to store the data. I'd love to chat and see if we can store it in such a way that we can show them in clowder or in google maps using superoverlays https://developers.google.com/kml/documentation/regions?csw=1#superoverlays

ZongyangLi commented 7 years ago

@dlebauer @robkooper Thanks! I think these are great ideas for visualizing the stitched map; I will check them ASAP.

At the same time, I would like to describe the approach we are currently using. Here is an example of one day's tiles created by our local extractor, stored with a 'Google Maps structure' on my laptop.

And we are currently using this HTML web page to show the visualization locally. To show these tiles, you can easily modify the source tile path in the HTML file. And I think it will be straightforward to deploy for other users by providing a public web link.

I'm working on a web service example so that everyone can view this Google map.

Here is a demo video of this google map.

ghost commented 7 years ago

Leaf Area Index (actually “Canopy Cover Percentage”) extractor visualization shows automatically extracted CANOPY COVER RATIO from the first planting in May 2016.

http://www.cs.wustl.edu/~pless/TERRA/page.html

ZongyangLi commented 7 years ago

Full-field stitch examples can be accessed on ROGER at: /projects/arpae/terraref/users/zongyang/fullFieldStitch/ @max-zilla @dlebauer Can you access my working space?

max-zilla commented 7 years ago

@ZongyangLi looks like I can access that directory

ZongyangLi commented 7 years ago

@max-zilla Okay, for now I have just created the full map for 2016-09-29; the tile files are in the directory tiles_2016-09-29, and the web page file is opengooglemaps.html.

Do I need to create full maps for other dates?

max-zilla commented 7 years ago

@ZongyangLi I'll leave that up to you - if it's easy go ahead, but one good example is sufficient for what I had in mind - @dlebauer ?

ZongyangLi commented 7 years ago

Yes, it's easy for me to create more full maps on ROGER, but it will take a lot of storage space and computing resources.

dlebauer commented 7 years ago

@ZongyangLi How much storage space? I don't think it is necessary for each day, but it would be great to have a few - perhaps every few weeks?

ZongyangLi commented 7 years ago

@dlebauer No more than 30 GB for one day. If that's okay, I will find some fully scanned dates and create maps for season 2.

dlebauer commented 7 years ago

@ZongyangLi Go for it!

pless commented 7 years ago

The full-field stitched mosaic requires the most storage space for the very highest-resolution images. Perhaps you could make a medium-resolution mosaic for 10% of the space (though, if I recall how the GeoTIFF creation works, I think it would take the same amount of computation).

ghost commented 7 years ago

@jterstriep - has this been deployed?

ghost commented 7 years ago

Need #209 to be completed first

max-zilla commented 7 years ago

#209 can extend from this, but I intend to deploy this extractor even before #209 is completed. This week I updated the extractor for the HPC environment based on discussions with @jterstriep, and Rob has to fix the terra-clowder VM because some hardware failed yesterday.

ghost commented 7 years ago

@max-zilla - please update

max-zilla commented 7 years ago

@ZongyangLi have you shared the mosaic script you used to generate the examples on /projects/arpae/terraref/users/zongyang/fullFieldStitch/? I'd like to add it as extractor to https://github.com/terraref/extractors-stereo-rgb so I can get ready to deploy.

Apologies if you've shared before and I forgot.

ZongyangLi commented 7 years ago

@max-zilla https://github.com/terraref/computing-pipeline/tree/demosaic_extractor/scripts/stereoImager Here is the script I uploaded before. I used 'full_day_to_tiles.py' to generate the examples on ROGER. 'terra_full_stitched_tiles.py' is a script that is supposed to run as a Clowder extractor, expecting a list of GeoTIFF files as input.

Please let me know if there is anything I can do.

max-zilla commented 7 years ago

@ZongyangLi added it in. https://github.com/terraref/extractors-stereo-rgb/pull/5

After talking with @robkooper, I think in the short term we aren't going to run this as a Clowder extractor, because we still need to implement collection-level extractors.

Instead I've made some slight modifications to your code and deployed it directly on ROGER so I can run it on there across dates. I'm installing dependencies now but should be close to getting this going on last summer's data.

max-zilla commented 7 years ago

Heat map of plots by sensors - how many are inserting information, max/min, etc.; @max-zilla will create a separate issue for this.

In the meantime, I will get a cron job going, with @jterstriep to help if necessary, so we can generate these mosaics for stereoTop. Then we will close this issue.