dlebauer closed this issue 7 years ago.
Is this solved by #96?
Next step: add to pipeline. Comments from @max-zilla
@abby621 @pless @robkooper it looks like it should be pretty straightforward to turn this into a Clowder extractor using the PyClowder library: https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/pyclowder/browse Take a look under sample-extractors/wordcount.py for a simple extractor that counts the words in a text document. It has 2 primary pieces (ignoring Docker stuff):
- config.py (extractorName and messageType are the settings of interest)
- wordcount.py (the code itself, which here would be a modification of your demosaic.py script)

Generally speaking, extractors listen for notifications about certain files or datasets and trigger scripts to process them. In your script's case it looks like 3 files are required: 123_metadata.json, 123_left.bin and 123_right.bin. So we'd want to listen on a dataset, i.e. in config.py set messageType = "*.dataset".
The .py script should use PyClowder to simplify things. If you import it (import pyclowder.extractors as extractors), you only need to implement a couple of functions:
- check_message() is called when a "new file!" notification message is received by your extractor, before the dataset is downloaded. If it returns True, the process_file() function is called; if it returns False, nothing more happens. So here, you'd want to check whether all 3 of those files are present in the dataset.
- process_file() is then called once the dataset has been verified for processing. Here you'd basically call your get_image_shape() and process_image() methods.

Many of the extractor examples describe running on an individual file rather than a dataset, so these instructions might not be perfect, but they provide a starting point when it's time to implement this.
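The check-then-process split can be sketched in plain Python. This is a minimal sketch, not PyClowder's actual API: the argument (a plain list of filenames) and the helper `handle_dataset` are assumptions for illustration, and the `process` callback is a hypothetical stand-in for the demosaic step.

```python
from typing import Callable, List

# Suffixes needed for one capture (from the 123_* example above).
REQUIRED_SUFFIXES = ("_metadata.json", "_left.bin", "_right.bin")


def check_message(filenames: List[str]) -> bool:
    """Return True only if every required file type is present.

    Hypothetical stand-in for the extractor's check step: it runs before
    anything is downloaded, so it only inspects file names.
    """
    return all(
        any(name.endswith(suffix) for name in filenames)
        for suffix in REQUIRED_SUFFIXES
    )


def handle_dataset(filenames: List[str],
                   process: Callable[[List[str]], None]) -> bool:
    """Run `process` only when check_message accepts the dataset."""
    if not check_message(filenames):
        return False  # nothing more happens
    process(filenames)
    return True
```

Keeping the check cheap (names only) mirrors the point above: the decision is made before the dataset is downloaded.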
@abby621 does this make sense? You can reach @max-zilla here or on the terraref computing pipeline chat https://gitter.im/terraref/computing-pipeline if you have questions or need help
This makes sense to me, and it should be straightforward to convert the demosaic script to a Clowder extractor. That demosaic script, however, seems to be more related to #64, if I'm not mistaken? At present it converts the binary files to JPGs, and it shouldn't be too difficult to change it to incorporate the field of view and GPS information from the JSON metadata and save the files out as GeoTIFFs + JPG thumbnails, if that's the preferred file format.
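Georeferencing each image mainly requires its ground footprint. A minimal sketch, assuming the JSON metadata provides the camera position as easting/northing in meters (e.g. GPS projected to UTM), the camera height above the ground, and the angular field of view in degrees; these inputs are assumptions, not the actual stereoTop metadata keys. For a nadir-looking camera, the footprint along one axis is 2 * h * tan(fov / 2).

```python
import math
from typing import NamedTuple


class Bounds(NamedTuple):
    west: float
    south: float
    east: float
    north: float


def footprint_bounds(easting: float, northing: float, height_m: float,
                     fov_x_deg: float, fov_y_deg: float) -> Bounds:
    """Ground footprint of a nadir camera centred at (easting, northing).

    Assumes flat ground and no camera tilt or rotation.
    """
    half_w = height_m * math.tan(math.radians(fov_x_deg) / 2)
    half_h = height_m * math.tan(math.radians(fov_y_deg) / 2)
    return Bounds(easting - half_w, northing - half_h,
                  easting + half_w, northing + half_h)
```

These bounds, together with the image size in pixels, give the pixel size (footprint width / columns) that a GeoTIFF writer needs for its geotransform.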
As for the full-field stitched mosaic, we have a functional rough mosaicking script that currently uses a couple of "magic" numbers to compensate for the stereo camera not capturing at exactly the same time. Has that issue been resolved?
@abby621 I'm actually going to add a few things to PyClowder that might make this easier for you - I've created a branch here: https://opensource.ncsa.illinois.edu/jira/browse/CATS-554
Our extractor can listen on *.dataset.# (the add_file bit isn't necessary) and I'll make a small utility function in pyclowder that you'll use to fetch the files in the dataset if it's a stereoTop dataset - you shouldn't need to unzip them.
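Clowder's extractor notifications go through RabbitMQ, and patterns such as `*.dataset.#` follow AMQP topic-exchange rules: `*` matches exactly one dot-separated word and `#` matches zero or more words. The sketch below reimplements that rule in plain Python to show why `*.dataset.#` catches any dataset event; the routing keys in the tests are hypothetical, not Clowder's exact keys.

```python
from typing import List


def topic_matches(pattern: str, routing_key: str) -> bool:
    """Return True if routing_key matches an AMQP topic pattern.

    '*' matches exactly one word; '#' matches zero or more words.
    """
    return _match(pattern.split("."), routing_key.split("."))


def _match(pat: List[str], key: List[str]) -> bool:
    if not pat:
        return not key
    head, rest = pat[0], pat[1:]
    if head == "#":
        # '#' may absorb anywhere from 0 to len(key) words.
        return any(_match(rest, key[i:]) for i in range(len(key) + 1))
    if not key:
        return False
    if head == "*" or head == key[0]:
        return _match(rest, key[1:])
    return False
```

By the same rule, a plain `*.dataset` pattern would not match a longer key such as `clowder.dataset.file.added`, which is why the trailing `.#` matters.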
@abby621 OK, so I ended up writing some new stuff and hope to make a pull request tomorrow.
On processing the dataset, you'll receive a list of files (NOT a zip file) that you can feed directly into your process. I expect this will be pretty straightforward now.
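Once the file list arrives, the files still need to be grouped per capture. A minimal sketch, assuming each capture's files share a prefix such as `123` and follow the `<prefix>_metadata.json`, `<prefix>_left.bin`, `<prefix>_right.bin` naming above; `group_captures` is a hypothetical helper, not part of PyClowder.

```python
import os
from collections import defaultdict
from typing import Dict, List

SUFFIXES = {"_metadata.json": "metadata", "_left.bin": "left", "_right.bin": "right"}


def group_captures(paths: List[str]) -> Dict[str, Dict[str, str]]:
    """Group file paths by capture prefix, keeping only complete triplets."""
    groups: Dict[str, Dict[str, str]] = defaultdict(dict)
    for path in paths:
        name = os.path.basename(path)
        for suffix, role in SUFFIXES.items():
            if name.endswith(suffix):
                groups[name[: -len(suffix)]][role] = path
                break
    return {prefix: files for prefix, files in groups.items()
            if len(files) == len(SUFFIXES)}
```

Each resulting entry maps `metadata`, `left` and `right` to a path, which is the unit the demosaic step works on.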
So in total:
While this is not a full extractor yet, I created an example stereoTop dataset on our development instance of Clowder and ran the demosaic script on it so we have an example to show this week: http://141.142.209.122/clowder/datasets/574d9b4fe4b0efbe2dc4bb88
You can see the output images alongside the original .bin files.
@max-zilla is the demosaic script in this repository?
@dlebauer looks like you never merged @abby621's pull request for it: https://github.com/terraref/computing-pipeline/pull/96
That's where I got it from.
@max-zilla merged #96 ... who is responsible for adding this to the pipeline as an extractor?
@dlebauer @max-zilla I believe that's on our lab. We prioritized the one-off full-field stitched mosaic, implemented with map tiles, for last week's meeting. I did, however, have a call with Max before that and have directions on how to move forward with the extractor.
@abby621 can you please let us know how this is going and if you need anything from the NCSA team?
@rachelshekar @abby621 @dlebauer We have a new person who will take this project on. However, creating the full-field stitched mosaic requires that the extractor wait until all images from a full scan are available. Clowder's current approach of listening for the creation of a file does not easily translate to listening for the completion of a scan. Perhaps there are other "listener" protocols? Alternatively, would it be possible to create a "scan complete" file that includes metadata about a scan (as opposed to about a particular image)? This could be a text file listing the folder names captured within one scan, or at least the start and end times of the scan.
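One way to picture the proposed "scan complete" file is a small JSON manifest written when a scan finishes. The format below (`scan_id`, `start`, `end`, `folders`) is entirely hypothetical, sketched only to show how a listener could decide whether a capture folder belongs to a finished scan, using either the folder list or the time window.

```python
import json
from datetime import datetime
from typing import List


def write_manifest(scan_id: str, start: datetime, end: datetime,
                   folders: List[str]) -> str:
    """Serialise a hypothetical scan-complete manifest as JSON."""
    return json.dumps({
        "scan_id": scan_id,
        "start": start.isoformat(),
        "end": end.isoformat(),
        "folders": sorted(folders),
    })


def folder_in_scan(manifest_json: str, folder: str, captured_at: datetime) -> bool:
    """True if the folder is listed, or was captured inside the scan window."""
    m = json.loads(manifest_json)
    if folder in m["folders"]:
        return True
    start = datetime.fromisoformat(m["start"])
    end = datetime.fromisoformat(m["end"])
    return start <= captured_at <= end
```

An extractor listening for this one file would then have a single, unambiguous trigger per scan instead of one per image.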
An additional concern: the current approach to creating the full-field visualization mosaic creates a web-based map and a large set of tiles (at different resolutions, exactly like Google Maps tiles). We chose this because the fully stitched file at full resolution would be absurdly large, and the Google Maps approach was the most convenient way to make the data easy to view. Potential problems are:
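The size of such a pyramid follows directly from the image dimensions. A sketch, assuming square 256-pixel tiles (the usual Google Maps tile size) and that each zoom level halves the resolution of the one above it, computing the number of zoom levels and the tile grid per level:

```python
import math
from typing import List, Tuple


def pyramid_levels(width_px: int, height_px: int,
                   tile_px: int = 256) -> List[Tuple[int, int, int]]:
    """Return (zoom, tiles_x, tiles_y) from coarsest to full resolution.

    Zoom 0 fits the whole image in one tile; each further level doubles
    the resolution until the full-resolution image is reached.
    """
    max_zoom = max(0, math.ceil(math.log2(max(width_px, height_px) / tile_px)))
    levels = []
    for z in range(max_zoom + 1):
        scale = 2 ** (max_zoom - z)  # downsampling factor at this zoom
        w = math.ceil(width_px / scale)
        h = math.ceil(height_px / scale)
        levels.append((z, math.ceil(w / tile_px), math.ceil(h / tile_px)))
    return levels
```

Because each coarser level has roughly a quarter as many tiles, the full-resolution level dominates the total count.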
Summary:
A better approach is probably to have a user trigger this when they want it. We can have the code ready to run but only trigger the extractor once a user has shown interest. At that point we can disable the button until the process is complete.
Once the extractor is done, it can associate the result with the dataset as a preview. Is it possible to zip up the result into one large zip file? That way we have a single zip file containing the whole pyramid. All that is needed at that point is an endpoint in Clowder that can visualize this pyramid (perhaps with application/x-image-pyramid as the mimetype).
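Packing the pyramid into one archive is straightforward with Python's standard `zipfile` module. A minimal sketch, assuming the tiles live in a `z/x/y.png` directory tree and that the zip is written outside that tree; the `application/x-image-pyramid` mimetype from the comment above is a proposal, not a registered type, and `zip_pyramid` is hypothetical.

```python
import os
import zipfile


def zip_pyramid(tile_root: str, zip_path: str) -> int:
    """Pack every file under tile_root into zip_path, keeping z/x/y paths.

    Tiles are usually already-compressed PNG/JPEG, so they are stored
    without recompression. Returns the number of files written.
    """
    count = 0
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for dirpath, _dirnames, filenames in os.walk(tile_root):
            for name in sorted(filenames):
                full = os.path.join(dirpath, name)
                # Archive names use forward slashes relative to the root.
                arcname = os.path.relpath(full, tile_root).replace(os.sep, "/")
                zf.write(full, arcname)
                count += 1
    return count
```

Keeping the `z/x/y` paths inside the archive means a viewer endpoint could resolve a tile request to a single zip member without unpacking everything.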
Since we have created a demosaic extractor that generates GeoTIFFs containing geographic information, it should be straightforward to create an extractor that generates the full stitched map. The primary script is available, but a few things are still uncertain:
@pless and @ZongyangLi have a script that will take a list of GeoTIFFs and generate a full-field stitched mosaic.
Now the question becomes how to trigger this in Clowder, and on what files. @robkooper had originally mentioned running this extractor manually, but Pless also thought it would be nice to have a low-res full-field mosaic per day automatically.
Clowder would somehow need to support extractors that run across the many datasets representing all the images for a particular day.
I vote for 'extractor triggers on a sensor x date collection'. The minimum number of images should be 1; it is necessary to differentiate between 0, 1 and n images for the purposes of QAQC.
I vote against exceptions 'outside of Clowder's traditional architecture' because this will be a common use case, and being able to group any arbitrary collection of data that an extractor requires seems logical. At the same time, the algorithm itself (as opposed to the extractor) should only need to know where the inputs are, not how Clowder organizes them.
We just want to make sure we don't trigger it every time a new file is added, since we would then do a lot of unnecessary computation. So maybe every night at midnight we trigger it for that day.
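The "sensor x date collection" trigger, run once a night for the previous day, can be sketched as a grouping step. This assumes each file record carries a sensor name and a capture timestamp; the record shape and `collections_for_day` are hypothetical. Sensors with no images that day are reported with a count of 0 rather than silently skipped, preserving the 0 / 1 / n distinction needed for QAQC.

```python
from collections import Counter
from datetime import date, datetime
from typing import Dict, Iterable, Tuple


def collections_for_day(records: Iterable[Tuple[str, datetime]],
                        day: date, sensors: Iterable[str]) -> Dict[str, int]:
    """Count images per sensor for one day, including sensors with zero."""
    counts = Counter(sensor for sensor, ts in records if ts.date() == day)
    return {sensor: counts.get(sensor, 0) for sensor in sensors}
```

A job started at midnight would call this with `day = date.today() - timedelta(days=1)` and trigger the extractor only for sensors whose count is at least 1.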
Status update, Sept. 29, 2016.
Some context on the current status of this issue:
This issue considers the problem of creating a full-field mosaic. We are choosing to create the full-field mosaic as a "Google Maps" style visualization so that we don't end up making the 100 GB high-resolution full-field image. The end-to-end process for this has 3 parts:
Things that I think remain on this are:
a. Deploying the extractor for #2, which I think is already on Max's plate.
b. Creating the webpage that would show one possible field map or, even better, a time series of the field map. This is something that we will take on here at Wash. U.
@pless would NASA's WorldView provide an alternative solution? It looks easy to deploy, and it provides a web-based interface to data that automatically does the tiling and also allows users to interactively view, select, and download data products. I gather that this would allow us to generate one big full field dataset that can be subset based on user needs. It is open source and the code is here: https://github.com/nasa-gibs/worldview
For example, you can view (and share, download, and programmatically access!) a time series of global land surface temperature
@yanliu-chn pointed out that the back-end mapping server Global Imagery Browse Service (GIBS) is the key component.
@pless This seems a lot like zoomable images. We currently have this task as https://opensource.ncsa.illinois.edu/jira/browse/CATS-671 for Clowder, and I have some ideas on how to store the data. I'd love to chat and see if we can store it in a way that lets us show it in Clowder or in Google Maps using Super-Overlays https://developers.google.com/kml/documentation/regions?csw=1#superoverlays
@dlebauer @robkooper Thanks! These are great ideas for visualizing the stitched map; I will look into them ASAP.
In the meantime, I would like to describe the approach we are using. Here is an example of one day's tiles, created by our local extractor and stored in a 'Google Maps structure' on my laptop.
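In a Google Maps style tile structure, tiles are usually addressed as `z/x/y`. The standard Web Mercator ("slippy map") formula below converts a latitude/longitude to the tile containing it; whether the local extractor uses exactly this scheme (rather than, say, TMS, which flips `y`) is an assumption.

```python
import math
from typing import Tuple


def latlon_to_tile(lat_deg: float, lon_deg: float, zoom: int) -> Tuple[int, int]:
    """Web Mercator tile (x, y) containing the point at the given zoom.

    Valid for latitudes within the Web Mercator limit (about +/-85.05 deg).
    """
    n = 2 ** zoom
    lat = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat)) / math.pi) / 2.0 * n)
    # Clamp points on the far edge into the last tile.
    return min(x, n - 1), min(y, n - 1)


def tile_path(lat_deg: float, lon_deg: float, zoom: int, ext: str = "png") -> str:
    """Relative z/x/y path of the tile containing the point."""
    x, y = latlon_to_tile(lat_deg, lon_deg, zoom)
    return f"{zoom}/{x}/{y}.{ext}"
```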
We are currently using this HTML page to show the visualization locally. To show these tiles, you just modify the source tile path in the HTML file. I think it will be straightforward to deploy for other users by providing a public web link.
I'm working on a web service example so that everyone can view this Google Map.
Here is a demo video of this Google Map.
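For a quick shared view, the HTML page and tile folders can be served with Python's built-in `http.server`. A sketch suitable for a demo, not a production web service; `make_tile_server`, the default port and the bind address are illustrative choices.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_tile_server(root: str, port: int = 8000,
                     host: str = "0.0.0.0") -> ThreadingHTTPServer:
    """Create a server exposing `root` (HTML page + tile folders) over HTTP.

    host="0.0.0.0" listens on all interfaces, i.e. reachable by other users.
    """
    handler = functools.partial(SimpleHTTPRequestHandler, directory=root)
    return ThreadingHTTPServer((host, port), handler)
```

Usage: `make_tile_server("/projects/arpae/terraref/users/zongyang/fullFieldStitch/").serve_forever()`, then open `http://<host>:8000/opengooglemaps.html` in a browser.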
The Leaf Area Index (actually "Canopy Cover Percentage") extractor visualization shows the automatically extracted canopy cover ratio from the first planting in May 2016.
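Canopy cover is the fraction of image pixels classified as plant. The sketch below uses the excess-green index (ExG = 2G - R - B) with a fixed threshold as the classifier; this is a common, simple technique swapped in for illustration, not necessarily what the extractor does, and the threshold of 20 is an arbitrary assumption.

```python
from typing import Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0-255


def canopy_cover(pixels: Sequence[Pixel], threshold: int = 20) -> float:
    """Fraction of pixels whose excess-green index exceeds `threshold`."""
    if not pixels:
        raise ValueError("no pixels")
    plant = sum(1 for r, g, b in pixels if 2 * g - r - b > threshold)
    return plant / len(pixels)
```

Multiplying the result by 100 gives the percentage shown in the visualization.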
Full-field stitch examples can be accessed on ROGER in:
/projects/arpae/terraref/users/zongyang/fullFieldStitch/
@max-zilla @dlebauer Can you access my working space?
@ZongyangLi looks like I can access that directory
@max-zilla Okay, for now I have only created a full map for 2016-09-29. The tile files are in the directory tiles_2016-09-29, and the web page file is opengooglemaps.html.
Do I need to create full maps for other dates?
@ZongyangLi I'll leave that up to you - if it's easy go ahead, but one good example is sufficient for what I had in mind - @dlebauer ?
Yes, it's easy for me to create more full maps on ROGER, but it will take a lot of storage space and computing resources.
@ZongyangLi How much storage space? I don't think it is necessary for each day, but it would be great to have a few - perhaps every few weeks?
@dlebauer No more than 30 GB for one day. If that's okay, I will find some fully scanned dates to create maps for season 2.
@ZongyangLi Go for it!
The full-field stitched mosaic requires the most storage space for the very highest resolution images. Perhaps you can make a medium-resolution mosaic for 10% of the space (but if I recall how the GeoTIFF creation works, I think this would take the same amount of computation).
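A back-of-the-envelope check of that trade-off, assuming each pyramid level has one quarter the pixels of the level above it (halved in each dimension) and that storage scales with pixel count. With $S_0$ the size of the full-resolution level, the whole pyramid costs at most

$$
S_{\text{total}} = S_0 \sum_{k=0}^{L} 4^{-k} < S_0 \sum_{k=0}^{\infty} 4^{-k} = \tfrac{4}{3} S_0 ,
$$

and keeping only the levels from $m$ steps below full resolution downward leaves the fraction

$$
\frac{S_{\ge m}}{S_{\text{total}}} \approx \frac{\sum_{k=m}^{\infty} 4^{-k}}{\sum_{k=0}^{\infty} 4^{-k}} = 4^{-m}.
$$

So dropping the top level keeps about 25% of the storage, and dropping the top two keeps about 6.25%; against the "no more than 30 GB" per day quoted above, $m = 2$ would be under roughly 1.9 GB per day.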
@jterstriep - has this been deployed?
Need #209 to be completed first
@max-zilla - please update
@ZongyangLi have you shared the mosaic script you used to generate the examples in /projects/arpae/terraref/users/zongyang/fullFieldStitch/? I'd like to add it as an extractor to https://github.com/terraref/extractors-stereo-rgb so I can get ready to deploy.
Apologies if you've shared it before and I forgot.
@max-zilla https://github.com/terraref/computing-pipeline/tree/demosaic_extractor/scripts/stereoImager Here is the script I uploaded before. I used 'full_day_to_tiles.py' to generate the examples on ROGER. 'terra_full_stitched_tiles.py' is meant to run as a Clowder extractor and expects a list of GeoTIFF files as input.
Please let me know if there is anything I can do.
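Before stitching a list of GeoTIFFs, the mosaic canvas has to cover all of their footprints. A minimal sketch, assuming each GeoTIFF's bounds have already been read (e.g. with GDAL or rasterio) as `(west, south, east, north)` in one shared projected CRS at a single ground resolution; `mosaic_canvas` and `pixel_offset` are hypothetical, not functions from full_day_to_tiles.py or terra_full_stitched_tiles.py.

```python
import math
from typing import Iterable, Tuple

BoundsT = Tuple[float, float, float, float]  # (west, south, east, north)


def mosaic_canvas(bounds: Iterable[BoundsT],
                  pixel_size: float) -> Tuple[BoundsT, int, int]:
    """Union of all footprints plus the canvas size in pixels."""
    bounds = list(bounds)
    if not bounds:
        raise ValueError("need at least one GeoTIFF footprint")
    west = min(b[0] for b in bounds)
    south = min(b[1] for b in bounds)
    east = max(b[2] for b in bounds)
    north = max(b[3] for b in bounds)
    width = math.ceil((east - west) / pixel_size)
    height = math.ceil((north - south) / pixel_size)
    return (west, south, east, north), width, height


def pixel_offset(tile_bounds: BoundsT, canvas: BoundsT,
                 pixel_size: float) -> Tuple[int, int]:
    """Column/row of a GeoTIFF's top-left corner within the canvas."""
    col = round((tile_bounds[0] - canvas[0]) / pixel_size)
    row = round((canvas[3] - tile_bounds[3]) / pixel_size)
    return col, row
```

Rows count down from the north edge, matching the usual top-left origin of raster images.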
@ZongyangLi added it in. https://github.com/terraref/extractors-stereo-rgb/pull/5
After talking with @robkooper I think in the short term we aren't going to run this as a Clowder extractor because we still need to implement collection-level extractors.
Instead I've made some slight modifications to your code and deployed it directly on ROGER so I can run it on there across dates. I'm installing dependencies now but should be close to getting this going on last summer's data.
Heat map of plots by sensor (how many are inserting information, max/min, etc.): @max-zilla to create a separate issue for this.
In the meantime I will get a cron job going (with @jterstriep helping if necessary) so we can generate these mosaics for stereoTop, and then I will close this.
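The nightly cron job mainly needs to know which dates still lack a mosaic. A sketch, assuming output folders follow the `tiles_YYYY-MM-DD` naming used for the 2016-09-29 example; `missing_dates` and the date range are hypothetical. A crontab schedule of `0 0 * * *` would run a script built around this every night at midnight.

```python
import os
from datetime import date, timedelta
from typing import List


def missing_dates(output_root: str, start: date, end: date) -> List[date]:
    """Dates in [start, end] with no tiles_YYYY-MM-DD folder under output_root."""
    existing = set()
    if os.path.isdir(output_root):
        existing = {name for name in os.listdir(output_root)
                    if os.path.isdir(os.path.join(output_root, name))}
    missing = []
    day = start
    while day <= end:
        if f"tiles_{day.isoformat()}" not in existing:
            missing.append(day)
        day += timedelta(days=1)
    return missing
```

Checking for existing output makes reruns idempotent: a night that fails is simply picked up again the next night.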
For purposes of QAQC, a rough and imperfectly aligned mosaic of the full field.