terraref / computing-pipeline

Pipeline to Extract Plant Phenotypes from Reference Data
BSD 3-Clause "New" or "Revised" License

Insert plot level height histogram into Clowder geostreams; height into BETYdb #210

Closed ZongyangLi closed 7 years ago

ZongyangLi commented 7 years ago

Description

We have scripts to generate plot level height histogram on Roger. The next step is to create a pipeline for this extractor.

Completion Criteria

dlebauer commented 7 years ago

@ZongyangLi @rmgarnett @pless

For this extractor, I would suggest that we write the summary stats (histogram) into the metadata and insert a few statistics into BETYdb. For example, we have inserted a trait called '95th quantile height'.

But the key trait from the point cloud is the height estimate calibrated to field measurements. This trait will have the same name as the trait that Maria measured, i.e. 'canopy_height'. I think it would make sense for this extractor to use the calibrated model that Roman developed in #175.

dlebauer commented 7 years ago

@rmgarnett what are the (slope, intercept) parameters from the model in #175?

When estimating height at the plot level, can we also estimate uncertainty?

rmgarnett commented 7 years ago

[hand height] = 28.2cm + 0.661 * [89th height percentile]
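As a minimal sketch of how an extractor might apply the model quoted above: the function and constant names here are hypothetical, the coefficients are the ones given in this comment (and may change), and heights are assumed to be in centimeters.

```python
import numpy as np

# Coefficients quoted in this thread; the final model may differ.
INTERCEPT_CM = 28.2
SLOPE = 0.661


def calibrated_height_cm(point_heights_cm, percentile=89.0):
    """Estimate hand-measured height (cm) from point-cloud heights (cm).

    Hypothetical helper: takes the requested percentile of the point
    heights and applies the linear calibration from the thread.
    """
    heights = np.asarray(point_heights_cm, dtype=float)
    p = np.percentile(heights, percentile)
    return float(INTERCEPT_CM + SLOPE * p)
```

Keeping the intercept, slope, and percentile as named values rather than literals inside the function should make it easier to swap in a revised model later.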

The RMSE/MAE gives a rough estimate of L2/L1 uncertainty. I will do a more thorough analysis in January now that all height distributions are extracted.
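For reference, RMSE and MAE can be computed from paired hand measurements and model predictions roughly as follows. This is only a sketch; the function name and argument names are assumptions, not part of any existing script.

```python
import numpy as np


def rmse_mae(observed, predicted):
    """Return (RMSE, MAE) for paired observed and predicted values."""
    residuals = np.asarray(observed, dtype=float) - np.asarray(predicted, dtype=float)
    rmse = float(np.sqrt(np.mean(residuals ** 2)))  # L2-style error
    mae = float(np.mean(np.abs(residuals)))          # L1-style error
    return rmse, mae
```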

dlebauer commented 7 years ago

@rmgarnett I suspect RMSE scales with height?

From your plot it is hard to tell how the data are distributed b/c of overlapping points. But I gather they are strongly right-skewed. I wonder if log transforming x and y would be appropriate, and if it would more evenly weight the smaller values. The small plants are important too!
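One way to try the log-log idea, sketched under the assumption that all heights are strictly positive: fit a straight line to the logged values with `numpy.polyfit`. The function names are hypothetical, and this is not the model actually used in #175.

```python
import numpy as np


def fit_log_log(percentile_heights_cm, hand_heights_cm):
    """Least-squares fit of log(hand) = a + b * log(percentile)."""
    x = np.log(np.asarray(percentile_heights_cm, dtype=float))
    y = np.log(np.asarray(hand_heights_cm, dtype=float))
    b, a = np.polyfit(x, y, 1)  # polyfit returns the highest degree first
    return float(a), float(b)


def predict_hand_height(percentile_height_cm, a, b):
    """Back-transform a prediction to cm; equivalent to exp(a) * x**b."""
    return float(np.exp(a + b * np.log(percentile_height_cm)))
```

Note that simply exponentiating a log-scale prediction estimates something closer to the median than the mean on the original scale, so any comparison with the untransformed fit should keep that in mind.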

[Image: img_2204]

ZongyangLi commented 7 years ago

@dlebauer I have all the height distribution data for season 2, from 8/8 to 11/25, and following @rmgarnett's research I created CSV files of the 90th and 95th height percentiles (files: 90th percentile, 95th percentile). The Scanner3DTop data in Season 2 are much better than those in Season 1, but the data from 10/13 to 11/04 are still unexpected: the ply files from those days contain just a few points.

I am wondering whether the point cloud files from those days can be fixed. If not, what is your opinion on putting them into BETYdb?

dlebauer commented 7 years ago

@solmazhajmohammadi could you please check into whether we can recover useful data from 10/13 to 11/04?

@ZongyangLi we need to discuss with @rmgarnett about how to implement this extractor.

dlebauer commented 7 years ago

@rmgarnett have you made any progress on adding uncertainty?

rmgarnett commented 7 years ago

I will pick this up again this week.


dlebauer commented 7 years ago

@ZongyangLi you can go ahead and insert the data that you have. We can create another issue for adding uncertainty to the height calculations (moving forward this should be done by default ... )

solmazhajmohammadi commented 7 years ago

@dlebauer @ZongyangLi, for the data from 10/13 to 11/04, the png files were not collected correctly, but we can get the height information from the scans that were done at ~5m.

solmazhajmohammadi commented 7 years ago

@smarshall-bmr can you please scan the checker boards to find the pointcloud origin?

ZongyangLi commented 7 years ago

@solmazhajmohammadi, are you suggesting that we estimate the plot-level height based on the highest points in the remaining 3D data? That might differ from what we have done before, because we use all the point cloud data to create a height histogram and calculate quantiles to make predictions.
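The histogram-then-quantile approach described here could look roughly like the sketch below. The bin width, maximum height, and function names are assumptions for illustration and are not taken from the actual scripts.

```python
import numpy as np


def height_histogram(z_cm, bin_width_cm=1.0, max_height_cm=500.0):
    """Bin point heights (cm) into fixed-width bins starting at 0."""
    edges = np.arange(0.0, max_height_cm + bin_width_cm, bin_width_cm)
    counts, edges = np.histogram(np.asarray(z_cm, dtype=float), bins=edges)
    return counts, edges


def percentile_from_histogram(counts, edges, q):
    """Upper edge of the first bin whose cumulative count reaches q percent."""
    cumulative = np.cumsum(counts)
    total = cumulative[-1]
    if total == 0:
        raise ValueError("empty histogram")
    target = q * total / 100.0
    idx = int(np.searchsorted(cumulative, target, side="left"))
    return float(edges[idx + 1])
```

Working from the histogram rather than the raw points means the quantile is only accurate to one bin width, but the histogram is far smaller to store and can be combined across files.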

solmazhajmohammadi commented 7 years ago

@ZongyangLi This could be an option, otherwise the data has been collected with a wrong setting, so we are not able to recover it.

rmgarnett commented 7 years ago

I have been reinvestigating the hand measurements using @ZongyangLi's most-recent data. The final model may differ from what's written above, but it will be the same form. I presume the extractor will be easy to modify if we wish to change the model slightly?

dlebauer commented 7 years ago

Yes, we could store parameters as metadata and have the extractor pick them up (e.g. if they change by crop, year, or location).
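A sketch of what picking model parameters up from metadata might look like. The metadata layout (a `height_model` entry keyed by crop and by crop/season) and all names are hypothetical; the defaults are the coefficients quoted earlier in this thread.

```python
# Defaults taken from the model quoted earlier in this thread.
DEFAULT_MODEL = {"intercept_cm": 28.2, "slope": 0.661, "percentile": 89}


def model_parameters(metadata, crop=None, season=None):
    """Look up calibration parameters, falling back to the defaults.

    ASSUMPTION: metadata["height_model"] maps keys such as "default",
    "<crop>", or "<crop>/<season>" to partial parameter overrides.
    """
    params = dict(DEFAULT_MODEL)
    overrides = metadata.get("height_model", {})
    # Apply the most generic override first so more specific keys win.
    for key in ("default", crop, f"{crop}/{season}"):
        params.update(overrides.get(key, {}))
    return params
```

With this layout, changing the model for one crop or season would only require editing metadata, not redeploying the extractor.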

rmgarnett commented 7 years ago

Perfect.

ghost commented 7 years ago

@robkooper and @max-zilla - should this go into geostreams? clowder too? @ZongyangLi -what will be visualized?

solmazhajmohammadi commented 7 years ago

@rmgarnett @ZongyangLi PointCloud data from 2016/08/07 to 2016/09/05 was collected with a wrong setting. There is no way to fix this dataset. Maybe we can delete them or mark them to exclude from the pipeline. @dlebauer @max-zilla any idea?

dlebauer commented 7 years ago

Please don't remove them unless it is clear that they contain no useful information - i.e. all points have been randomly redistributed. We can keep them but exclude them from our workflow (even if the information is not useful within the current scope of the project, others may find it useful).
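Keeping the files but skipping them in the workflow could be as simple as a date check before processing. This is a sketch only: the names are hypothetical, and the single range listed is the one reported above as collected with a wrong setting.

```python
from datetime import date

# Range reported in this thread as collected with a wrong scanner setting.
EXCLUDED_RANGES = [(date(2016, 8, 7), date(2016, 9, 5))]


def is_excluded(scan_date, ranges=EXCLUDED_RANGES):
    """Return True if scan_date falls inside any excluded range (inclusive)."""
    return any(start <= scan_date <= end for start, end in ranges)
```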

One possibility:


rmgarnett commented 7 years ago

I think that's reasonable. Unfortunately, I am not sure these points can be used to reliably estimate height, but they could be useful for some other purpose.

I could try to learn a separate model for this range and for the days afterwards, but I hesitate to do so.

max-zilla commented 7 years ago

@ZongyangLi @rmgarnett any updates on this issue?

ZongyangLi commented 7 years ago

I was worried about putting untrue data into Clowder geostreams or BETYdb, so I asked for a transformation matrix to the gantry coordinate system in the metadata in order to get a plot level height histogram. If these uncertain results can be inserted into those databases, @max-zilla could you send me instructions for Clowder geostreams?

max-zilla commented 7 years ago

@ZongyangLi we can write data to geostreams that we can regenerate and replace later once we have corrections to the code, absolutely.

Here is a comment I left in another issue about Geostreams: https://github.com/terraref/computing-pipeline/issues/252#issuecomment-286189327

If you look at the links there, you can see an example of how I use it. The basic approach is:

1) determine which "plot" /sensor to use. I've already created a geostreams sensor entry for each plot, and you can query for the nearest one by lat/long with this: https://github.com/terraref/extractors-metadata/blob/use-pyclowder-geostreams/sensorposition/terra_sensorposition.py#L101

sensor_data = pyclowder.geostreams.get_sensors_by_circle(connector, host, secret_key, sensor_latlon[1], sensor_latlon[0], 0.01)

2) determine which stream to use. you'll want to create a new stream for your data within the plot, e.g. "Height Histogram - Range X Pass Y" (where Range X Pass Y is the name returned in sensor_data above). Here is a code snippet where we look for an existing stream with that name and create it if it doesn't exist: https://github.com/terraref/extractors-metadata/blob/use-pyclowder-geostreams/sensorposition/terra_sensorposition.py#L129

stream_data = pyclowder.geostreams.get_stream_by_name(connector, host, secret_key, stream_name)
if not stream_data:
    stream_id = pyclowder.geostreams.create_stream(connector, host, secret_key, stream_name, sensor_id, {
        "type": "Point",
        "coordinates": [sensor_latlon[1], sensor_latlon[0], 0]
    })
else:
    stream_id = stream_data['id']

3) Add datapoints to that stream ID. https://github.com/terraref/extractors-metadata/blob/use-pyclowder-geostreams/sensorposition/terra_sensorposition.py#L161 The "metadata" JSON properties object can have whatever you want - at least the height histogram in this case.
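The three steps above can be sketched as a single function. To keep the example self-contained, the geostreams module is passed in as `gs` rather than imported; it is expected to offer the `get_sensors_by_circle`, `get_stream_by_name`, `create_stream`, and `create_datapoint` calls used in this thread. The shape of the sensor result (a list of dicts with `id` and `name`) and the `height_histogram` property name are assumptions.

```python
def post_height_histogram(gs, connector, host, key, lon, lat,
                          starttime, endtime, histogram_counts):
    """Find the nearest plot sensor, get or create its stream, add a datapoint.

    `gs` should behave like the pyclowder geostreams module.
    Returns the stream id, or None if no sensor is found nearby.
    """
    # 1) Look up the plot "sensor" nearest to the given coordinates.
    sensors = gs.get_sensors_by_circle(connector, host, key, lon, lat, 0.01)
    if not sensors:
        return None
    sensor = sensors[0]  # ASSUMPTION: list of dicts with 'id' and 'name'

    # 2) Reuse the stream for this plot if it exists, otherwise create it.
    stream_name = "Height Histogram - %s" % sensor["name"]
    geom = {"type": "Point", "coordinates": [lon, lat, 0]}
    stream = gs.get_stream_by_name(connector, host, key, stream_name)
    if stream:
        stream_id = stream["id"]
    else:
        stream_id = gs.create_stream(connector, host, key, stream_name,
                                     sensor["id"], geom)

    # 3) Attach the histogram as the datapoint's properties.
    gs.create_datapoint(connector, host, key, stream_id, geom,
                        starttime, endtime,
                        {"height_histogram": list(histogram_counts)})
    return stream_id
```

Passing the module in also makes the flow easy to exercise with a stand-in object before pointing it at a real Clowder instance.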

Take a look at the code and let me know if it kind of makes sense. If it's helpful, here's the pyclowder 2 geostreams source code: https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/pyclowder2/browse/pyclowder/geostreams.py

ZongyangLi commented 7 years ago

@max-zilla Thanks a lot! I will take a look into it and start working on this.

ZongyangLi commented 7 years ago

@max-zilla It seems I have to use pyclowder 2. I updated pyclowder on my laptop, but when I test the example, I get the following messages:

python wordcount.py
2017-03-15 15:01:13,756 [MainThread ] INFO : pika.adapters.base_connection - Connecting to 127.0.0.1:5672
2017-03-15 15:01:13,760 [MainThread ] INFO : pika.adapters.blocking_connection - Created channel=1
2017-03-15 15:01:13,877 [MainThread ] INFO : pyclowder.extractors - Waiting for messages. To exit press CTRL+C
2017-03-15 15:01:13,878 [Connector-0 ] INFO : pyclowder.connectors - Starting to listen for messages.sgsd
2017-03-15 15:02:50,139 [Thread-1 ] ERROR : pyclowder.connectors - Error in registering extractor: 400 Client Error: Bad Request for url: http://localhost:9000/api/extractors?key=r1ek3rs

Is there any step I missed?

max-zilla commented 7 years ago

@ZongyangLi the "error registering extractors" is not a big problem - registering just makes Clowder allow that extractor to be selected in manual extractor lists in the GUI. If you got "Waiting for messages" I think it's working properly.

you do want pyclowder 2, yes

ZongyangLi commented 7 years ago

@max-zilla Could you give me the definitions of all the input arguments to geostreams? It seems I need to build all the arguments myself, such as sensor_data, stream_name, geom, and so on.

max-zilla commented 7 years ago

@ZongyangLi that is included in the geostreams source code: https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/pyclowder2/browse/pyclowder/geostreams.py

def create_sensor(connector, host, key, sensorname, geom, type, region):
    """Create a new sensor in Geostreams.

    Keyword arguments:
    connector -- connector information, used to get missing parameters and send status updates
    host -- the clowder host, including http and port, should end with a /
    key -- the secret key to login to clowder
    sensorname -- name of new sensor to create
    geom -- GeoJSON object of sensor geometry
    type -- JSON object with {"id", "title", and "sensorType"}
    region -- region of sensor
    """

def create_stream(connector, host, key, streamname, sensorid, geom, properties={}):
    """Create a new stream in Geostreams.

    Keyword arguments:
    connector -- connector information, used to get missing parameters and send status updates
    host -- the clowder host, including http and port, should end with a /
    key -- the secret key to login to clowder
    streamname -- name of new stream to create
    sensorid -- id of sensor to attach stream to
    geom -- GeoJSON object of sensor geometry
    properties -- JSON object with any desired properties
    """

def create_datapoint(connector, host, key, streamid, geom, starttime, endtime, properties={}):
    """Create a new datapoint in Geostreams.

    Keyword arguments:
    connector -- connector information, used to get missing parameters and send status updates
    host -- the clowder host, including http and port, should end with a /
    key -- the secret key to login to clowder
    streamid -- id of stream to attach datapoint to
    geom -- GeoJSON object of sensor geometry
    starttime -- start time, in format 2017-01-25T09:33:02-06:00
    endtime -- end time, in format 2017-01-25T09:33:02-06:00
    properties -- JSON object with any desired properties
    """

def get_sensor_by_name(connector, host, key, sensorname):
    """Get sensor by name from Geostreams, or return None.

    Keyword arguments:
    connector -- connector information, used to get missing parameters and send status updates
    host -- the clowder host, including http and port, should end with a /
    key -- the secret key to login to clowder
    sensorname -- name of sensor to search for
    """

def get_sensors_by_circle(connector, host, key, lon, lat, radius=0):
    """Get sensor by coordinate from Geostreams, or return None.

    Keyword arguments:
    connector -- connector information, used to get missing parameters and send status updates
    host -- the clowder host, including http and port, should end with a /
    key -- the secret key to login to clowder
    lon -- longitude of point
    lat -- latitude of point
    radius -- distance in meters around point to search
    """

As an aside, not sure how you're chopping these to plots right now but this task becomes a lot easier after we have a way to stitch + clip images to plots: https://github.com/terraref/computing-pipeline/issues/265
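Going back to the argument descriptions above, the `geom` and `starttime`/`endtime` values can be built with the standard library alone. A small sketch, with hypothetical helper names, that produces a GeoJSON point and a timestamp in the `2017-01-25T09:33:02-06:00` format named in the docstrings:

```python
from datetime import datetime, timedelta, timezone


def point_geometry(lon, lat, elevation=0):
    """GeoJSON Point; GeoJSON orders coordinates as longitude, latitude."""
    return {"type": "Point", "coordinates": [lon, lat, elevation]}


def geostreams_time(dt):
    """Format a timezone-aware datetime like 2017-01-25T09:33:02-06:00."""
    if dt.tzinfo is None:
        raise ValueError("datetime must carry a UTC offset")
    return dt.replace(microsecond=0).isoformat()
```

For example, `geostreams_time(datetime(2017, 1, 25, 9, 33, 2, tzinfo=timezone(timedelta(hours=-6))))` gives the string shown in the docstrings.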

dlebauer commented 7 years ago

@ZongyangLi is this extractor ready to deploy?

ZongyangLi commented 7 years ago

@dlebauer Inserting the existing 'height' trait into BETYdb is ready; the only thing I need you to confirm is whether to use 864 plots or 1728 plots. To make it an extractor in Clowder, I still need the support mentioned here: https://github.com/terraref/computing-pipeline/issues/193#issuecomment-290507753

max-zilla commented 7 years ago

@ZongyangLi sorry if I missed it, where is code for this extractor? I know the canopycover extractor is in https://github.com/terraref/extractors-stereo-rgb

If you can share what you have I can contribute to the Clowder part.

ZongyangLi commented 7 years ago

@max-zilla Inserting the existing 'height' trait into BETYdb is not an extractor. The 'height' data for season 2 are on my desktop, and I have a local script that can upload them to BETYdb.

To make it an extractor in Clowder, there are some things I am not sure about:

  1. The (0,0,0) point in the point cloud is somewhere in the middle of the field. I have no idea where this (0,0,0) point is in the gantry coordinate system, or what the positive orientation is for the west scan and the east scan. This relates to https://github.com/terraref/reference-data/issues/44.
  2. 'An individual plot is made up of several ply files, which means I need a set of raw data from Clowder as input automatically.'

I made a comment here https://github.com/terraref/computing-pipeline/issues/193#issuecomment-270767586 to describe the support I needed and to discuss it with @dlebauer, but I have not found an answer to this issue yet. I am sorry too if I missed any response to this.

max-zilla commented 7 years ago

@ZongyangLi for #2, unless @dlebauer says different I say we do your second option:

Another way of solving this problem is create several traits for a plot for one day if it is acceptable, and that will be much easier.

...when the field stitching is done, we can modify this extractor to trigger on the full merged PLY file and avoid the need to generate multiple traits, but we can do that for now just to have something running.

I pinged Solmaz and Stuart about #1.

ZongyangLi commented 7 years ago

@max-zilla This comment is from @dlebauer 'While It is possible to add replicate measurements for a single plot (we are already doing this for field data), we should do this when it is scientifically useful and not because of the way that the data are written.'

max-zilla commented 7 years ago

PLY chunks... 1) need to consider sampling rates across PLY files so we don't mix and match. 2) better to stitch & subset according to a design. 3) we can deploy what we have now & replace with final version by end of May.

ZongyangLi commented 7 years ago

@max-zilla I am going to upload my code associated with plant height. Is there somewhere on GitHub where I can upload and update my code?

max-zilla commented 7 years ago

@ZongyangLi if using PLY data, please upload here: https://github.com/terraref/extractors-3dscanner

ZongyangLi commented 7 years ago

@max-zilla Okay! Could you please create a new directory there?

max-zilla commented 7 years ago

@ZongyangLi created 'plantheight' directory

dlebauer commented 7 years ago

I don't see the plantheight directory, but could you change it to plant_height to match the naming in BETYdb?


ZongyangLi commented 7 years ago

Code uploaded to https://github.com/terraref/extractors-3dscanner/tree/master/plantheight

max-zilla commented 7 years ago

@dlebauer @ZongyangLi renamed to plant_height: https://github.com/terraref/extractors-3dscanner/tree/master/plant_height

max-zilla commented 7 years ago

created this issue to finish deployment: https://github.com/terraref/computing-pipeline/issues/303

There was an error scanning the PLY, but code is 90% there. Last missing piece is how to extract the actual values to geostreams - I can create the datapoints if you can show how to get an array of values from the histogram.npy file

max-zilla commented 7 years ago

@ZongyangLi will mention on phone, but remaining piece is to extract values from height file for BETYdb and histogram file for geostreams. Right now these outputs are just uploaded to Clowder as .npy files.
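Getting plain values out of a `.npy` file is mostly a matter of `numpy.load` followed by `tolist()`. The sketch below assumes the file holds a numeric array; the actual layout of histogram.npy is not described in this thread, and the function and property names are hypothetical.

```python
import io
import json

import numpy as np


def histogram_to_properties(npy_source):
    """Load a saved array and convert it to JSON-serialisable properties.

    npy_source may be a file path or an open binary file-like object.
    """
    hist = np.load(npy_source)
    return {"height_histogram": hist.tolist(), "shape": list(hist.shape)}
```

The returned dict can be passed through `json.dumps` without further conversion, which is the form a datapoint's properties would need.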

ZongyangLi commented 7 years ago

@max-zilla I think there are still some other issues with this extractor. Maybe we can ignore them for now, but it will not be a true pipeline without solving these problems:

  1. A single scan covers about 1 meter in the NS direction, and a plot is around 4 meters long in NS, so we will need to integrate data from 4 or 5 scans to get a plot-level height.
  2. The existing 'yshift' value here https://github.com/terraref/extractors-3dscanner/blob/master/plant_height/full_day_to_histogram.py#L597 only works for season 2. This relates to https://github.com/terraref/reference-data/issues/44, as I mentioned before.
  3. To extract values from histogram.npy, see the code here: https://github.com/terraref/extractors-3dscanner/blob/master/plant_height/full_day_to_histogram.py#L277
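For the first point in the list above, one simple way to integrate several scans is to sum their per-plot histograms, provided every scan was binned with identical edges. This is a sketch under that assumption, with a hypothetical function name; it does not account for differing sampling rates between scans.

```python
import numpy as np


def merge_scan_histograms(scan_counts):
    """Sum per-scan histogram counts that share identical bin edges.

    np.stack raises ValueError if the histograms differ in length.
    """
    stacked = np.stack([np.asarray(c) for c in scan_counts])
    return stacked.sum(axis=0)
```

Quantiles can then be read from the merged counts in the same way as for a single scan.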

max-zilla commented 7 years ago

@solmazhajmohammadi is going to share the point cloud offset to gantry for #2.

for #1, calibrated in August and validated in November. We saw a misalignment of ~4cm between two different point clouds and suspect that the alignment shifts with temperature. @smarshall-bmr will do that scan. The maximum we have recorded so far is about 5cm between hot summer and mid-winter.

max-zilla commented 7 years ago

@ZongyangLi will look at 2017 data for the merge and ignore season 1 issues for the moment.

I will add the geostreams function and then we can deploy initial version.

solmazhajmohammadi commented 7 years ago

@smarshall-bmr it seems that the misalignment in the point clouds is due to temperature change. It would be great if you could do multiple scans at different times of day to see how the alignment varies with temperature. Thanks

ZongyangLi commented 7 years ago

@dlebauer I'm going to insert height data into BETYdb. Here is an example CSV file. Could you please review it? Thanks!

https://drive.google.com/open?id=0B5QCp_Onc6nOUGR0WVpsREl5NWc

dlebauer commented 7 years ago

@zongyangli that looks good. Go ahead and upload. Thanks!