max-zilla closed this issue 5 years ago
@ZongyangLi can you please define the variables and methods associated with these data?
Added the following methods via https://terraref.ncsa.illinois.edu/bety/methods/new:
- Scanner 3d ply data to 98th quantile height
- Scanner 3d ply data to leaf angle distribution
- Scanner 3d ply data to panicle counting
Added the following variables via https://terraref.ncsa.illinois.edu/bety/variables/new:
- 98th_quantile_canopy_height
- leaf_angle_alpha_src
- leaf_angle_beta_src
- leaf_angle_alpha_fit
- leaf_angle_beta_fit
- leaf_chi_src
- leaf_chi_fit
- panicle_counting
- panicle_volume_median
- panicle_surface_area_median
One mistake: I accidentally added `98th_quantile_canopy_height` as a method as well — please delete it from methods.
@ZongyangLi thanks for doing this ... should the method 'Scanner 3D ply to 98th quantile height' be associated with the trait 'canopy_height'? More specifically, if using the 98th quantile of the point cloud is intended to reflect the actual canopy height, do we need a separate variable?
Similarly, if the best estimate of the panicle volume is the median, then it would make sense to call the trait 'panicle_volume' and describe the method of estimation in the methods (same for surface area). And I am not sure what the difference is between _src and _fit, but I suspect these can also be differentiated in the methods rather than in the variable names themselves.
And to clarify - are you requesting that I delete the 98th_quantile_canopy_height method? I can do that although if you added it you should be able to delete it (as long as there aren’t any data already associated with the method).
We've proposed a new naming scheme, listed as [Variable Name, Method].
@dlebauer Does this fit your naming convention?
Change 98th_quantile_canopy_height to [ Canopy Height, 3D_scanner_98th_quantile]
Change leaf angle variables from: leaf_angle_alpha_src, leaf_angle_beta_src, leaf_chi_src, leaf_angle_alpha_fit, leaf_angle_beta_fit, leaf_chi_fit
to:
[Leaf Angle Mean, 3D_scanner_leaf_angle_distribution]
[Leaf Angle Variance, 3D_scanner_leaf_angle_distribution]
[Leaf Angle Alpha, 3D_scanner_leaf_angle_distribution]
[Leaf Angle Beta, 3D_scanner_leaf_angle_distribution]
[Leaf Angle Chi, 3D_scanner_leaf_angle_distribution]
And for panicles change from:
panicle_counting
panicle_volume_median
panicle_surface_area_median
to:
[Panicle Count, 3D_scanner_panicle_count]
[Panicle Volume, 3D_scanner_panicle_volume_median]
[Panicle Surface Area, 3D_scanner_surface_area_median]
Additionally, the leaf length and width parameters would have the following variables and methods:
[leaf_length, 3D_scanner_geodesic_kalman]
[leaf_length, 3D_scanner_geodesic_unfiltered]
[leaf_width, 3D_scanner_geodesic_kalman]
[leaf_width, 3D_scanner_geodesic_unfiltered]
Do those naming conventions for variables and methods seem to be more consistent?
Hi Abby - this is definitely on the right track, but I have a few thoughts, and it will be easier to flesh this out in this spreadsheet where we can capture the other information like descriptions, units, citations, etc.
A few notes -
Method names: It might make sense to include something about the algorithm used (as where 'kalman' is used) rather than just saying '3D Scanner Panicle Volume', which doesn't allow it to be differentiated from another algorithm.
Variable names: The variable naming convention loosely follows the way CF (Climate and Forecast) standard names are constructed (you can see examples here), and variables are thus snake_case. Method names don't have such a constraint, so they can be typed like the title of a protocol.
Statistics
The leaf_angle_mean and leaf_angle_variance present a special case, since BETYdb is designed to store mean values alongside (optionally) a sample size and a statistic. So the appropriate name for the mean leaf angle would be leaf_angle, and each of these values can either stand alone or be stored with a statistic. It would still be okay to have leaf_angle_variance alongside leaf_angle_beta etc., but there is also the option of using the 'stat', 'statname', and 'n' columns. For now let's ignore n, because that gets confusing. Unfortunately we only store one statistic per record, or else we could treat alpha and beta in the same way.
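For concreteness, a record using the 'stat'/'statname' columns might look like the following hypothetical CSV fragment (the values and the non-statistic columns here are made up for illustration, not taken from an actual upload file):

```csv
date,species,leaf_angle,statname,stat
2017-05-06,Sorghum bicolor,42.1,SD,8.3
2017-06-01,Sorghum bicolor,55.4,SD,9.1
```

Each row stores the mean leaf angle in the `leaf_angle` column, with its single associated statistic recorded via the `statname`/`stat` pair.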
Also on the topic of variance: does the variance you are computing have the same units as the mean? Would it make sense to call this 'standard deviation' instead?
As a footnote, I'll reference this lengthy discussion, where I think we concluded that we would fit the normal and beta distributions separately, such that, e.g., mean != alpha/(alpha+beta). If these values end up being equal, then we should consider storing only one or the other set of parameters, or else analyses that include both traits might have numerical issues.
Hi David - I don't currently have permission to edit that google sheet. If you grant it, I can fill things out there, but in the meantime, I'll reply in line here. I've gone through and edited our variables and methods to reflect your comments (snake case for variables, descriptive for methods, adding in algorithm details where appropriate). If you're on board with these changes, then @ZongyangLi can implement them.
Change 98th_quantile_canopy_height to [ canopy_height, 3D scanner to 98th quantile height]
Change leaf angle variables from: leaf_angle_alpha_src, leaf_angle_beta_src, leaf_chi_src, leaf_angle_alpha_fit, leaf_angle_beta_fit, leaf_chi_fit
to:
[leaf_angle_mean (with leaf_angle_variance stored alongside as a statistic), 3D scanner to leaf angle distribution]
[leaf_angle_alpha, 3D scanner to leaf angle distribution]
[leaf_angle_beta, 3D scanner to leaf angle distribution]
[leaf_angle_chi, 3D scanner to leaf angle distribution]
And for panicles change from: panicle_counting, panicle_volume_median, panicle_surface_area_median
to:
[panicle_count, 3D scanner to panicle count faster_rcnn + roughness threshold + convex hull]
[panicle_volume, 3D scanner to panicle volume faster_rcnn + roughness threshold + convex hull]
[panicle_surface_area, 3D scanner to panicle surface area faster_rcnn + roughness threshold + convex hull]
Additionally, the leaf length and width parameters would have the following variables and methods:
[leaf_length, 3D scanner to leaf measurements kalman]
[leaf_length, 3D scanner to leaf measurements unfiltered]
[leaf_width, 3D scanner to leaf measurements kalman]
[leaf_width, 3D scanner to leaf measurements unfiltered]
Regarding the leaf angle variance: @ZongyangLi is currently saving the variance, but we could obviously compute the standard deviation if that were the preferred measurement.
@dlebauer @abby621
Files updated to here in the sub directory: https://drive.google.com/open?id=1Y-Qdxe1GgCgXSxR0KFEeyIVyoQyv-tCX
Example leaf angle csv file: https://drive.google.com/open?id=10awD6-suq49L_TGI0x5Q3L-jSJvFmlBX
If we all agree with the current definition of methods and variables, I could add those to BETY.
@abby621 you should have access to the google doc if you want to update the records there. then @kimberlyh66 can upload the data and we will be on our way!
@dlebauer We have the spreadsheet almost entirely filled, but have a question regarding the min/max values. Should that be the min/max that we've ever seen, or some sort of bound on the possible reported values? I'm not sure that we know what that should be -- our algorithms don't specify particular min/max values beyond what's specified by the datatype (so a leaf could technically be hundreds of meters long, even if we would never expect to observe that).
@abby621 consider these to be very broad uniform priors that set upper and lower bounds on what data should be considered 'valid'. Values that fall outside the range will be rejected, and we can always widen the min/max later if legitimate values are being rejected.
So, these should be set to provide a high-level constraint on valid values: most variables have a lower bound at 0; some have upper bounds at 1 or 100 by definition. The longest leaf in the world is 25 m long, so we could set the max at 25000 mm, or we could go with something like 2 m, which is more reasonable for sorghum (and wheat). For leaf angle, if in degrees, I think the valid range would be [0, 90]. In many cases we have -inf, inf, but those aren't very useful.
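In effect these min/max bounds act as a simple range filter applied before insertion. A minimal sketch of that validation logic (the function name, the dictionary, and the specific bounds here are made up for illustration, not BETYdb's actual implementation):

```python
# Hypothetical illustration of BETYdb-style min/max validation:
# values outside [min, max] for a variable are rejected on upload.
VALID_RANGES = {
    "canopy_height": (0, 2000),  # mm; generous upper bound for sorghum
    "leaf_angle": (0, 90),       # degrees, if measured from horizontal
    "panicle_count": (0, 1000),  # per plot; illustrative only
}

def is_valid(variable: str, value: float) -> bool:
    """Accept a value only if it falls inside the variable's broad prior range."""
    lo, hi = VALID_RANGES.get(variable, (float("-inf"), float("inf")))
    return lo <= value <= hi

print(is_valid("leaf_angle", 45.0))   # inside [0, 90]
print(is_valid("canopy_height", -5))  # negative height is rejected
```

Unknown variables fall back to (-inf, inf), matching the "not very useful" default mentioned above.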
I have already filled in the sheet and updated the new method names and variables in the CSV files. Can we go ahead and get them uploaded now?
OK, I will try to upload in the morning after downloading the new CSV files. We must make sure the names are in BETY as well; we can ask @kimberlyh66 to add the new/updated names to BETY, and I can upload the trait data.
Is this the spreadsheet (https://docs.google.com/spreadsheets/d/1nDVti2uj2cWboAmsqzQGyXidZFnqi5jPmBw23nGKH9E/edit#gid=1676929050) with new method names and variables? If @dlebauer approves, I can add to BETY.
@max-zilla I can also help with uploading the trait data if you would like.
@kimberlyh66 if you can update the method names and descriptions, then Max can upload the trait data.
All new methods and variables have been added to BETY.
@ZongyangLi You mentioned in the spreadsheet that the method `3D scanner to leaf length and width` should be the same method that I used to upload Zeyu's data. If that is the case, you should use the name `Scanner 3d ply data to leaf length and width`.
@kimberlyh66 The method `3D scanner to leaf length and width` in the spreadsheet was actually created for Zeyu; if you have already uploaded his data, you can skip it.
Downloaded the rewritten version of all files, but I'm still getting an error on the citation:
```
/Users/mburnette/Downloads/BETYdbUploadsV2/s4_98th_height_rewrite/2017-05-06_98th_quantile.csv,
{:lookup_errors=>[
  "No citation could be found matching {\"author\"=>\"ZongyangLi\", \"year\"=>\"2018\", \"title\"=>\"Maricopa Field Station Data and Metadata\"}",
  ...
]}
```
I think the other fields match an existing citation; aside from the 2018 year, is this another citation entry we need to add to BETY first?
@max-zilla I guess the `year` here should be `2016`. Could you change it from 2018 to 2016 and try again? If it works, I can update all the CSV files.
@ZongyangLi changing it to 2016 results in Success!
@max-zilla Should be all right this time, please find all collections here:
https://drive.google.com/open?id=1fDGakYulkLjLSAG0e_H-MEmjT69Bg2zF
Uploading these now, will close this once finished.
@ZongyangLi the LeafAngle and 98th height CSVs uploaded successfully, but the panicle CSVs encountered error:
```
No method could be found matching {"name"=>"3D scanner to panicle count faster_rcnn + roughness threshold + convex hull"}
```
@kimberlyh66 @ZongyangLi can we update this method in BETY so i can upload panicle data and close this? thanks!
@kimberlyh66 I think there was a typo in the spreadsheet: the word 'threshold' in 'roughness threshold' was missing the letter 'h'. Could you change it to the correct spelling? Thanks.
@ZongyangLi @max-zilla the method has been updated to be
3D scanner to panicle count faster_rcnn + roughness threshold + convex hull
@kimberlyh66 thanks much! this is now uploaded & complete.
@kimberlyh66 during our last NCSA meeting I told @dlebauer I would attempt to upload to BETY the CSVs that @ZongyangLi had placed in Google Drive, but he suggested asking you for assistance if I ran into issues.
https://drive.google.com/drive/folders/1Y-Qdxe1GgCgXSxR0KFEeyIVyoQyv-tCX There are 3 directories in this Google Drive folder with .tar files containing daily CSVs for BETY upload:
I wrote a small Python script to iterate over the daily CSVs and push them to BETY with some key snippets here:
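(The original snippets aren't reproduced in this thread. Purely as an illustrative sketch, the loop was roughly of this shape — note that the `/api/v1/traits.csv` endpoint path and `key` parameter are assumptions about the BETYdb API, not taken from the actual script:)

```python
# Hypothetical reconstruction: iterate over daily trait CSVs and POST
# each one to BETYdb, printing any failures. Endpoint/auth details are
# assumptions, not verified against the real BETYdb API.
import glob
import os
import urllib.error
import urllib.request

BETY_URL = "https://terraref.ncsa.illinois.edu/bety"  # assumed host

def traits_endpoint(base_url: str, api_key: str) -> str:
    """Build the (assumed) BETYdb v1 CSV-upload endpoint URL."""
    return f"{base_url.rstrip('/')}/api/v1/traits.csv?key={api_key}"

def upload_csvs(csv_dir: str, api_key: str) -> None:
    for path in sorted(glob.glob(os.path.join(csv_dir, "*.csv"))):
        with open(path, "rb") as f:
            payload = f.read()
        req = urllib.request.Request(
            traits_endpoint(BETY_URL, api_key),
            data=payload,
            headers={"Content-Type": "text/csv"},
            method="POST",
        )
        try:
            urllib.request.urlopen(req)
        except urllib.error.HTTPError as err:
            # BETY responds 400 with messages like "No trait variable ..."
            print(f"{path}: {err.code} {err.read()[:200]}")
```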
...however, none of the CSVs were successfully uploaded. A line from my logfile for each file:
I'm assuming we need some trait defined in BETY corresponding to the column names I listed above, which don't exist yet? We've successfully uploaded other BETY data such as CanopyCover with similar CSVs, and the "No trait variable..." error message was coming from BETY with a 400 response on the POST.
Please let me know if you might be able to look into this and how I can help.