Transfer Leaf length output data into BETYdb

dlebauer commented 5 years ago

(From @ZongyangLi )

Zeyu from GWU successfully ran the leaf length pipeline from our laser data. The current output data are here: https://drive.google.com/open?id=143bCjxCv61Vf0uE60Z5clLUFQfhP2kox

In order to re-form it into a BETYdb friendly version, I made another example of output csv file here: https://docs.google.com/spreadsheets/d/1_h3xq_N8x9G35aji40KsJSZqnqmXAXKFYJz3_74cw4c/edit?usp=sharing

If this doesn't work, please let us know what's the best format for leaf length data. If this works for you, I will work with Zeyu together to get it finalize.

dlebauer commented 5 years ago

Based on a quick look, the updated format looks good

The format looks good.

One thing to note is that each metadata value in the table needs to correspond to a record in the database. Therefore, you will need to

[ ] create a citation record with the author, year, and title that are in the csv file here: https://terraref.ncsa.illinois.edu/bety/citations/new
[ ] create a variable named ‘leaf_length’ with units 'mm' here: https://terraref.ncsa.illinois.edu/bety/variables/new
[ ] a new method here: https://terraref.ncsa.illinois.edu/bety/methods/new with a name that matches the name in the csv file
- you can write the method similar to the Scanner 3d ply data to height and while you are there, please check that the regression formula is up to date.

Note that many fields are optional - you only need to fill out the relevant ones required to uniquely identify a record.

I think that there is a way to test out the data with the API, that will return these and any other errors that would prevent the data from being inserted. @max-zilla do you know the best way to do this?

ZongyangLi commented 5 years ago

@dlebauer Could you also add @abby621 to our project. She is also going to follow up this.

ZongyangLi commented 5 years ago

@dlebauer I think @Andrewzhzy is Zeyu Zhang from GWU, can we add him here to follow this up?

dlebauer commented 5 years ago

@ZongyangLi sorry about that; I've added Zeyu and Abby

zhzyx commented 5 years ago

Hi @dlebauer, I signed up on the betydb, and tried to create citations. But it seems my account don't have the permission to do so. Is there anything else I should do to to get the permission? Thanks!

dlebauer commented 5 years ago

Hi Zeyu - I just updated your permissions - you should now be able to create new records.

zhzyx commented 5 years ago

Thanks! But the method example seems still cannot be opened: Scanner 3d ply data to height https://terraref.ncsa.illinois.edu/bety/methods/6000000009

zhzyx commented 5 years ago

Hi @dlebauer , we just got the result of the leaf length also the leaf width. The code and descriptions are here: https://github.com/GWUvision/Sorghum-Leaf And the csv format data: https://1drv.ms/f/s!AmSJkn9rHbVrob4jOTGcm7eo9oSU5Q Could you help me to see if the format works? Also how could I upload these to betydb?

Thanks

dlebauer commented 5 years ago

First step is to add a record in the BETY database to describe the method: https://terraref.ncsa.illinois.edu/bety/methods/new. Or, we could change the one for "Scanner 3d ply data to leaf length" to "Scanner 3d ply data to leaf length and width" if one algorithm is doing both.

Next step is to try either the bulk upload here: https://terraref.ncsa.illinois.edu/bety/bulk_upload/start_upload or the API https://pecan.gitbook.io/betydbdoc-dataentry/adding-trait-and-yield-data/adding-traits-via-the-v1-api

I would start with a small subset of the file to check that the format and content is correct (the error messages can be helpful, but submitting a large file at once is hard to do). if you have questions, @kimberlyh66 and I can help with this (and we are also on slack).

max-zilla commented 5 years ago

I will try uploading this today

max-zilla commented 5 years ago

I notice that the plot names refer to "Season 4" but the dates refer to Season 6. @Andrewzhzy these are Season 6 data right? If so we can just fix the plot names I think.

zhzyx commented 5 years ago

Let me double check exactly where was went wrong.

zhzyx commented 5 years ago

Sorry, it's Season 6. I mislabeled the season number under the site name. Could you help me to fix that? Thank you

dlebauer commented 5 years ago

@Andrewzhzy to confirm, what @max-zilla suggested in https://github.com/terraref/reference-data/issues/259#issuecomment-516479447 is correct?

dlebauer commented 5 years ago

@max-zilla I am not sure if this is the prettiest approach but I think that this will generate the update statements that are required.


WITH table1 AS (
    SELECT
        traits.ID,
        sites.ID AS old_site_id, sitename as old_sitename 
    FROM
        traits
        JOIN sites ON traits.site_id = sites.ID 
    WHERE
        EXTRACT ( YEAR FROM traits.DATE ) = 2018 
        AND sitename LIKE'%Season 4%' 
        AND variable_id in (select id from variables where name in ('leaf_length', 'leaf_width'))
    ) SELECT
    table1.ID,
    table1.old_site_id, table1.old_sitename,
    sites.id AS new_site_id , sites.sitename as new_sitename,
    concat('update traits set site_id = ', sites.id, ' where id = ', table1.ID, ';') as update
FROM
    table1
    JOIN sites ON REPLACE ( old_sitename, 'Season 4', 'Season 6' ) = sites.sitename;

dlebauer commented 5 years ago

I am recieving a constraint violation and it appears that I need to fix cultivar_id values, but this doesn't seem to work:

update traits set cultivar_id = NULL where EXTRACT ( YEAR FROM traits.DATE ) = 2018 
        AND site_id in (select id from sites where sitename LIKE'%Season 4%') 
        AND variable_id in (select id from variables where name in ('leaf_length', 'leaf_width')) ;

zhzyx commented 5 years ago

@dlebauer Yes, the plot name should be season 6. Thank you for taking care this bug I made.

dlebauer commented 5 years ago

@Andrewzhzy I think it would be easier if I delete all of these records - could you fix the csv files and try uploading them again?

zhzyx commented 5 years ago

@dlebauer I just updated the csv files. I'm ready to upload them anytime. https://1drv.ms/f/s!AmSJkn9rHbVrob4jOTGcm7eo9oSU5Q

max-zilla commented 5 years ago

@Andrewzhzy did you upload the fixed data? we can close this issue once it's done and we confirm they are there.

gsrohde commented 5 years ago

Note that local_datetime column values should be in YYYY-MM-DD format. (See the Date section of https://pecanproject.github.io/bety-documentation/dataentry/adding-trait-and-yield-data.html#detailed-csv-data-file-specifications (for Bulk Upload) or https://pecanproject.github.io/bety-documentation/dataentry/inserting-new-traits-via-the-api.html#schema-for-csv-data-files (for API upload).)

zhzyx commented 5 years ago

@gsrohde Thank you for pointing out, I've corrected that. I'm trying to upload, but I always encounter an error while using bulk upload: "Site is inconsistent with citation in these rows".

dlebauer commented 5 years ago

Fixed, for reference:

begin;
INSERT INTO citations_sites (citation_id, site_id) 
   SELECT c.id, s.id 
   FROM (select id from citations where author = 'Zeyu Zhang') 
     AS c CROSS JOIN 
         (select id from sites where sitename like 'MAC Field Scanner Season 6%')
         AS s;

commit;

terraref / reference-data

Transfer Leaf length output data into BETYdb #259