terraref / computing-pipeline

Pipeline to Extract Plant Phenotypes from Reference Data
BSD 3-Clause "New" or "Revised" License

Upload UAV NDVI data to terraref.org/bety #201

Closed dlebauer closed 7 years ago

dlebauer commented 8 years ago

Description

@rickw-ward has provided NDVI data for five dates.

gsrohde commented 8 years ago

@dlebauer In preparation for bulk upload, I had to make the following changes:

CSV file:

Column "sitename" must be "site". Hyphens are required in date values; e.g., 2016-08-08, not 20160808.

Database:

"NDVI" must be a trait variable in the trait_covariate_associations table. I added a row for trait NDVI (with optional covariate "age").

Entries in the citations_sites table must be added to associate each of the sites in the file with Ward 2016. (I haven't done this yet. UPDATE: THIS IS DONE NOW.)

I will have to add the method associations manually. The Bulk Upload wizard doesn't provide for this.

As you noted, the genotypes (cultivar_id values) will need to be added manually as well.

Also:

I'm ignoring the Range and Pass columns. This information is of course already in the site column. I'm also ignoring the rep column.

I expect to finish this up tomorrow afternoon unless you need it before then.
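The CSV changes listed above (renaming "sitename" to "site", hyphenating dates, and dropping the Range, Pass, and rep columns) could be scripted roughly as follows. This is only a sketch: the thread does not name the date column, so `date_column="date"` is an assumption, and `prepare_rows` and `convert_csv` are hypothetical helper names, not part of the Bulk Upload wizard.

```python
import csv
import io
from datetime import datetime

# Columns the comment above says are ignored during upload.
IGNORED_COLUMNS = {"Range", "Pass", "rep"}


def prepare_rows(reader, date_column="date"):
    """Yield rows with the header and date fixes applied.

    `date_column` is an assumed header name; the thread does not say
    what the date column is called.
    """
    for row in reader:
        out = {}
        for key, value in row.items():
            if key in IGNORED_COLUMNS:
                continue
            if key == "sitename":
                key = "site"
            if key == date_column and "-" not in value:
                # 20160808 -> 2016-08-08; hyphenated values pass through unchanged.
                value = datetime.strptime(value, "%Y%m%d").strftime("%Y-%m-%d")
            out[key] = value
        yield out


def convert_csv(text, date_column="date"):
    """Return the converted CSV as a string."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(prepare_rows(reader, date_column))
    fieldnames = list(rows[0].keys()) if rows else []
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

A malformed date raises `ValueError` from `strptime` rather than being passed through, so a bad row is caught before upload instead of afterwards.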

gsrohde commented 8 years ago

@dlebauer I was going to finish this this afternoon but I realized I need to add a treatment for this data set. (No existing treatment is associated with the Rick Ward citation.) What should the treatment be?

dlebauer commented 8 years ago

@gsrohde the treatment should be the same as the one for Maria's data, and if it does not exist it can be 'observational'

gsrohde commented 8 years ago

@dlebauer The names of Maria's treatments are "Control", "low density", "medium density", and "high density". Which, if any of these, should I use? Or is it better to have distinct treatments for distinct citations (even if the treatments are essentially the same)? Or does it matter?

dlebauer commented 8 years ago

Use "Control"; density treatments were from the first season.

gsrohde commented 8 years ago

Also, what should the access level be?

dlebauer commented 8 years ago

2

gsrohde commented 8 years ago

OK, done (except for adding genotype information).

ghost commented 7 years ago

Need to add genotype data before closing.

dlebauer commented 7 years ago

@rickw-ward do you have times for these flights?

rickw-ward commented 7 years ago

@dlebauer I have lost the thread- which flights?

dlebauer commented 7 years ago

@rickw-ward looks like we have the dates: https://docs.google.com/spreadsheets/d/1qeyYA4x8WX_OnahC6Dr7k1Ut4TuSObMJyYFvOyA_1gY/edit#gid=0

@gsrohde have these been uploaded?

dlebauer commented 7 years ago

Method should be https://terraref.ncsa.illinois.edu/bety/methods/6000000003

dlebauer commented 7 years ago

@rickw-ward can you confirm - was the NDVI measured by the Parrot Sequoia or the MicaSense camera?

gsrohde commented 7 years ago

@dlebauer I uploaded this data set on 11/8 (see https://github.com/terraref/computing-pipeline/issues/201#issuecomment-259294680). The dates were already in the upload, and I set the method manually to "UAV based NDVI" (on 12/02? They were all updated on that date). So to reiterate, as far as I know, the only thing remaining is to add the cultivar_id values to tie in the genotype information (the solitary checkbox in the Description section).

dlebauer commented 7 years ago

@gsrohde The final step for updating the cultivar_id is to run the updates in this sheet, which add a cultivar_id given a particular site_id.

After running the relevant updates, you can close this issue and then open a new one in which we can consider a more automated solution for capturing the cultivar-plot relationships. My proposal to add cultivar_id to the experiments_sites table (https://github.com/PecanProject/bety/issues/410#issuecomment-217205761) might work. The problem is that the solution is very specific to breeding trials. What do you think?

gsrohde commented 7 years ago

@dlebauer I looked at the spreadsheet with the update statements and there are some problems:

  1. The update statements have no WHERE clauses, so each one will update the whole traits table. For example, where you wrote

    update traits set site_id = (select id from sites where name like 'MAC Field Scanner Plot 1 Season 2%'), cultivar_id = (select id from cultivars where name = 'Ton-a-Milk');

    I'm assuming you perhaps intended

    update traits set cultivar_id = (select id from cultivars where name = 'Ton-a-Milk') WHERE site_id = (select id from sites where name like 'MAC Field Scanner Field Plot 1 Season 2');

    (Notice the extra "Field" before "Plot": all season 2 sitenames containing the string "Plot" have the form "MAC Field Scanner Field Plot % Season 2" where "%" is some integer.)

  2. But even as corrected, these updates won't touch the traits in the set I uploaded on 11/8/2016. All site names in that data set have the form "MAC Field Scanner Season 2 Range % Pass %" whereas the site names in the "update" spreadsheet all have the form "MAC Field Scanner Field Plot % Season 2" (except in the latter part of the table where column C (the cultivar name column) is empty).

If you gave me verbal instructions superseding the instructions in comment https://github.com/terraref/computing-pipeline/issues/201#issuecomment-267458905, I don't recall what they were.

The cultivar_id updates (as corrected) will, however, apply to the traits @ZongyangLi uploaded on Jan. 12 since those do involve sites with names of the form "MAC Field Scanner Field Plot % Season 2".
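To illustrate the first problem, here is a small sketch that runs both forms of the update against an in-memory SQLite stand-in for the sites, cultivars, and traits tables. The table layouts are reduced to the columns the statements use, the ids and the Range/Pass site row are made up for illustration, and the site name is compared with `=` instead of `like`; none of this is the real BETYdb schema.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory stand-in for the three tables involved."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE sites (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE cultivars (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE traits (id INTEGER PRIMARY KEY, site_id INTEGER, cultivar_id INTEGER);
        INSERT INTO sites VALUES
            (1, 'MAC Field Scanner Field Plot 1 Season 2'),
            (2, 'MAC Field Scanner Season 2 Range 1 Pass 1');
        INSERT INTO cultivars VALUES (10, 'Ton-a-Milk');
        INSERT INTO traits VALUES (100, 1, NULL), (101, 2, NULL);
        """
    )
    return db


# No WHERE clause: every row in traits is rewritten.
UNSCOPED_UPDATE = """
    UPDATE traits
    SET cultivar_id = (SELECT id FROM cultivars WHERE name = ?)
"""

# WHERE clause restricts the update to traits at one site.
SCOPED_UPDATE = """
    UPDATE traits
    SET cultivar_id = (SELECT id FROM cultivars WHERE name = ?)
    WHERE site_id = (SELECT id FROM sites WHERE name = ?)
"""


def rows_changed(sql, params):
    """Run one update against a fresh demo database and return the row count."""
    db = build_demo_db()
    cursor = db.execute(sql, params)
    return cursor.rowcount
```

In this toy database the unscoped statement touches both trait rows, while the scoped one touches only the row whose site name matches exactly. A site name missing the extra "Field" matches nothing, because the subquery yields NULL and `site_id = NULL` is never true.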

dlebauer commented 7 years ago

I think the update statements here will work: https://docs.google.com/spreadsheets/d/1xIRwroYObD125I0TDCDp3YPIGPi2Qzeh-vlUCg_b85Q. If not, they should at least provide enough information to identify site-cultivar pairs.

gsrohde commented 7 years ago

@dlebauer I assume you mean use columns R and S in https://docs.google.com/spreadsheets/d/1xIRwroYObD125I0TDCDp3YPIGPi2Qzeh-vlUCg_b85Q/edit#gid=732810075 (the fourth sheet) to generate the update statements?

dlebauer commented 7 years ago

Yes

gsrohde commented 7 years ago

@dlebauer I did the updates—all the trait rows I inserted on 2016-11-08 now have cultivar ids. Note that many site names in column R don't match any existing trait sites.

dlebauer commented 7 years ago

many site names in column R don't match any existing trait sites.

That is the expected result of not having data for the border rows.
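For reference, applying a list of site-cultivar pairs and collecting the site names that match no trait rows could look something like the sketch below. It assumes the pairs have already been read out of the spreadsheet into (site name, cultivar name) tuples and that `db` is an open `sqlite3` connection with sites, cultivars, and traits tables; `apply_cultivar_pairs` is a hypothetical helper, not the script that was actually run.

```python
import sqlite3


def apply_cultivar_pairs(db, pairs):
    """Set traits.cultivar_id for each (site name, cultivar name) pair.

    Returns the site names whose update touched no trait rows,
    e.g. border rows for which no data were collected.
    """
    unmatched = []
    for site_name, cultivar_name in pairs:
        cursor = db.execute(
            """
            UPDATE traits
            SET cultivar_id = (SELECT id FROM cultivars WHERE name = ?)
            WHERE site_id = (SELECT id FROM sites WHERE name = ?)
            """,
            (cultivar_name, site_name),
        )
        if cursor.rowcount == 0:
            unmatched.append(site_name)
    db.commit()
    return unmatched
```

Returning the unmatched names makes it easy to check that they are all border rows and not a site-name mismatch like the one found earlier in the thread.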