terraref / computing-pipeline

Pipeline to Extract Plant Phenotypes from Reference Data
BSD 3-Clause "New" or "Revised" License
24 stars 13 forks source link

Add cultivars to site metadata alongside plot #537

Closed max-zilla closed 5 years ago

max-zilla commented 6 years ago

Cultivar and treatments added to the metadata alongside plotname.

sites_cultivars and sites_treatments tables.

@tcnichol or I can add this.

max-zilla commented 5 years ago

https://github.com/terraref/terrautils/blob/master/terrautils/lemnatec.py#L118 Basically terrautils needs updating that includes additional info besides the sitename.

max-zilla commented 5 years ago

get_associations=full_info or similar might be required to include names, etc.

max-zilla commented 5 years ago

https://github.com/terraref/terrautils/blob/master/terrautils/betydb.py#L156

tcnichol commented 5 years ago

is creating new database tables an acceptable approach to the task? Right now experiments_sites has an experment_id, and experiments_treatments has an experiment_id and a treatment_id, and treatment has a treatment_id and a treatment name and description. Just seems like generating a table with all those values once rather than attempting a join on 3 tables would be an quicker to run, or else adding some values to the experiments_treatments table.

any input is appreciated.

max-zilla commented 5 years ago

that would represent a significant change to BETY schema and require input and approval from others I think, @robkooper correct me if I'm wrong but that might be more change than we want to get into here. with proper indexes, i dont think the query would be that expensive. we'd already have experiment ID, just a join on 2 tables (experiments_treatments and treatment tables)

tcnichol commented 5 years ago

we'd have to also join on experiments_sites, since experiments_treatments doesn't have a site_id. working on a join that should work on those tables, just it's been a while since I did any complicated sql queries. i'll keep trying something, an inner join should be possible between the 3 tables.

max-zilla commented 5 years ago

For our current cleaning pipeline, we get a list of sites (plots) that intersect the dataset according to the bounding box centroid. We separately get the experiment(s). From the sound of what you describe, treatments are associated with an experiment and not a site. So we don't need to join in site_ids, we already have the site from elsewhere. Unless there's a missing table or a site_id field in the treatments table or something. Let me know if you need help, I'm pretty quick with SQL.

robkooper commented 5 years ago

Do some tests to see how long it takes to run given the tables you have right now. Keep in mind some of these are created by ruby to do a n::m relationship and might be harder to keep up to date if we manually create them.

tcnichol commented 5 years ago

I created a pull request in brapi - I added the treatments related routes, and will still add the cultivars one later today. The plan is that I make changes in brapi, and then I change terrautils.lemnatec in a separate pull request to take advantage of the new brapi routes.

dlebauer commented 5 years ago

Would it make sense to implement one of the following?

tcnichol commented 5 years ago

@dlebauer

There will be no need to create other tables.

For the rails api, I'm changing the brapi application now that should enable us to get the cultivars and treatments. When you mention the rails api, which package/repository is that?

dlebauer commented 5 years ago

Sounds good. The rails API is on GitHub.com/pecanproject/bety

tcnichol commented 5 years ago

I have a pull request open for brapi that implements some methods that would help with this. It is

https://github.com/terraref/brapi/pull/10

I'm not sure if I've implemented things correctly there, and will need another pull request in terrautils to take advantage of those methods in adding cultivar and site metadata.

max-zilla commented 5 years ago

There has been some further discussion in slack:

Are you following the v1.3 Swagger specification https://app.swaggerhub.com/apis/PlantBreedingAPI/BrAPI/1.3? There are certainly some issues in mapping the BrAPI concepts to the TERRA REF data, but I get stuck where the Swagger specification https://app.swaggerhub.com/apis/PlantBreedingAPI/BrAPI/1.3#/Studies/get_studies says that /studies/{studyDbId} and /studies/{studyDbId}/germplasm should return a “result” object named “data” but the ones you posted in https://github.com/terraref/brapi/pull/10#issuecomment-470711698 have objects named “study”. Also, there is no longer a /phenotypes-search endpoint in v1.3. Sorry if it wasn’t clear which version we were using. ... following the conversation on the BrAPI channel, it seems that we might be looking for the GET /studies/{studyDbId}/layout endpoint that essentially provides information from the experiments_sites and sites_cultivars tables

max-zilla commented 5 years ago

make sure we're updating README in the brapi repo when equivalencies between brapi and terra vocabularies are updated

max-zilla commented 5 years ago

@dlebauer todd is waiting for you to comment on https://github.com/terraref/brapi/pull/10

dlebauer commented 5 years ago

Sorry I missed that ... it looks good and I've approved the changes. I think now waiting for a few changes requested by @robkooper

max-zilla commented 5 years ago

cultivar and site can now be added in terrautils using the new brapi routes. will need to be a corresponding terrautils update.

tcnichol commented 5 years ago

i will create an issue for terrautils that finishes this one up.

max-zilla commented 5 years ago

@tcnichol did you create the issue? close this one if so.

tcnichol commented 5 years ago

I will create a new issue and close this one. I was off yesterday and didn't get around to it.


From: Max Burnette notifications@github.com Sent: Thursday, May 9, 2019 12:55 PM To: terraref/computing-pipeline Cc: Nicholson, Todd Christopher; Mention Subject: Re: [terraref/computing-pipeline] Add cultivars to site metadata alongside plot (#537)

@tcnicholhttps://github.com/tcnichol did you create the issue? close this one if so.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/terraref/computing-pipeline/issues/537#issuecomment-491003570, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AJRPBB7N7LILWN64B3IVWLDPURQQLANCNFSM4GEHVKGA.