ZongyangLi opened this issue 6 years ago
@ZongyangLi Last time we looked into the boundary issues the problem was in terrautils, so I will defer to @max-zilla first ... I wouldn’t be surprised if it is in BETYdb, but it would be easier to debug if we start with either the specific API calls or the analogous SQL queries that are generating the unexpected data.
With respect to not having subplots in season 2, that is because they were never generated ... you can see some discussion here: https://github.com/terraref/reference-data/issues/194#issuecomment-33368094. Are there different numbers on other dates? It is good to point out this inconsistency, though I don’t think it is an error; rather, it is an artifact of working out how to analyze the data in the first season. Therefore, I wouldn’t worry that there are only 350 plots on 2016-09-28 unless there is a very different number of plots on an adjacent date or elsewhere within the season.
With respect to subplots, they are primarily useful when training or validation data is collected at the sub-plot level. However, I think it would be sensible (though not necessary) to exclude them from the automated pipeline by default.
@dlebauer I did another test of the boundaries after season 2, and I am happy to report that they are consistent: all of them have subplot information. I will continue with the deployment and skip season 2.
With respect to trait type, I am going to generate panicle counts in the first run. The planned trait fields would be: `local_datetime`, `panicle_counting`, `access_level`, `species`, `site`, `citation_author`, `citation_year`, `citation_title`, `method`.
Please let me know if this makes sense.
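To make the planned fields concrete, one output record might look like the sketch below. The field names are the ones listed above; all of the example values (species string, site name, citation details, etc.) are placeholders I made up for illustration, not real pipeline output.

```python
import csv

# Sketch of one output trait record using the planned fields.
# Every value below is an illustrative placeholder, not real data.
trait_record = {
    "local_datetime": "2017-06-20T12:00:00",
    "panicle_counting": 42,  # panicles detected in this plot
    "access_level": "2",
    "species": "Sorghum bicolor",
    "site": "MAC Field Scanner Season 4 Range 10 Column 5",
    "citation_author": "Li, Zongyang",
    "citation_year": "2018",
    "citation_title": "Panicle detection extractor",
    "method": "Panicle counting from 3D scanner data",
}

# Writing the record as one row of the output CSV:
with open("panicle_traits.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(trait_record))
    writer.writeheader()
    writer.writerow(trait_record)
```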
Another question is about target dates. Since the part of the sorghum season with panicles is quite short compared to the whole year, I don't think we have to run the pipeline year-round. The middle and late sorghum season would probably be enough: for example, in season 4 from 2017-06-20 to 2017-09-11, and in the current season from 2018-06-20 to the end of the season. Could someone confirm that this is a reasonable window for sorghum to have panicles? @NewcombMaria
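Restricting the runs to that window could be as simple as a date filter when queuing scans. A minimal sketch, using the season 4 dates proposed above (the window bounds are the proposal here, pending confirmation):

```python
from datetime import date

def in_panicle_window(scan_date,
                      start=date(2017, 6, 20),
                      end=date(2017, 9, 11)):
    """True if scan_date falls in the proposed mid-to-late sorghum
    season window when panicles are expected."""
    return start <= scan_date <= end

# Only dates inside the window would be queued for the extractor:
dates = [date(2017, 5, 1), date(2017, 7, 15), date(2017, 10, 2)]
to_process = [d for d in dates if in_panicle_window(d)]
# to_process == [date(2017, 7, 15)]
```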
The last question is about the Danforth HTCondor cluster: I would like to make sure the download-analysis-delete workflow won't do any damage to the storage or the service. @nfahlgren
@ZongyangLi there's no harm in temporarily downloading the data to analyze it and then deleting it. I'm happy to look over your job setup when you are ready.
> there's no harm in temporarily downloading the data to analyze it and then deleting it
This is the way we will handle the new distributed workflow for cases like this where our host system doesn’t have the resources (e.g. for reprocessing or in this case GPUs).
Development of the pipeline is currently a work in progress (see https://github.com/terraref/computing-pipeline/issues/473) but worth checking out and feedback welcome.
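The download-analysis-delete pattern under discussion can be sketched in a few lines. The `download` and `analyze` callables below are hypothetical stand-ins for the real transfer (e.g. Globus) and extractor code; the point is that the raw data only exists inside a scratch directory for the lifetime of the job, so nothing accumulates on the cluster's shared storage.

```python
import os
import tempfile

def run_one_job(dataset_id, download, analyze):
    """Download-analysis-delete pattern: fetch raw data into a
    temporary scratch directory, analyze it, and let the directory
    (and the data in it) be removed automatically afterwards."""
    with tempfile.TemporaryDirectory() as scratch:
        local_path = download(dataset_id, scratch)  # fetch raw data into scratch
        result = analyze(local_path)                # run the extractor on it
    # scratch and its contents no longer exist at this point
    return result

# Stand-in download/analyze functions, just to show the flow:
def fake_download(dataset_id, scratch):
    path = os.path.join(scratch, dataset_id + ".ply")
    with open(path, "w") as f:
        f.write("demo")
    return path

def fake_analyze(path):
    return os.path.getsize(path)

result = run_one_job("scan_001", fake_download, fake_analyze)
# result == 4 (size of the fake file); the scratch copy is gone afterwards
```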
@dlebauer nice, I haven't used Pegasus workflows myself yet, just DAGs with Condor. I was looking into Parsl (http://parsl-project.org/) and Nextflow (https://www.nextflow.io/) recently for use with PlantCV, just to throw those out there.
@dlebauer Here is an example output CSV file from the extractor: https://drive.google.com/open?id=1SLQeyhlMeaFou5_BHXkljmwNHkveUFV9
Could you please take a look and check whether all the records in the file are suitable?
@nfahlgren Last time we talked about the automatic Globus transfer on the server; I may still need your help to finish that, since my current code is implemented on a desktop.
@ZongyangLi yes, that should work once you have added the new variable and method to the appropriate tables; then I would test it out against the database.
And I would use a variable name like "panicle_number" with the standard name "number_of_panicles_per_unit_area" (the latter is consistent with the CF standard names) and units 'm^-2'.
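For reference, the suggested naming could be captured in a small metadata record like the one below. This is only a sketch: the exact field names in BETYdb's `variables` table are an assumption here and should be checked against the actual schema before inserting anything.

```python
# Suggested variable definition for the panicle trait.
# NOTE: field names are assumed, not taken from the BETYdb schema.
panicle_variable = {
    "name": "panicle_number",
    "standard_name": "number_of_panicles_per_unit_area",  # CF-style standard name
    "units": "m^-2",
    "description": "Number of sorghum panicles per unit ground area",
}
```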
We are going to deploy our panicle detection extractor on the Bioinformatics core facility at the Donald Danforth Plant Science Center (HTCondor). I am creating this issue to record the current status and the bugs that are blocking the deployment. The associated code has been uploaded here: https://github.com/terraref/extractors-3dscanner/tree/ZongyangLi-patch-1/panicle_detection
My initial plan (I am happy to make changes if there are better plans) for the pipeline workflow on HTCondor is:
Current status:
What is blocking the deployment?
We would like to fetch plot boundaries from BETYdb using terrautils.betydb.py instead of using hard-coded boundaries for each season. But there are still some inconsistencies from season to season, including:
Calling `get_site_boundaries(str_date, city="Maricopa")` will still return KSU data. Also, on most dates the plot corner points are stored in `data['coordinates'][0][i]` (i from 0 to 3), but on 2017-04-27 they are saved in `data['coordinates'][0][0][i]`, so we need to descend one more layer into the attribute. @dlebauer @max-zilla I remember we discussed the inconsistent boundary data from BETYdb before, and I tried to work around it in my parsing code, but it keeps producing more and more unexpected values. I think we really need to figure it out this time.
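As a stopgap while the upstream inconsistency is investigated, the two nesting layouts could be normalized with a small helper. This is a sketch assuming GeoJSON-style polygon dictionaries like the `data['coordinates']` structures described above; it is a workaround, not a fix for the underlying BETYdb issue.

```python
def normalize_ring(boundary):
    """Return the outer ring of a plot boundary as a list of [x, y]
    points, whether the source nests it as coordinates[0][i] (most
    dates) or coordinates[0][0][i] (e.g. 2017-04-27)."""
    ring = boundary["coordinates"][0]
    # If the first element is itself a list of points rather than a
    # single point, descend one more layer.
    if ring and isinstance(ring[0][0], (list, tuple)):
        ring = ring[0]
    return ring

# Both layouts yield the same four corner points:
shallow = {"coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1]]]}
deep = {"coordinates": [[[[0, 0], [1, 0], [1, 1], [0, 1]]]]}
# normalize_ring(shallow) == normalize_ring(deep)
```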