Closed NewcombMaria closed 5 years ago
@dlebauer I need to revise my previous comment. Please change BAP cultivar PI329440 to "Unknown accession' in sites MAC Field Scanner Season 4 Range 9 Column 14 and Field Scanner Season 4 Range 32 Column 10 as requested above, but PI330807 might not be incorrect. It might only be a mixture of different gentoypes with some plants that are 'off-types'. For now it should remain as it is in the cultivar-sites table.
@gsrohde can you please update these records in the database? The correct cultivar name is 'Unknown off type'
Yes, thanks, @gsrohde the only accession that needs to be changed is PI329440, both replicates: PI329440 MAC Field Scanner Season 4 Range 9 Column 14 PI329440 MAC Field Scanner Season 4 Range 32 Column 10 Updated to 'Unknown off type'.
@dlebauer and @gsrohde the 2017 season 4 experimental design has height blocks (tall, medium, short). It's a row-column design with further constraint that accessions (cultivars) are blocked by height classes. How should the height classes be handled in BETYdb? Is it part of the site/cultivar information? The height blocks extend across ranges (4 ranges are short, 10 are medium, and the rest are tall).
These can be stored as distinct experiments. If you can give a list of ranges within each block, I can add these
Thanks @dlebauer . Height block info is in the tab FINAL_DESIGN column C in this spreadsheet from issue #126 https://docs.google.com/spreadsheets/d/1iUTkyU5P3IfohQg-sLq_ETWvaWBHF5Nvchx3iI8C8Qw/edit#gid=1908904980 Or by range number: Short: ranges 11, 20, 46, 50 Medium: ranges 10, 12, 18, 24, 27, 29, 31, 33, 38, 51 All other ranges are either tall or border plots.
@gsrohde and @dlebauer I have several 2017 field measurement data sets ready to upload. As soon as the site/cultivar/height-block records are revised I'll start uploading. Has there been any changes to the ability to link Methods to data-sets? Do I still keep track of which Methods go with which data sets, or can it now be linked during the BETYdb upload?
@NewcombMaria, @dlebauer
OK, I ran this on bety6's bety database:
UPDATE sites_cultivars
SET cultivar_id = (SELECT id FROM cultivars
WHERE name = 'Unknown off type')
WHERE site_id IN (SELECT id FROM sites
WHERE sitename ~
'^MAC Field Scanner Season 4 Range (9 Column 14|32 Column 10)( (E|W))?');
Checking:
SELECT
s.sitename AS "site name",
c.name AS "cultivar name"
FROM
sites s
JOIN sites_cultivars sc ON sc.site_id = s.id
JOIN cultivars c ON c.id = sc.cultivar_id
WHERE
c.name = 'Unknown off type';
results in
site name | cultivar name
-------------------------------------------------+------------------
MAC Field Scanner Season 4 Range 9 Column 14 | Unknown off type
MAC Field Scanner Season 4 Range 32 Column 10 | Unknown off type
MAC Field Scanner Season 4 Range 9 Column 14 E | Unknown off type
MAC Field Scanner Season 4 Range 9 Column 14 W | Unknown off type
MAC Field Scanner Season 4 Range 32 Column 10 E | Unknown off type
MAC Field Scanner Season 4 Range 32 Column 10 W | Unknown off type
(6 rows)
π π― π '^MAC Field Scanner Season 4 Range (9 Column 14|32 Column 10)( (E|W))?')
@NewcombMaria you should be ready to go !(?)
I added four records to the experiments table that group the border, small, medium, and large plots https://gist.github.com/dlebauer/f083056af7a835c594a9167373c37773
Then associated them with the appropriate plots / ranges by creating the insert statements in the experiments_sites table of the spreadsheet: https://docs.google.com/spreadsheets/d/1iUTkyU5P3IfohQg-sLq_ETWvaWBHF5Nvchx3iI8C8Qw/edit#gid=552329304
@gsrohde and @dlebauer thanks very much!! I'll start uploading some of the 2017 data sets later today.
@gsrohde and @dlebauer turns out I need your help already. I'm trying to upload a spreadsheet for one part of height data. Two questions: Question 1) I made a new 2017 citation, is that the wrong way to use citations? I'm getting the error message: Data Value Errors
I think the site names are correct. The spreadsheet is in this 2017 upload google folder https://drive.google.com/drive/folders/0B1OR5-zeXagnMlpDQjBpRDlYU3M
Question 2) The 'experiment' column with Tall, Medium, and Short blocks is being ignored. Is that ok since the experiments are already linked to sites in the table? Or do I need to do something different?
1) I think that the requirement that sites and citations be linked is a bug - this could be done automatically with the bulk upload. Since this is pre-publication I don't think that at this point there is a need to have separate citations for 2016 and 2017 data. The objective for using the citations table for unpublished data is to track who collected the original data, who can be contacted with questions, and who should be contacted when they are used.
2) the experiments are associated with the sites already, so these don't need to be uploaded.
@dlebauer apparently the new 2017 citation wasn't the problem. I tried the bulk upload for the same spreadsheet but this time with the original 2016 citation and still get the same error message: Data Value Errors
Is there a step that needs to be done using the API to link the new sites (season 4 sites) to existing citations before I can use the bulk upload web interface?
@NewcombMaria i've updated the records (and also created an issue to fix this error in pecanproject/bety#539). Please retry and let me know if this works.
Thanks @dlebauer . The site-citation is fixed and I'm able to make it past the first steps up to 'Confirm data' but get stopped by the message the treatment 'control' is not found in the treatments table.
Not found in treatments table
Specify Data-Set Wide Values
treatment
control
Should I be specifying a different treatment? Or can 'control' be added to the treatment table? I think we are getting close... Thanks for your help.
I just added a treatment called 'Control BAP 2017' that should work: ' https://terraref.ncsa.illinois.edu/bety/treatments/6000000023
Have you already added all of the managements (planting date, fertilization, etc)?
On Tue, Sep 26, 2017 at 11:44 AM MariaNewcomb notifications@github.com wrote:
Thanks @dlebauer https://github.com/dlebauer . The site-citation is fixed and I'm able to make it past the first steps up to 'Confirm data' but get stopped by the message the treatment 'control' is not found in the treatments table.
Not found in treatments table Specify Data-Set Wide Values treatment control
Should I be specifying a different treatment? Or can 'control' be added to the treatment table? I think we are getting close... Thanks for your help.
β You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/terraref/reference-data/issues/174#issuecomment-332261247, or mute the thread https://github.com/notifications/unsubscribe-auth/AAcX5wOvxtlBRH7BFimNaCISKBlEz7oDks5smSnvgaJpZM4PNgMB .
@dlebauer I tried again with 'Control BAP 2017' but get the same return 'Not found in treatments table'. Anything starting with Control gives the pop-down 'No matches'.
I've added some of the management activities but not all. I'll try to complete the management additions soon.
How about the name "BAP 2017"? I just added this one - perhaps try refreshing the browser?
I logged out and back in and tried again with both 'BAP 2017' and 'Control BAP 2017' and also 'Control' and still get 'Not found in treatments table. @dlebauer any other suggestions? Thanks for your help.
If I add a new treatment myself under the 2017 citation called 'Control BAP 2017' with Control=True would that work or would that make problems relating the sites to other data like sensor-derived data?
At this point the name doesn't matter - I'd be curious if it makes a difference if you make the treatment, but this is the one that I made: https://terraref.ncsa.illinois.edu/bety/treatments/6000000023 and you can feel free to rename it if that helps.
I successfully made it past the treatment designation and on to the next error during the Insert data step: PG::RaiseException: ERROR: The value of cultivar_id (6000000993) is not consistent with the value 6000000212 specified for site_id 6000005356. : INSERT INTO "traits" ("access_level", "checked", "citation_id", "created_at", "cultivar_id", "date", "date_day", "date_month", "date_year", "dateloc", "entity_id", "mean", "method_id", "n", "notes", "site_id", "specie_id", "stat", "statname", "time", "time_hour", "time_minute", "timeloc", "treatment_id", "updated_at", "user_id", "variable_id") VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25, $26, $27) RETURNING "id"
Is it only one cultivar 6000000993 that is not matched with the correct site? Which cultivar and which site?
To get beyond the treatment specification I created 'Control BAP 2017'. It seems to make a difference that I'm the Record Creator: https://terraref.ncsa.illinois.edu/bety/treatments/6000000024
What is the cultivar name that you have in the site "MAC Field Scanner Season 4 Range 27 Column 2"?
PS for each of these you can identify which cultivar and site is being referred to by appending the id to the url with the table name, e.g.
The spreadsheet has 'MAC Field Scanner Season 4 Range 27 Column 2' site with cultivar 'PI641807'.
The other replicate of PI641807 is at 'MAC Field Scanner Season 4 Range 10 Column 2'. The common name for PI641807 is ATLAS, which is confusing and has been discussed in a couple different issues (see for example comment May 26 in https://github.com/terraref/reference-data/issues/126). PI641807 is correct for 2017 season 4. ATLAS also came up in one of the KSU github issues. ATLAS may have a different PI identity in KSU fields.
The other github issue that discusses ATLAS is #65 , see comments Aug 11 to 14. It may have been merged with an incorrect PI number (PI586537).
I've updated it. Not sure where the error came from, but we do have the following records for "ATLAS":
id | name | ecotype | notes |
---|---|---|---|
6000000212 | PI586537 | Sweet Sorghum | Commercial Sweet Sorghum Hybrid. Common name ATLAS |
6000000993 | PI641807 | BAP | Commercial Sweet Sorghum Hybrid. Common name ATLAS |
From that comment, https://github.com/terraref/reference-data/issues/126#issuecomment-304304105 it sounds like these should be kept separate since they are from different seed lots.
The file was successfully uploaded! Thanks very much. It looks like we're on track now! Next as soon as I have a chance I'll try uploading a file with subplots.
Sorry, apparently we're not quite on track yet. Uploading the second part of the same data set gives the error: PG::RaiseException: ERROR: The value of cultivar_id (6000000993) is not consistent with the value 6000000212 specified for site_id 6000005672. : INSERT INTO "traits" ("access_level", "checked", "citation_id", "created_at", "cultivar_id", "date", "date_day", "date_month", "date_year", "dateloc", "entity_id", "mean", "method_id", "n", "notes", "site_id", "specie_id", "stat", "statname", "time", "time_hour", "time_minute", "timeloc", "treatment_id", "updated_at", "user_id", "variable_id") VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25, $26, $27) RETURNING "id"
I tried searching on site 6000005672 but get ' no entries found'
Can you link to the file you are trying to upload?
On Tue, Sep 26, 2017 at 5:03 PM MariaNewcomb
< mailto:MariaNewcomb notifications@github.com
wrote:
a, pre, code, a:link, body { word-wrap: break-word !important; }
Sorry, apparently we're not quite on track yet. Uploading the second part of the same data set gives the error:
PG::RaiseException: ERROR: The value of cultivar_id (6000000993) is not consistent with the value 6000000212 specified for site_id 6000005672. : INSERT INTO "traits" ("access_level", "checked", "citation_id", "created_at", "cultivar_id", "date", "date_day", "date_month", "date_year", "dateloc", "entity_id", "mean", "method_id", "n", "notes", "site_id", "specie_id", "stat", "statname", "time", "time_hour", "time_minute", "timeloc", "treatment_id", "updated_at", "user_id", "variable_id") VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25, $26, $27) RETURNING "id"
I tried searching on site 6000005672 but get ' no entries found'
β
You are receiving this because you were mentioned.
Reply to this email directly, https://github.com/terraref/reference-data/issues/174#issuecomment-332349588 , or https://github.com/notifications/unsubscribe-auth/AAcX55YsgkcKas24uyx_x-3pyuWDoXRoks5smXSxgaJpZM4PNgMB .
There are 2 files at https://drive.google.com/drive/folders/0B1OR5-zeXagnMlpDQjBpRDlYU3M The one ending in _a was uploaded successfully. The one ending in _b gave the error message.
It looks like we have PI641807 in 4 sites in the field. All other cultivars should only be at 2 sites (2 reps). Looking back at the original entry list we received from Clemson, we received seed identified as ATLAS and seed identified as PI641807. We planted 2 reps of each and then later were told that the seed source of ATLAS sent to us is also identified PI641807.
@dlebauer or @gsrohde if you have any time today to update the problem records then I'll upload field measurement data this evening
@NewcombMaria What exactly is the error you get when attempting the _b
file upload? Does it happen on the validation page or right when you click the "Insert Data" button? Which citation where you using for the upload?
I get the error copied below, which is very similar to an error I originally had with the file ending in _a except different site_id. I think we tracked and fixed the problem that caused the error with the file ending in _a by changing 2 of the sites to PI641807. There may be 2 additional sites that need to be similarly changed. Let me know if this needs more explanation. Thanks!
PG::RaiseException: ERROR: The value of cultivar_id (6000000993) is not consistent with the value 6000000212 specified for site_id 6000005672. : INSERT INTO "traits" ("access_level", "checked", "citation_id", "created_at", "cultivar_id", "date", "date_day", "date_month", "date_year", "dateloc", "entity_id", "mean", "method_id", "n", "notes", "site_id", "specie_id", "stat", "statname", "time", "time_hour", "time_minute", "timeloc", "treatment_id", "updated_at", "user_id", "variable_id") VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25, $26, $27) RETURNING "id"
@gsrohde the citation I'm using:
Newcomb, Maria | 2017 | Maricopa Agricultural Center Field Measurements and Activities
I get the error at the Insert Data step
This error occurs because site "MAC Field Scanner Season 4 Range 31 Column 13" (6000005672) is associated with cultivar "PI586537" (6000000212), but your upload file is trying to assign cultivar "PI641807" (6000000993) to a trait at that site.
So if the cultivar growing at site "MAC Field Scanner Season 4 Range 31 Column 13" really is "PI641807", then I'll correct the sites_cultivars
table.
Thanks @gsrohde . Yes the correct cultivar for 4 sites total is PI641807. Yesterday 2 of the sites were corrected which allowed the file ending in _a to successfully upload. 'MAC Field Scanner Season 4 Range 31 Column 13' and also 'MAC Field Scanner Season 4 Range 29 Column 8' should both be associated with cultivar "PI641807" (6000000993). Thanks very much!
@gsrohde or @dlebauer I tried again to upload the file Sorghum_season4_Heights_Fullfield_plots_b and am stopped with the same error: PG::RaiseException: ERROR: The value of cultivar_id (6000000993) is not consistent with the value 6000000212 specified for site_id 6000005672.
Have the records in the sites_cultivars table been revised? The correct association should be cultivar PI641807 cultivar_id (6000000993) at the following sites:
MAC Field Scanner Season 4 Range 27 Column 2
MAC Field Scanner Season 4 Range 27 Column 2 W
MAC Field Scanner Season 4 Range 27 Column 2 E
MAC Field Scanner Season 4 Range 31 Column 13
MAC Field Scanner Season 4 Range 31 Column 13 W
MAC Field Scanner Season 4 Range 31 Column 13 E
Let me know if/when the table has been updated and I'll try again.
Only "MAC Field Scanner Season 4 Range 31 Column 13" needed to be updated
select cultivar_id, sitename, site_id from sites_cultivars join sites on sites_cultivars.site_id = sites.id where sitename like 'MAC Field Scanner Season 4 Range 27 Column 2%' or sitename like 'MAC Field Scanner Season 4 Range 31 Column 13%'
update sites_cultivars set cultivar_id = 6000000993 where cultivar_id = 6000000212 and site_id = 6000005672
@NewcombMaria The file Sorghum season4_Heights_FullField_plots_b (1).csv is ready to be uploaded. It contains all of the data except a single row (see below).
@gsrohde please check why in Sorghum season4_Heights_FullField_plots_b (1).csv throws an error. The cultivars - sites relationship appears valid
select cultivar_id, sitename, name, site_id from sites_cultivars join sites on sites_cultivars.site_id = sites.id join cultivars on sites_cultivars.cultivar_id = cultivars.id where sitename like 'MAC Field Scanner Season 4 Range 42 Column 10'
returns
6000000806 MAC Field Scanner Season 4 Range 42 Column 10 PI329665 6000005886
I'll try uploading now. If all of the sites including plot and subplots (E/W) then the error might be that I had one cell with the text 'no data'. I'll let you know if the upload works. Thanks.
@dlebauer and @gsrohde I'm happy to report success with uploading Sorghum_season4_Heights_Fullfield_plots_b . Thanks for revising the sites-cultivar table! Next I'll upload height data sets that have sites designated at the subplot level (E/W rows)
@NewcombMaria per our conversation I deleted all of the records that you uploaded today (panicle_height and canopy_height)
delete from traits where user_id = 6000000017 and traits.created_at > CURRENT_DATE;
@dlebauer many thanks! I'll re-upload the data with the correct use of site/entity.
All the height data are now uploaded, along with stand count data from manual counts early in the season after plot thinning. I'm keeping a list of csv files successfully uploaded with date/time in a document in the directory https://drive.google.com/drive/folders/0B1OR5-zeXagnMlpDQjBpRDlYU3M
@dlebauer Be sure, when you use delete statements such as the above, that you keep in mind that traits.created_at > CURRENT_DATE
grabs all traits created today UTC. That will include traits created yesterday after 5PM Mountain Standard Time.
Thanks for the reminder Scott. I did check using select that I only got the records that Maria had uploaded.
Seed drop during planting (all plots) and seedling emergence counts and emergence rates over multiple dates from april 26 to may 9 (subset of plots) are now uploaded
I added field management activities for season4. Since there is not an option for selecting 'other' under 'Management Type' I was limited to options in the drop-down list. For plant thinning completed May 10, 11, and 12 the best fit for 'Management Type' seemed to be 'soil disturbance'.
@dlebauer and @gsrohde I'm ready to start uploading some of the field measurements from the 2017 MAC sorghum season (April - September). The cultivar-site table needs a couple revisions to the cultivar identities. As the plants approached their height potential we realized that 2 accessions (cultivars) were incorrectly identified, based on heights from Clemson and in GRIN. We don't know the correct identities. The appropriate cultivar name probably is 'Unknown' or something similar. The following 4 sites need to be re-assigned to 'Unknown' (or 'Unknown accession') instead of PI329440 and PI330807.
PI329440 MAC Field Scanner Season 4 Range 9 Column 14 PI329440 MAC Field Scanner Season 4 Range 32 Column 10 PI330807 MAC Field Scanner Season 4 Range 3 Column 5 PI330807 MAC Field Scanner Season 4 Range 44 Column 2
This issues overlaps with #126 . It's much easier to start with correct cultivar-site descriptions instead of having 2 sets of spreadsheets with and without revisions. Let me know if you need more info to make the changes.