Trait Data cleanup - Githubissues

javirudolph commented 5 years ago

From the same csv file that is used in master_cleanup.R

Some had NAs

[ ] Fix male vs female traits with blank values? Or just use NAs.
[ ] Scale all variables: mean = 0, sd = 1
[ ] May 30 data?
[ ] No duplicates
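
A minimal base R sketch of the checklist above, on toy data. The data frame, its column names (`sample_id`, `sex`, `leaf_area`, `height`) and its values are all hypothetical and are not taken from the project CSV; this is one possible way to do each step, not the code in master_cleanup.R.

```r
# Toy data: every name and value here is made up for illustration.
traits <- data.frame(
  sample_id = c("A_1", "A_1", "B_2", "C_3"),
  sex       = c("M", "M", "", "F"),
  leaf_area = c(1.2, 1.2, 2.4, NA),
  height    = c(10, 10, 14, 12),
  stringsAsFactors = FALSE
)

# Blank values -> NA, so that missing entries are handled uniformly.
is_blank <- !is.na(traits$sex) & traits$sex == ""
traits$sex[is_blank] <- NA

# "No duplicates": drop rows that are exact copies of an earlier row.
traits <- traits[!duplicated(traits), ]

# "Scale all variables": center each numeric column to mean 0 and sd 1.
# scale() skips NAs when computing the mean and sd and leaves them as NA.
num_cols <- sapply(traits, is.numeric)
traits_scaled <- traits
traits_scaled[num_cols] <- lapply(
  traits[num_cols],
  function(x) as.numeric(scale(x))
)
```

Scaling after removing duplicates keeps repeated rows from weighting the mean and sd twice.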

javirudolph commented 5 years ago

I forgot to ask something about the Leaf data. If you run lines 69-90 in the master_cleanup.R you will get the dataframe I'm working with. It should be organized by family and sample ID so you can get the duplicates.

The growth and development data is duplicated, but the Leaf data is not and I don't know if these should be averaged.

I'm attaching a screenshot (SharedScreenshot).

javirudolph commented 5 years ago

Another issue: Samples c("P_18_1_6_B", "P_6_6_20") have a mismatch in their recorded sexes, and I'm not sure what to do about it. @Kollarlm any thoughts? This is visualized in lines 53-55 of the script trait_cleanup.R

javirudolph commented 5 years ago

From the same script, please check samples "P_11_16_20" and "P_18_1_1_B": their records are identical except for the values in Day 21... It's probably a decimal-point error. @Kollarlm

Kollarlm commented 5 years ago

Hi Javi!

Yes, they should be .33 and not 33. It is the decimal point that is off.
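
A rough sketch of how that correction could be applied in R. The sample ID, the column names (`Day_14`, `Day_21`) and the "anything above 1 is 100x too large" threshold are assumptions made for this toy example only.

```r
# Hypothetical pair of records that agree except for Day_21.
dup <- data.frame(
  sample_id = c("X_1", "X_1"),
  Day_14    = c(0.25, 0.25),
  Day_21    = c(0.33, 33),
  stringsAsFactors = FALSE
)

# Flag values that look like a misplaced decimal point.
# The threshold of 1 is an assumption for this toy data.
misplaced <- !is.na(dup$Day_21) & dup$Day_21 > 1
dup$Day_21[misplaced] <- dup$Day_21[misplaced] / 100

# With the decimal fixed, the two rows match and one can be dropped.
dup <- dup[!duplicated(dup), ]
```

On real data it may be safer to round before comparing rows, since values that differ only by floating-point noise would not count as duplicates.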

Les

Leslie M. Kollar

Ph.D. Candidate

Department of Biology

University of Florida

Twitter: @Kollar_Genetics

Pronouns: She/her/hers

From: Javiera Rudolph notifications@github.com Sent: Wednesday, June 5, 2019 3:16:39 PM To: javirudolph/mossmat Cc: Kollar,Leslie M; Mention Subject: Re: [javirudolph/mossmat] Trait Data cleanup (#5)

Kollarlm commented 5 years ago

P_18_1_6_B should be male for both. I would remove the Avg_arch information from that cell. It was never marked as both, so I think that information was accidentally added to the wrong cell. I have triple-checked the students' notebooks too. As for P_6_6_20, I cannot get a clear answer after reviewing both my notes and a student's. Given that we have plenty of individuals for that family, I think we should remove it from the sample set.
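
As a sketch only, these two decisions could be applied in R roughly as follows. `Avg_arch` and the two sample IDs come from this thread; the `sample_id` and `sex` column names, the extra sample "P_1_1_1" and all values are hypothetical.

```r
# Toy rows: values are made up, and "P_1_1_1" is a placeholder sample.
traits <- data.frame(
  sample_id = c("P_18_1_6_B", "P_18_1_6_B", "P_6_6_20", "P_1_1_1"),
  sex       = c("M", "F", "F", "M"),
  Avg_arch  = c(NA, 2.5, 1.0, NA),
  stringsAsFactors = FALSE
)

# P_18_1_6_B: record as male in every row and clear the stray Avg_arch value.
fix_id <- "P_18_1_6_B"
traits$sex[traits$sample_id == fix_id] <- "M"
traits$Avg_arch[traits$sample_id == fix_id] <- NA

# P_6_6_20: sex could not be resolved, so drop it from the sample set.
traits <- traits[traits$sample_id != "P_6_6_20", ]
```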

From: Javiera Rudolph notifications@github.com Sent: Wednesday, June 5, 2019 3:03:14 PM To: javirudolph/mossmat Cc: Kollar,Leslie M; Mention Subject: Re: [javirudolph/mossmat] Trait Data cleanup (#5)

javirudolph commented 5 years ago

Awesome! I'll fix that now. I'm working on the trait_cleanup.R script

Kollarlm commented 5 years ago

> I forgot to ask something about the Leaf data. If you run lines 69-90 in the master_cleanup.R you will get the dataframe I'm working with. It should be organized by family and sample ID so you can get the duplicates.
>
> The growth and development data is duplicated, but the Leaf data is not and I don't know if these should be averaged.
>
> I'm attaching a screenshot

The leaf area should not be the same for each of the clones, because we used the EXACT same plants as we did for the VOCs. We can average the clones now since we want to remove duplicates. We definitely did a very good sampling of leaves from each individual. The three leaves for each individual were averaged to get a single value for a single clone.

For the growth and development experiment we were able to use three clones (we had the space in the growth chamber, so why not). Since these were two separate experiments (yet on the same genotypes/individuals), we chose to average the three clones for the growth and development data to fit with the two clones in the VOC/Leaf data. Also, we should have significantly less environmental error in the growth/dev data because it was in a growth chamber and plants were rotated every day, versus the inconsistencies we see in a greenhouse (where the VOC plants were planted). This is probably more info than you need!
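
For reference, averaging the two leaf clones per individual could look something like this in base R. The column names (`family`, `sample_id`, `clone`, `leaf_area`) and the values are assumptions for illustration, not the actual layout of the dataframe.

```r
# Toy leaf data: two clones per individual, one leaf_area value per clone.
leaf <- data.frame(
  family    = c("F1", "F1", "F1", "F1"),
  sample_id = c("S1", "S1", "S2", "S2"),
  clone     = c(1, 2, 1, 2),
  leaf_area = c(1.0, 1.4, 2.0, 2.2),
  stringsAsFactors = FALSE
)

# Collapse clones to one row per individual by taking the mean leaf area.
leaf_avg <- aggregate(leaf_area ~ family + sample_id, data = leaf, FUN = mean)
```

Note that the formula interface of `aggregate()` drops rows with missing values by default, so a clone with an NA leaf area would be left out of the mean rather than turning it into NA.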

javirudolph / mossmat

Trait Data cleanup #5