urol-e5 / timeseries

Data generated from e5 time series sampling in Moorea

ITS2 sample metadata confirmation #59

Open hputnam opened 5 months ago

hputnam commented 5 months ago

@AHuffmyer the ITS2 code has this comment for each species:

Acropora samples with no site info -- what's up with these? typo in sample name?

Porites samples with no site info -- what's up with these? typo in sample name?

Pocillopora samples with no site info -- what's up with these? typo in sample name?

Has this been QC'd and resolved, or does it still need to be dealt with?

AHuffmyer commented 5 months ago

In ITS2_analysis.Rmd I added code to change incorrect colony IDs from the original data set. Here is that code:

# colonies without site/species metadata were typos or incorrect labeling. AH has manually changed them here
# AH has reconciled 6 of the 11 missing colony IDs

sam0 <- sam0 %>%
  mutate(colony_id=if_else(colony_id=="POR_44", "POC_44", 
                           if_else(colony_id=="POR_369", "POC_369", 
                           if_else(colony_id=="POC_83", "POR_83", 
                           if_else(colony_id=="POC_240", "POR_240", 
                           if_else(colony_id=="ACR_398", "ACR_396", 
                           if_else(colony_id=="ACR_314", "ACR_374", colony_id)))))))

# LEFT TO IDENTIFY
# POR-352
# POC-295
# POC-25
# POC-234
# ACR-268

In short, there were 11 colonies in the ITS2 dataset that did not match any known colony ID. I went through the metadata and tried to track down whether these were typos or incorrect labels. I was able to figure out 6 of the 11 and corrected them in this code. The remaining 5 I was unable to identify with any confidence. See the list of those 5 under "LEFT TO IDENTIFY" in the code chunk above. We can discuss how we would like to move forward with this.
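One way to double-check which colony IDs are still unmatched after the corrections is an anti-join against the colony metadata. This is a minimal sketch, not the code in ITS2_analysis.Rmd: the `metadata` data frame, its `site` column, and the toy rows are hypothetical stand-ins. Only `sam0`, `colony_id`, and the six corrections come from the code chunk above.

```r
library(dplyr)

# Hypothetical toy data standing in for the real objects
sam0 <- tibble(colony_id = c("POR_44", "ACR_398", "POC_25", "ACR_101"))
metadata <- tibble(colony_id = c("POC_44", "ACR_396", "ACR_101"),
                   site = c("site1", "site2", "site3"))

# The six corrections from the thread, written as a named lookup (old = new)
fixes <- c(POR_44 = "POC_44", POR_369 = "POC_369", POC_83 = "POR_83",
           POC_240 = "POR_240", ACR_398 = "ACR_396", ACR_314 = "ACR_374")

# Replace IDs found in the lookup; leave all other IDs unchanged
sam0 <- sam0 %>%
  mutate(colony_id = coalesce(unname(fixes[colony_id]), colony_id))

# Rows of sam0 whose colony_id has no match in the metadata
unmatched <- anti_join(sam0, metadata, by = "colony_id")
print(unmatched)
```

Run against the real metadata, `unmatched` should list only the colonies that still need to be identified, which makes it easy to confirm the count before deciding how to handle them.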