lwaldron opened this issue 1 year ago
I think these come from the output of bacdiveR. @jwokaty, I've been using this spreadsheet; is there a newer version? The values from "biosafety level" seem to be the result of incorrect parsing. I'll add the remaining values to the extdata/attributes.tsv file.
@sdgamboa I've created a new spreadsheet, and the biosafety level, country, and geographic location values appear to be formatted correctly. However, I haven't replaced the BacDive sheet yet because I wanted to give you the opportunity to look at it first: https://docs.google.com/spreadsheets/d/1P4Ic6-N9GVXcX1CdfoamFt6eozfHqt-sxfIRTBvYHWk/edit?usp=sharing. If it looks good, I'll upload it as a new version of the BacDive document.
@jwokaty, thanks! The biosafety level values seem fine now, and I no longer get 'X' columns when parsing the file. I added the URL to this code: https://github.com/waldronlab/bugphyzz/blob/ed8b40fe21bb2da00e10a8b9c0405d36b5036cf2/R/bacdive.R#L21-L29. Please let me know if a new URL is needed or if you overwrite the previous spreadsheet.
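Stray 'X' columns are what you would typically get if the sheet is read with `utils::read.csv()` and some header cells are empty: with `check.names = TRUE`, empty names pass through `make.names()` and become `X`, and duplicates become `X.1`, `X.2`, and so on. A minimal sketch of a sanity check for that, assuming the BacDive export is read with `read.csv()` (bugphyzz may use a different reader); `find_unnamed_columns()` is a hypothetical helper, not part of bugphyzz, and the CSV text is toy data:

```r
# Hypothetical helper: report columns whose names look auto-generated by
# make.names() from empty header cells ("X", "X.1", "X.2", ...).
find_unnamed_columns <- function(df) {
  grep("^X(\\.[0-9]+)?$", names(df), value = TRUE)
}

# Toy CSV (not real BacDive data) whose header has two empty cells.
csv_text <- "Taxon_name,Attribute,,\ntaxon_a,biosafety level 2,,\n"
df <- utils::read.csv(text = csv_text)

find_unnamed_columns(df)  # "X" "X.1"
```

An empty result after reading the new sheet would be consistent with the parsing fix described above.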
```r
library(bugphyzz)
bl <- physiologies('biosafety level')[[1]]
#> Finished biosafety level.
#> Warning: Missing columns in biosafety level. Missing columns are: Genome_ID,
#> Accession_ID
unique(bl$Attribute)
#> [1] "biosafety level 1" "biosafety level 2" "biosafety level 3"
#> [4] "biosafety level 1+" "biosafety level 3**" "biosafety level L1"
```
Created on 2023-09-20 with reprex v2.0.2
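The reprex output still contains values that do not follow a plain `biosafety level <n>` pattern (`1+`, `3**`, `L1`). A minimal sketch for flagging them, assuming the only well-formed values are `biosafety level 1` through `biosafety level 4` (an illustrative assumption, not a vocabulary confirmed in this thread). It works on a plain character vector, so the same check could be run on `unique(bl$Attribute)`:

```r
# Values copied from the reprex output above.
attrs <- c("biosafety level 1", "biosafety level 2", "biosafety level 3",
           "biosafety level 1+", "biosafety level 3**", "biosafety level L1")

# Assumed well-formed pattern: "biosafety level " followed by one digit 1-4.
is_well_formed <- grepl("^biosafety level [1-4]$", attrs)

# Values that would need a curation decision.
attrs[!is_well_formed]
```

Whether those flagged values should be corrected, added to the allowed vocabulary, or dropped is exactly the curation question; the check only surfaces them.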
@sdgamboa I'm glad it's working better. I think we should keep the original URL so that we can make use of Google Sheets versioning. It only keeps a version history for 30 days, but it will let us upload a new version without changing the URL in bugphyzz.
@jwokaty, agreed. I'll switch back to the original URL when the spreadsheet gets updated.
I've updated the Google Sheet!
The following line in bugphyzzExports identifies invalid values and drops them. @sdgamboa, please raise such curation issues here and discuss whether they should be resolved by correcting the invalid values, adding them to the allowed vocabulary, or continuing to drop them. For some attributes, dropping certainly seems like the right choice for ASR (ancestral state reconstruction), but for others (like aerophilicity and shapes) I'm not so sure.
https://github.com/waldronlab/bugphyzzExports/blob/a9fc18914cb3b1d9ea3a3d1c0121ccac5c8d482a/inst/scripts/export_bugphyzz.R#L126
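The three options above (correct the value, extend the allowed vocabulary, or keep dropping it) can be made explicit in a small curation step. A minimal sketch, assuming a data frame with an `Attribute` column and a per-attribute vector of allowed values; `curate_attribute()`, `allowed`, and `corrections` are hypothetical names for illustration, not the code at the linked line of `export_bugphyzz.R`, and the toy vocabulary and data are made up:

```r
# Hypothetical curation step: map known-bad spellings to corrected values,
# then split rows into kept (valid) and dropped (still invalid) sets.
curate_attribute <- function(df, allowed, corrections = character()) {
  # Option 1: correct invalid values through an explicit lookup table.
  fix <- df$Attribute %in% names(corrections)
  df$Attribute[fix] <- unname(corrections[df$Attribute[fix]])
  # Option 2 means extending `allowed`; anything still outside it is dropped
  # (option 3), but returned separately so it can be reviewed.
  valid <- df$Attribute %in% allowed
  list(kept = df[valid, , drop = FALSE], dropped = df[!valid, , drop = FALSE])
}

# Toy data, not real bugphyzz records.
toy <- data.frame(
  Taxon_name = c("taxon_a", "taxon_b", "taxon_c"),
  Attribute = c("aerobic", "Aerobic", "aerob"),
  stringsAsFactors = FALSE
)
res <- curate_attribute(
  toy,
  allowed = c("aerobic", "anaerobic", "facultatively anaerobic"),  # toy vocabulary
  corrections = c(Aerobic = "aerobic")  # hypothetical capitalization fix
)
nrow(res$kept)         # 2
res$dropped$Attribute  # "aerob"
```

Returning the dropped rows instead of silently discarding them makes it easier to raise each case here and decide which of the three options applies.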