rBatt / trawlData

Collate and clean bottom trawl survey data

BS-batch Genus error #15

Closed bselden closed 8 years ago

bselden commented 8 years ago

@rbatt

For those species with an NA conflict field and a BS-batch flag (from the batch download I did from WoRMS), spp shows the accepted name. But if the accepted name differs from the species that was matched in ref, the genus column still holds the old genus.

Example:
ref = BARBATIA DOMIGESIS
species that was matched in the database (does not appear in file) = Barbatia domingensis
spp = accepted name = Acar domingensis
species = Acar domingensis
genus = Barbatia

See http://www.marinespecies.org/aphia.php?p=taxdetails&id=582484

Will need to subset the data by the BS-batch flag, create a temporary genus column that is a split of spp, then run something along the lines of ifelse(genus.temp == genus, genus, genus.temp)
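The approach described above can be sketched in base R as follows. This is only an illustration under assumptions: the data frame `key` is a toy stand-in for spp.key, the third row and the flag value "other" are made up, and `genus_temp` plays the role of the temporary genus column. Note that `ifelse(genus.temp == genus, genus, genus.temp)` yields `genus.temp` whenever the comparison is not NA, so the sketch assigns it directly.

```r
# Toy stand-in for spp.key. Rows 1-2 echo values from this thread;
# row 3 and the flag value "other" are hypothetical.
key <- data.frame(
  ref   = c("BARBATIA DOMIGESIS", "RAJA CLARKII", "GADUS MORHUA"),
  spp   = c("Acar domingensis", "Dactylobatus clarkii", "Gadus morhua"),
  genus = c("Barbatia", "Raja", "Gadus"),
  flag  = c("BS-batch", "BS-batch", "other"),
  stringsAsFactors = FALSE
)

# Rows flagged BS-batch that have an accepted name in spp
is_batch <- !is.na(key$spp) & key$flag == "BS-batch"

# Temporary genus: the first word of the accepted name
genus_temp <- sapply(strsplit(key$spp, " "), `[`, 1)

# Overwrite genus only where it is missing or disagrees with spp
needs_fix <- is_batch & (is.na(key$genus) | key$genus != genus_temp)
key$genus[needs_fix] <- genus_temp[needs_fix]
```

Rows without the BS-batch flag are left untouched, which keeps the change limited to the batch-downloaded names.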

rBatt commented 8 years ago

Has this been corrected subsequently? OK, I looked into it and it has not been corrected.

FYI, here's how I fix:

# quoted expressions to hold subsetting logic
exp1 <- quote(!is.na(spp) & flag=="BS-batch" & (taxLvl%in%c("species", "genus", "subspecies")))
exp2 <- quote(sapply(strsplit(spp, " "), '[', 1)) # will be used twice

spp.key[eval(exp1) & (is.na(genus) | genus!=eval(exp2))] # show cases

spp.key[eval(exp1) & (is.na(genus) | genus!=eval(exp2)), genus:=eval(exp2)] # fix

spp.key[eval(exp1) & (is.na(genus) | genus!=eval(exp2))] # should show nothing

I found 41 instances. I think this takes care of the problem, let me know otherwise. Will commit this fix on development and put in a pull request. That branch is accumulating fixes and features like crazy.

bselden commented 8 years ago

Hi Ryan, That's weird. I thought I did do this in my last edits to the spp.key file. Perhaps I missed some, but 41 seems like a lot. Is there any easy way to send me the 41 instances, just so I can spot-check that this code is doing what we want? (It looks like it should.) Becca


rBatt commented 8 years ago

I'd have to run the code again. I didn't save the changes.


rBatt commented 8 years ago

If I do library("trawlData"), then use exp1 and exp2 in the above code, then I can run/ get the following:

> spp.key[eval(exp1) & (is.na(genus) | genus!=eval(exp2)), list(ref, spp, common, taxLvl, species, genus)] # show cases
                           ref                      spp common  taxLvl                  species            genus
 1:         BARBATIA DOMIGESIS         Acar domingensis     NA species         Acar domingensis         Barbatia
 2:               GRIMATROCTES             Bathytroctes     NA   genus                       NA     Grimatroctes
 3:       GRIMATROCTES BULLISI  Bathytroctes microlepis     NA species  Bathytroctes microlepis     Grimatroctes
 4:    PARADIPLOGRAMMUS BAIRDI       Callionymus bairdi     NA species       Callionymus bairdi Paradiplogrammus
 5:          CALLISTA EUCYMATA        Callpita eucymata     NA species        Callpita eucymata         Callista
 6:  DACTYLOMETRA QUIQUECIRRHA  Chrysaora quinquecirrha     NA species  Chrysaora quinquecirrha     Dactylometra
 7:         BAIRDIELLA BATABAA         Corvula batabana     NA species         Corvula batabana       Bairdiella
 8:        NEMATONURUS ARMATUS   Coryphaenoides armatus     NA species   Coryphaenoides armatus      Nematonurus
 9: CORYTHOICHTHYS ALBIROSTRIS  Cosmocampus albirostris     NA species  Cosmocampus albirostris   Corythoichthys
10:           OSTREA PERMOLLIS    Cryptostrea permollis     NA species    Cryptostrea permollis           Ostrea
11:   PSEUDOCYPHOMA ITERMEDIUM      Cyphoma intermedium     NA species      Cyphoma intermedium    Pseudocyphoma
12:               RAJA CLARKII     Dactylobatus clarkii     NA species     Dactylobatus clarkii             Raja
13:                RAJA OREGOI         Dipturus oregoni     NA species         Dipturus oregoni             Raja
14:       GOBIOSOMA XATHIPRORA   Elacatinus xanthiprora     NA species   Elacatinus xanthiprora        Gobiosoma
15:       PODOCHELA GRACILIPES    Ericerodes gracilipes     NA species    Ericerodes gracilipes        Podochela
16:             CYPRAEA SPURCA          Erosaria spurca     NA species          Erosaria spurca          Cypraea
17:           OOCORYS BARTSCHI         Eucorys bartschi     NA species         Eucorys bartschi          Oocorys
18:           MUREX CELLULOSUS       Favartia cellulosa     NA species       Favartia cellulosa            Murex
19:   HYPSELODORIS EDETICULATA           Felimare picta     NA species           Felimare picta     Hypselodoris
20:           FUSIUS EUCOSMIUS        Fusinus excavatus     NA species        Fusinus excavatus            Fusus
21:         LATIRUS CARIIFERUS  Hemipolygona carinifera     NA species  Hemipolygona carinifera          Latirus
22:            LATIRUS MCGITYI    Hemipolygona mcgintyi     NA species    Hemipolygona mcgintyi          Latirus
23:            TEREBRA SALLEAA         Impages salleana     NA species         Impages salleana          Terebra
24:       OPLOPHORUS SPIICAUDA     Janicella spinicauda     NA species     Janicella spinicauda       Oplophorus
25:        TETRAPTURUS ALBIDUS           Kajikia albida     NA species           Kajikia albida      Tetrapturus
26:             RAJA LETIGIOSA    Leucoraja lentiginosa     NA species    Leucoraja lentiginosa             Raja
27:             LIMA PELLUCIDA        Limaria pellucida     NA species        Limaria pellucida             Lima
28:        SCALPELLUM GIGATEUM Litoscalpellum giganteum     NA species Litoscalpellum giganteum       Scalpellum
29:           LYREIDUS BAIRDII         Lysirude nitidus     NA species         Lysirude nitidus         Lyreidus
30:             CYPRAEA CERVUS      Macrocypraea cervus     NA species      Macrocypraea cervus          Cypraea
31:             TURBO CASTAEUS          Manzonia crassa     NA species          Manzonia crassa            Turbo
32:     MACROCALLISTA MACULATA     Megapitaria maculata     NA species     Megapitaria maculata    Macrocallista
33:      BATHYLAGUS BERICOIDES   Melanolagus bericoides     NA species   Melanolagus bericoides       Bathylagus
34:           CYMATIUM KREBSII         Monoplex krebsii     NA species         Monoplex krebsii         Cymatium
35:         MITHRAX ACUTICORIS      Nemausa acuticornis     NA species      Nemausa acuticornis          Mithrax
36:         ACTAEA RUFOPUCTATA   Paractaea rufopunctata     NA species   Paractaea rufopunctata           Actaea
37:        IOGLOSSUS CALLIURUS    Ptereleotris calliura     NA species    Ptereleotris calliura        Ioglossus
38:           TRIVIA PEDICULUS         Pusula pediculus     NA species         Pusula pediculus           Trivia
39:       UROLOPHUS JAMAICECIS     Urobatis jamaicensis     NA species     Urobatis jamaicensis        Urolophus
40:             MUREX CABRITTI     Vokesimurex cabritii     NA species     Vokesimurex cabritii            Murex
41:         YARELLA BLACKFORDI      Yarrella blackfordi     NA species      Yarrella blackfordi          Yarella
                           ref                      spp common  taxLvl                  species            genus
bselden commented 8 years ago

Yep, those were all ones that I changed in my last update to the taxonomy file. But, I was essentially doing what your code was doing, so that's fine. What's more concerning was that in that last update, I also changed several of the BS non-batch ones to make sure that those that had the same spp had all of the rest of the info the same (common name, taxonomy etc) if that species already existed in the list. Those won't be as easy a fix to write a function to incorporate.
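While the fix itself may be hard to automate, finding the rows that disagree is easier. A minimal sketch, assuming a data frame with the spp.key column names used in this thread; the rows, the placeholder common names, and the helper `inconsistent` are hypothetical:

```r
# Toy rows; column names follow spp.key, values are made up for illustration
key <- data.frame(
  ref    = c("REF A", "REF B", "REF C"),
  spp    = c("Acar domingensis", "Acar domingensis", "Kajikia albida"),
  common = c("name one", NA, "name two"),
  genus  = c("Acar", "Barbatia", "Kajikia"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: spp values whose rows hold more than one
# distinct value in the given column (NA counts as a value)
inconsistent <- function(column) {
  n <- tapply(key[[column]], key$spp, function(x) length(unique(x)))
  names(n)[n > 1]
}

# Columns that should agree within a single spp
check_cols <- c("common", "genus")
inconsistent_spp <- unique(unlist(lapply(check_cols, inconsistent)))
```

Here `inconsistent_spp` lists the accepted names that would need a manual look.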

The version of spp.key that's in my trawlData repo on my computer at home, attached below, does include the changes to the BS-batch genus (and all the other changes I made), dated 11/29/15. I pushed these changes (I thought anyway) with ecc8ab7 https://github.com/rBatt/trawlData/commit/ecc8ab7b0c1720eae3de42deeeb8641a2264b97d

When I push with git, does that not automatically update the files that are in the trawlData package?

Becca


rBatt commented 8 years ago

Well, pushing to the repo updates the .csv file on the repo. But that's different from the package. The package references a .RData file in another folder. I wrote a small function to read in the csv, convert characters to ASCII, and save the new .RData file for the package.
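For readers unfamiliar with that step, a rough sketch of such a function is below. It is not the package's actual code: the name `update_key_rdata`, the assumption that the csv is UTF-8, and the choice to replace non-ASCII bytes with "?" are all illustrative.

```r
# Hypothetical helper; the package's real function is not shown in this thread
update_key_rdata <- function(csv_path, rdata_path) {
  spp.key <- read.csv(csv_path, stringsAsFactors = FALSE)

  # Convert every character column to ASCII (assumes UTF-8 input);
  # bytes that cannot be converted are replaced with "?"
  is_chr <- vapply(spp.key, is.character, logical(1))
  spp.key[is_chr] <- lapply(spp.key[is_chr], iconv,
                            from = "UTF-8", to = "ASCII", sub = "?")

  # Save under the object name that the data file is expected to provide
  save(spp.key, file = rdata_path)
  invisible(spp.key)
}

# Usage with temporary files standing in for the real paths
csv_path <- tempfile(fileext = ".csv")
rdata_path <- tempfile(fileext = ".RData")
write.csv(data.frame(ref = "CYMATIUM KREBSII", spp = "Monoplex krebsii"),
          csv_path, row.names = FALSE)
update_key_rdata(csv_path, rdata_path)
```

The point is that editing the csv alone changes nothing for package users until the .RData file is regenerated and the package is reinstalled.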

I haven't pushed my current version of the repo yet.

I'm going to do

git checkout ecc8ab7b0c1720eae3de42deeeb8641a2264b97d -- inst/extdata/taxonomy/spp.key.csv

That command will pull in your version of the file from Git history. Then I can update the spp.key and check for those genera again.

We'll figure it out. It's possible that I did something dumb last night, I was pretty tired by the end of it :stuck_out_tongue_winking_eye: The great part about Git though is that we don't have to worry about losing anything. It's all saved.

More updates soon.

rBatt commented 8 years ago

OK, I just did that, and this is the output:

> spp.key[eval(exp1) & (is.na(genus) | genus!=eval(exp2)), list(ref, spp, common, taxLvl, species, genus)] # show cases
                ref              spp common  taxLvl          species    genus
1: CYMATIUM KREBSII Monoplex krebsii     NA species Monoplex krebsii Cymatium

So it's saying you only missed 1. Which sounds reasonable?

I'll go ahead and make this fix directly on master, merge the changes into development, then pull development back into master.

After that, everything should be the same everywhere, and everything should be up to date.

rBatt commented 8 years ago

I had merged the master versions of the .RData and .csv spp.key files into the development branch in a "take theirs" fashion using git checkout master --theirs -- inst/extdata/taxonomy/spp.key.csv and git checkout master --theirs -- data/spp.key.RData.

I am also doing library(remake); make() and library(devtools); document(); check(); unload(); install(); to get a clean install and update of the package.

rBatt commented 8 years ago

OK, this has been taken care of. All up to date.

Sorry it took a while; I walked away from my computer while it reinstalled!

I'll pair the closure of this message with a release.