hurlbertlab / dietdatabase

Creative Commons Zero v1.0 Universal

Error when trying to get speciesSummary for American Black Duck #48

Closed pwinner1 closed 7 years ago

pwinner1 commented 7 years ago

When I type (in RStudio):

speciesSummary("American Black Duck", diet, by = "Order")

I get this error message:

Error in if (x[level] == "") { : missing value where TRUE/FALSE needed

ahhurlbert commented 7 years ago

This was erroring because values were NA instead of blank, but that should now be fixed. Feel free to re-open this issue if you are still having problems.
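For anyone hitting the same message, here is a minimal sketch of why an NA triggers it. The vector `x`, the value of `level`, and the helper `is_blank` are made up for illustration; this is not the actual speciesSummary code.

```r
# Hypothetical prey taxonomy row: one filled level, one NA, one blank
x <- c(Kingdom = "Animalia", Phylum = NA, Class = "")
level <- 2

# NA == "" evaluates to NA, and if (NA) stops with
# "missing value where TRUE/FALSE needed"
res <- tryCatch(
  if (x[level] == "") "blank" else "filled",
  error = function(e) conditionMessage(e)
)

# One possible guard (an assumption, not the fix applied in the repo):
# treat NA and "" the same way
is_blank <- function(v) is.na(v) | v == ""
blank_flags <- unname(is_blank(x))
```

Converting NA values to blanks in the data, as described above, avoids the problem without changing the function.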

pwinner1 commented 7 years ago

I seem to have encountered a similar problem again when I type:

amwig = speciesSummary("American Wigeon", diet, by = "Order")

This is the error I get:

Warning messages:
1: In max(which(x != "")[which(x != "") < level], na.rm = T) : no non-missing arguments to max; returning -Inf
2: In max(which(x != "")[which(x != "") < level], na.rm = T) : no non-missing arguments to max; returning -Inf
3: In max(which(x != "")[which(x != "") < level], na.rm = T) : no non-missing arguments to max; returning -Inf
4: In max(which(x != "")[which(x != "") < level], na.rm = T) : no non-missing arguments to max; returning -Inf
5: In max(which(x != "")[which(x != "") < level], na.rm = T) : no non-missing arguments to max; returning -Inf
6: In max(which(x != "")[which(x != "") < level], na.rm = T) : no non-missing arguments to max; returning -Inf
7: In max(which(x != "")[which(x != "") < level], na.rm = T) : no non-missing arguments to max; returning -Inf
8: In max(which(x != "")[which(x != "") < level], na.rm = T) : no non-missing arguments to max; returning -Inf

ahhurlbert commented 7 years ago

This looks like it is occurring whenever there is no taxonomic information about the prey, even as far as Kingdom. These records where even Prey_Kingdom is NA should be checked and fixed where necessary. (Presumably the prey belong to SOME kingdom; if they do not, because the item is non-biological material like rocks or sand, then the record should be excluded and fractions should be re-calculated.)
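A minimal sketch of how the warning arises, using the expression from the warning text. The values of `x` and `level` and the length check are illustrative assumptions, not the repo's code.

```r
# Hypothetical row with every taxonomic level blank
x <- rep("", 7)
level <- 4

# No level is filled, so the subset passed to max() is empty
filled <- which(x != "")
higher <- filled[filled < level]

# max() on an empty vector warns and returns -Inf
w <- tryCatch(
  max(higher, na.rm = TRUE),
  warning = function(w) conditionMessage(w)
)

# One possible guard (an assumption): return NA when nothing is filled
idx <- if (length(higher) > 0) max(higher) else NA_integer_
```

There is one warning per affected record, which matches the 8 warnings reported for American Wigeon.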

ahhurlbert commented 7 years ago

Here's a list of the number of records with NA's for Prey_Kingdom by species:

Var1                       Freq
American Wigeon               8
Boat-tailed Grackle           5
Brown Teal                    1
Common Eider                  4
Common Eider (Northern)       1
Common Gallinule              5
Common Shelduck               2
Crested Duck                  1
Falkland Steamer-Duck         1
Gadwall                       1
Greater Scaup                 2
Kelp Goose                    5
King Eider                    1
New Zealand Scaup             2
Northern Shoveler             3
Red-winged Blackbird          4
Redhead                       1
Steller's Eider               2
Tundra Swan (Whistling)       1
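A count like the one above can be produced roughly as follows. This is a sketch on toy data; the species column name `Common_Name` is an assumption, and only `Prey_Kingdom` is taken from this thread.

```r
# Toy stand-in for the diet table (values are illustrative only)
diet_toy <- data.frame(
  Common_Name  = c("American Wigeon", "American Wigeon", "Kelp Goose", "Redhead"),
  Prey_Kingdom = c(NA, NA, NA, "Plantae"),
  stringsAsFactors = FALSE
)

# Keep the species of records with a missing kingdom, then tabulate;
# as.data.frame() on a one-way table yields columns Var1 and Freq
missing_kingdom <- diet_toy$Common_Name[is.na(diet_toy$Prey_Kingdom)]
na_counts <- as.data.frame(table(missing_kingdom))
names(na_counts)[1] <- "Var1"
```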

@aaronolsen Most of these are Anseriformes, do you have a sense off the top of your head about records in which NA's were assigned to all prey taxonomic levels?

aaronolsen commented 7 years ago

I used the 'taxize' R package to get the NCBI taxonomic levels for each prey item. For a small number of searches it appears that NCBI doesn't provide a kingdom. For example, if you search for the genus 'Colpomenia' (https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=27964&lvl=3&keep=1&srchmode=1&unlock) it skips from "Eukaryota" (superkingdom) to "Phaeophyceae" (phylum) -- at least in the output that's interpreted by taxize. 'Colpomenia' is a prey item I had listed for Kelp Goose. btw, in github do I need to include your handle in every message directed to you so that you get a notification about it? @ahhurlbert
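To illustrate how a skipped rank turns into an NA, here is a sketch on a hand-built lineage table. It contains only the three names mentioned above, laid out with `name` and `rank` columns. The object names `lineage`, `wanted`, and `prey_ranks` are hypothetical, and no call to taxize or NCBI is made.

```r
# Hand-built lineage for 'Colpomenia' with only the ranks named above
lineage <- data.frame(
  name = c("Eukaryota", "Phaeophyceae", "Colpomenia"),
  rank = c("superkingdom", "phylum", "genus"),
  stringsAsFactors = FALSE
)

# Pull out the ranks stored in the prey columns; match() gives NA for
# any rank that is absent from the lineage, here "kingdom"
wanted <- c("kingdom", "phylum", "genus")
prey_ranks <- setNames(lineage$name[match(wanted, lineage$rank)], wanted)
```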

ahhurlbert commented 7 years ago

Ok, that makes sense and we may need to rethink adding a "Domain" or "Superkingdom" level above kingdom to accurately represent the taxonomy of some of these items. I will post another issue.

However, the 'Colpomenia' example is not the source of our errors above because the prey has been classified at lower levels (e.g., with the genus name, but also family, order, etc) so any summary of prey will only error when trying to summarize at the Kingdom level, which will not be a typical use case.

More problematic are records where the prey taxonomy was not specified at ANY level, i.e. everything from Kingdom through Scientific name is all NA as in 3 records of American wigeon from Wishart 1983. For some records, the Prey_Part is listed as 'vegetation' so we can infer plant, but records with NA all the way across? Ideas?
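Records that are NA all the way across can be flagged with something like the sketch below. The toy data and the column `Prey_Scientific_Name` are assumptions, and a real check would list every taxonomy column.

```r
# Toy records: kingdom only, nothing at all, and genus-level name only
prey <- data.frame(
  Prey_Kingdom         = c("Plantae", NA, NA),
  Prey_Phylum          = c(NA_character_, NA, NA),
  Prey_Scientific_Name = c(NA, NA, "Colpomenia"),
  Prey_Part            = c("vegetation", NA, NA),
  stringsAsFactors = FALSE
)

# Taxonomy columns to check (assumed names, abbreviated for the example)
tax_cols <- c("Prey_Kingdom", "Prey_Phylum", "Prey_Scientific_Name")

# A record is fully unclassified when none of those columns is filled
all_na <- unname(rowSums(!is.na(prey[, tax_cols])) == 0)
unclassified <- prey[all_na, ]
```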

(And depending on your own personal Github settings, you will receive notifications only when someone tags you, or anytime a repo you star is modified, etc)

aaronolsen commented 7 years ago

@ahhurlbert Ah, sorry misread that. So I tracked down every row of NA values for all the taxonomy columns and each corresponded to one of two prey items: algae or foraminifera. Since these didn't have clear entries for the given taxonomy levels on NCBI I had planned to enter NA values for all the taxonomy columns and then enter the common name 'algae' or 'foraminifera' in Prey_Common_Name. However, I must have forgotten to add in the common name values. I've gone through and added in those common names. Would you like for me to enter particular taxonomy values for those rows or delete them?

ahhurlbert commented 7 years ago

Great, thanks! Don't delete. We'll figure out what we're doing with adding a level above kingdom, and then also figure out how we want to assign algae and forams. Forams at least have a clear phylum I believe.

ahhurlbert commented 7 years ago

For now I've assigned Prey_Phylum = 'Foraminifera' for all foraminifera and Prey_Kingdom = 'Plantae' for all algae (using ITIS kingdoms of 'Animalia' and 'Plantae' instead of NCBI's 'Metazoa' and 'Viridiplantae'; those have all been replaced).
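The reassignment described above could be sketched like this on toy data. The data frame and the lookup vector `ncbi_to_itis` are illustrative assumptions, not the code that was actually run.

```r
# Toy prey records (values are illustrative only)
prey <- data.frame(
  Prey_Common_Name = c("algae", "foraminifera", NA, NA),
  Prey_Kingdom     = c(NA, NA, "Metazoa", "Viridiplantae"),
  Prey_Phylum      = c(NA, NA, "Arthropoda", NA),
  stringsAsFactors = FALSE
)

# Assign a kingdom to algae and a phylum to foraminifera
is_algae <- !is.na(prey$Prey_Common_Name) & prey$Prey_Common_Name == "algae"
is_foram <- !is.na(prey$Prey_Common_Name) & prey$Prey_Common_Name == "foraminifera"
prey$Prey_Kingdom[is_algae] <- "Plantae"
prey$Prey_Phylum[is_foram]  <- "Foraminifera"

# Replace NCBI kingdom names with their ITIS equivalents
ncbi_to_itis <- c(Metazoa = "Animalia", Viridiplantae = "Plantae")
hit <- prey$Prey_Kingdom %in% names(ncbi_to_itis)
prey$Prey_Kingdom[hit] <- unname(ncbi_to_itis[prey$Prey_Kingdom[hit]])
```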

Now, there are no records with NA across the board for prey taxonomy fields.

ahhurlbert commented 7 years ago

Following a talk with Alan Weakley, plant systematist/taxonomist, I've decided to go with ITIS over NCBI for all plant taxonomy. Ideally will use ITIS for animals too, but haven't yet confirmed how good an idea this is based on the areas of disagreement at higher levels.

ahhurlbert commented 7 years ago

Taxonomy of all prey names has been converted to ITIS (see prey_name_cleaning.R), and now Prey_Name_Status will contain the ITIS id for that taxon. If the name is not matched in ITIS, the original name will be preserved although ideally everything from the next higher level up and above should conform to ITIS classification. In such cases, Prey_Name_Status will be 'unverified'.
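As a rough illustration of the resulting Prey_Name_Status values (this is not the logic in prey_name_cleaning.R), the sketch below uses a hypothetical lookup vector with placeholder names and placeholder ids in place of a real ITIS query.

```r
# Placeholder lookup: names and ids are made up, not real ITIS records
itis_lookup <- c("Taxon A" = "id-001", "Taxon B" = "id-002")

# Prey names to check; "Taxon X" has no match in the lookup
prey_names <- c("Taxon A", "Taxon X", "Taxon B")

# Matched names get their id; unmatched names are marked 'unverified'
status <- unname(itis_lookup[prey_names])
status[is.na(status)] <- "unverified"
```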