Closed wcornwell closed 4 years ago
i will do GBIF
Will get something for body size - assigned to @itchyshin - but @jessicatytam please try to find the latest paper with the biggest dataset and put it here
I will do the phylogeny with rotl
metadata for gbif download:
processing steps:
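library(data.table) # fread() for fast reading of the large GBIF export
library(dplyr)
library(readr)

# read the raw GBIF download
z <- fread("data/0062362-200613084148143.csv")
# keep only the taxonomy, coordinate, year and issue columns
z2 <- select(z, order, family, genus, species, scientificName, decimalLatitude, decimalLongitude, year, issue)
rm(z)
gc() # free the memory held by the full table
# drop records that GBIF flagged with coordinate or country problems
z3 <- filter(z2, !grepl("COUNTRY_COORDINATE_MISMATCH", issue) &
  !grepl("ZERO_COORDINATE", issue) &
  !grepl("COORDINATE_INVALID", issue) &
  !grepl("COUNTRY_MISMATCH", issue) &
  !grepl("COORDINATE_OUT_OF_RANGE", issue))
#excludes 300,000 records
# keep only the columns needed downstream and write them out
z4 <- select(z3, scientificName, decimalLatitude, decimalLongitude, year)
write_csv(z4, "data/gbif_processed.csv")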
file is too big to add to github but you can download it here: https://www.dropbox.com/s/6lv44ap17v5amy1/gbif_processed.csv.zip?dl=0
@jessicatytam please check if this is too big to read in on your computer--unzipped it's about 970MB. if it is, then i can split into pieces.
be very interesting to see if h-index corresponds to number of records in gbif. that would be a result already
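A minimal sketch of how the record counts could be tallied for that comparison, assuming the column names written to gbif_processed.csv above. The toy rows and the `h_index` table are hypothetical stand-ins, not real data or results. Names may also need cleaning before the join, since `scientificName` in GBIF exports can carry an authorship string.

```r
library(dplyr)

# toy stand-in for data/gbif_processed.csv (made-up rows, same columns as z4)
gbif <- data.frame(
  scientificName = c("Phascolarctos cinereus", "Phascolarctos cinereus", "Vombatus ursinus"),
  decimalLatitude = c(-27.5, -33.9, -36.4),
  decimalLongitude = c(153.0, 151.2, 148.3),
  year = c(2001, 2015, 2010)
)

# hypothetical h-index table; the values are placeholders, not real results
h_index <- data.frame(
  scientificName = c("Phascolarctos cinereus", "Vombatus ursinus"),
  h = c(10, 5)
)

# number of GBIF records per name
record_counts <- count(gbif, scientificName, name = "n_records")

# one row per species with both h-index and record count
combined <- left_join(h_index, record_counts, by = "scientificName")
print(combined)
```

With the real data, `combined` could then go into a plot or a rank correlation of `h` against `n_records`.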
there are still extinct things in that dataset...not sure how to exclude them at this point....
Body mass databases
PanTHERIA (http://esapubs.org/archive/ecol/E090/184/#data)
AnAge database (https://genomics.senescence.info/species/browser.php?type=2&name=Mammalia)
Smith et al. (https://knb.ecoinformatics.org/view/doi:10.5063/AA/nceas.196.3)
Quaardvark (https://animaldiversity.ummz.umich.edu/quaardvark/search/)
@jessicatytam maybe this one too - can you check this out too?
@itchyshin looks like it only has some generic description to the whole group instead of the actual numbers
They do have specific info - for example, koala - and you can scrape info from an underlying database for this website
https://animaldiversity.org/accounts/Phascolarctos_cinereus/
Range mass 5.1 to 11.8 kg 11.23 to 25.99 lb
ohh ok i see, i'll find out if there is a way to download that in bulk, thanks!
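If the mass text does end up being scraped from the species pages, a rough sketch of pulling the kg range out of a line like the koala one above could look like this. It only uses base R, and the regular expression assumes the "X to Y kg" wording shown in that example; other accounts may be phrased differently.

```r
# the line as it appears on the koala account
mass_text <- "Range mass 5.1 to 11.8 kg 11.23 to 25.99 lb"

# capture the two numbers on either side of "to", immediately before "kg"
m <- regmatches(mass_text, regexec("([0-9.]+) to ([0-9.]+) kg", mass_text))[[1]]

# m[1] is the full match; m[2] and m[3] are the lower and upper bounds
mass_kg <- as.numeric(m[2:3])
names(mass_kg) <- c("min_kg", "max_kg")
print(mass_kg)
```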
library(rotl)
library(ape)

taxa <- tnrs_match_names("Mammalia") #find the Open Tree of Life record for Mammalia
res <- tol_subtree(ott_id = taxa$ott_id, label_format = "name") #extract subtree of mammals
str(res)

res$tip.label[1000:2000]
res$tip.label <- gsub("_", " ", res$tip.label) #get rid of the underscores
res$tip.label[1000:2000]

#histogram of how many words are in the species names from the tree
hist(lengths(gregexpr("\\W+", res$tip.label)) + 1, xlab = "number of words in species names", main = "mammalian tree tip labels")
table(lengths(gregexpr("\\W+", res$tip.label)) + 1) #table for the above

plot(res, show.tip.label = FALSE) #draw the tree without tip labels
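One caveat with the word count above: `gregexpr()` returns -1 when there is no match, so a one-word tip label gets counted as two words. A small sketch of an alternative count using `strsplit()`, run on made-up labels rather than the real tree (the toy vector is an assumption for illustration):

```r
# hypothetical tip labels, after underscores have been replaced by spaces
labels <- c(
  "Phascolarctos cinereus",
  "Vombatus ursinus hirsutus",
  "Mammalia sp. ABC123",
  "Homo"
)

# count whitespace-separated words in each label
n_words <- lengths(strsplit(labels, "\\s+"))
print(table(n_words))

# keep only labels that look like plain binomials (exactly two words)
binomials <- labels[n_words == 2]
print(binomials)
```

On the real tree the same two lines would apply to `res$tip.label`. Whether to keep only two-word labels is a judgment call for the later name matching.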
closing in favor of #3 and #4
[x] phylogeny
[ ] body size data
[x] GBIF
[ ] taxonomic synonym matching