hurlbertlab / core-transient

Data and code for NSF-funded research on core vs. transient species

scale analysis - richnessYearsTest #99

Closed: ssnell6 closed this issue 7 years ago

ssnell6 commented 8 years ago

I'm trying to run richnessYearsTest on its own without loading everything from the cleaning scripts (datasets 1-7), but richnessYearSubsetFun calls `dataset` and `site`, which are not formatted in the original dataset, so calling it without the formatting step is probably not useful.

Traceback for richnessYearSubsetFun:

```
read.table(text = as.character(dataset$site), sep = "_", quote = "\"", stringsAsFactors = F) at core-transient_functions.R#196
getNestedSiteDataset(dataset, siteGrain, dataDescription) at core-transient_functions.R#244
getNestedDataset(dataset, spatialGrain, temporalGrain, dataDescription) at core-transient_functions.R#254
richnessYearSubsetFun(dataset7, spatialGrain = sGrain, temporalGrain = tGrain, minNTime = minNTime, minSpRich = minSpRich, dataDescription)
```

I don't want to edit the source functions, but I'm not sure how to get around this without recreating all the intermediate formatted datasets for each datasetID. I tested each spatial grain within its own cleaning script, and every level of spatial grain ran without a problem, so I think the issue comes from isolating richnessYearsTest inside a for loop.

ahhurlbert commented 8 years ago

dataset7 is the formatted dataset. Instead of running the entire cleaning script to get it, you can just read it in from the formatted_datasets folder.

Then feed it into richnessYearSubsetFun(). There's no need to edit the source functions or re-run earlier parts of the cleaning script.

The fact that the code defining richnessYearSubsetFun uses the name "dataset" is irrelevant. It does not refer to the `dataset` object at the beginning of a cleaning script. That name only has meaning inside the function definition itself, and could in theory be called anything.
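A toy illustration of that point, using a hypothetical `countSites()` function that is not part of the repo: the argument name `dataset` is local to the function body, so a global object with the same name is never consulted.

```r
# Hypothetical toy function; its argument just happens to be called 'dataset'.
countSites = function(dataset) {
  # Inside the body, 'dataset' is whatever object the caller passed in.
  length(unique(dataset$site))
}

# A global object named 'dataset' (3 sites) is ignored by the function.
dataset = data.frame(site = c("a", "b", "c"))
dataset7 = data.frame(site = c("x_1", "x_1", "x_2"))

n = countSites(dataset7)
print(n)  # 2: counted from dataset7, not from the global 'dataset'
```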

So:

```r
# dataset7 [or whatever you want to call it]
dataset7 = read.csv('formatted_datasets/datasetX.csv')
richnessYearsTest(dataset7)
# or
test = richnessYearSubsetFun(dataset7)
# etc.
```

ssnell6 commented 8 years ago

That is the script I currently have, but I get an error when I run richnessYearsTest on it and no error when I run the cleaning script (I clear the history between these tests). The error traceback led me to think the function was at fault, but I guess something else is going on.

I'll look at my script again and post the error tomorrow if that will help.

Edit: the script is in the repo at scripts/R-scripts/data_cleaning_scripts/scale_analysis_CT.R

ssnell6 commented 8 years ago

This is the error. Based on Google, the error could be due to an outlier? I thought it had to do with the original dataset because of lines 4-5 in the traceback, but now I'm not sure.

```
Error in format.default(structure(as.character(x), names = names(x), dim = dim(x),  :
  invalid 'trim' argument

Traceback:
7: format.default(structure(as.character(x), names = names(x), dim = dim(x), dimnames = dimnames(x)), ...)
6: format(structure(as.character(x), names = names(x), dim = dim(x), dimnames = dimnames(x)), ...)
5: format.factor(dataset$date, "%Y")
4: format(dataset$date, "%Y") at core-transient_functions.R#233
3: getNestedTimeDataset(datasetSpace, temporalGrain, dataDescription) at core-transient_functions.R#245
2: getNestedDataset(dataset, spatialGrain, temporalGrain, dataDescription) at core-transient_functions.R#254
1: richnessYearSubsetFun(dataset7, spatialGrain = sGrain, temporalGrain = tGrain, minNTime = minNTime, minSpRich = minSpRich, dataDescription)
```
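A minimal sketch of what frames 5-7 suggest, assuming the `date` column came back from `read.csv()` as a factor (the default behaviour before R 4.0.0, when `stringsAsFactors = TRUE`). `format()` on a factor dispatches to `format.factor()`, which forwards `"%Y"` positionally to `format.default()`, where it lands in the `trim` argument and triggers "invalid 'trim' argument". The toy data frame `d` is hypothetical.

```r
# Hypothetical toy data: a date column stored as a factor, as read.csv()
# produced by default before R 4.0.0 (stringsAsFactors = TRUE).
d = data.frame(date = c("2001-06-15", "2002-07-01"), stringsAsFactors = TRUE)

# format() on a factor goes through format.factor(), which passes "%Y" on to
# format.default() positionally, where it is taken as the 'trim' argument.
msg = tryCatch(format(d$date, "%Y"), error = function(e) conditionMessage(e))
print(msg)

# Converting the column to Date first makes format() use format.Date(),
# which interprets "%Y" as a date format string.
d$date = as.Date(as.character(d$date), format = "%Y-%m-%d")
years = format(d$date, "%Y")
print(years)
```

Converting the column right after reading it in, rather than inside the source functions, keeps the shared functions untouched.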

ssnell6 commented 8 years ago

I get the same error when I run the cleaning script starting from line 689, which is supposed to work without running the prior code. Do we know for sure that the script ran properly before?

ssnell6 commented 8 years ago

In addition to the error above, which is still unresolved, I have a conceptual question. When looping through the different spatial grains, I tried using the raw site unit character names, as the cleaning script does. However, I'm not sure richnessYearsTest is actually being calculated at each grain, since the site column in dataset7 is already concatenated. I may not understand what richnessYearsTest is supposed to do, but it looks like the function can't distinguish specific grains by their original names if the site is already combined in the dataset.
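A sketch of why the grain information may still be recoverable, assuming each formatted site ID joins the grain levels with "_" from coarsest to finest. The separator matches the `read.table(..., sep = "_")` call in the first traceback, but the ordering is an assumption, and the data and the `siteAtGrain` column are hypothetical. Splitting the column gives one column per grain level, and pasting back the first `k` columns rebuilds the site at a coarser grain.

```r
# Hypothetical formatted data: site IDs joined with "_" from coarsest to finest.
dataset7 = data.frame(
  site = c("A_1_a", "A_1_b", "A_2_a", "B_1_a"),
  stringsAsFactors = FALSE
)

# Split the concatenated site column into one column per grain level,
# using the same read.table() call that appears in the traceback.
siteParts = read.table(text = as.character(dataset7$site), sep = "_",
                       quote = "\"", stringsAsFactors = FALSE)
print(siteParts)

# Rebuild the site ID at a coarser grain by keeping only the first k levels.
k = 2
dataset7$siteAtGrain = do.call(paste, c(siteParts[, 1:k, drop = FALSE], sep = "_"))
print(dataset7$siteAtGrain)
```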

ahhurlbert commented 8 years ago

I think you're right that richnessYearsTest may simply check that there is sufficient data (# species, # years) at the coarsest spatial scale.

If that's the case, we may have to effectively "throw out" the coarser grain info (temporarily within the loop) before running that test.

For a number of datasets, the finer scales may not have sufficient data, in which case even though they are hierarchical datasets and suitable in theory, they are not usable in practice.
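One way such a loop could look, as a hypothetical sketch that is not the repo's `richnessYearsTest()`. For each grain level `k` it truncates the site IDs to their first `k` levels (assuming "_"-joined, coarse-to-fine IDs), counts species per site-year, and keeps sites with at least `minNTime` years that each have at least `minSpRich` species. The threshold names are borrowed from the `richnessYearSubsetFun()` call above; the real test's criteria may differ.

```r
# Hypothetical check; NOT the repo's richnessYearsTest().
# Returns the site IDs at grain level k that pass both thresholds.
checkGrain = function(df, k, minNTime, minSpRich) {
  # Split "_"-joined site IDs into one column per grain level.
  siteParts = read.table(text = as.character(df$site), sep = "_",
                         quote = "\"", stringsAsFactors = FALSE)
  # Keep only the first k levels, discarding finer-grain information.
  df$grainSite = do.call(paste, c(siteParts[, 1:k, drop = FALSE], sep = "_"))

  # Number of distinct species per grain-level site and year.
  rich = aggregate(df["species"], by = df[c("grainSite", "year")],
                   FUN = function(s) length(unique(s)))

  # Count, per site, the years that meet the richness threshold.
  good = rich[rich$species >= minSpRich, , drop = FALSE]
  nYears = table(factor(good$grainSite, levels = unique(df$grainSite)))
  names(nYears)[nYears >= minNTime]
}

# Hypothetical toy data with three nested grain levels.
dataset7 = data.frame(
  site    = c("A_1_a", "A_1_a", "A_1_b", "A_2_a"),
  year    = c(2001, 2002, 2001, 2002),
  species = c("sp1", "sp1", "sp2", "sp3"),
  stringsAsFactors = FALSE
)

for (k in 1:3) {
  ok = checkGrain(dataset7, k, minNTime = 2, minSpRich = 2)
  cat("grain level", k, ":", length(ok), "usable site(s)\n")
}
```

In this toy data only the coarsest level passes, which is the situation described above: a hierarchical dataset whose finer grains lack enough data to be usable.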

If you're not sure how to proceed with this you can wait until I get back.

ssnell6 commented 8 years ago

Fixed problem 1 by changing the date format, but I still need to figure out how to isolate each grain and run each one separately through the loop.

ssnell6 commented 8 years ago

This code is messing up the dates that contain only year values, and I can't figure out why. It says the number of characters is > 4, but the values are just years (it says they are 5 characters). Also, the 4-character dates get wiped and changed to NA when the second line is run.

```r
if (nchar(as.character(dataset7$date)[1] > 4)) {
  dataset7$date = as.POSIXct(strptime(as.character(dataset7$date), format = "%Y-%m-%d"))
}
```
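A likely explanation, sketched with hypothetical toy vectors: the closing parenthesis of `nchar()` is misplaced, so it counts the characters of the comparison result rather than of the date. `"2001" > 4` is compared as strings and returns `FALSE`, and `nchar(FALSE)` is `nchar("FALSE")`, which is 5. Since `if()` treats any nonzero number as true, the conversion always runs, and `strptime()` with `"%Y-%m-%d"` returns `NA` for year-only strings.

```r
# Hypothetical year-only dates, as described above.
dates = c("2001", "2002")

# Misplaced parenthesis: nchar() is applied to the logical comparison result.
buggy = nchar(as.character(dates)[1] > 4)
print(buggy)  # 5, the number of characters in "FALSE"

# strptime() cannot match "%Y-%m-%d" against a bare year, so it returns NA.
print(is.na(strptime(dates, format = "%Y-%m-%d")))

# Corrected: compare the character count of the first date with 4.
if (nchar(as.character(dates)[1]) > 4) {
  dates = as.POSIXct(strptime(as.character(dates), format = "%Y-%m-%d"))
}
print(dates)  # unchanged: "2001" "2002"

# Full dates still get converted.
full = c("2001-06-15", "2002-07-01")
if (nchar(as.character(full)[1]) > 4) {
  full = as.POSIXct(strptime(as.character(full), format = "%Y-%m-%d"))
}
print(format(full, "%Y"))
```

Checking only the first element assumes every row in a dataset uses the same date format, as the original snippet does.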

ahhurlbert commented 7 years ago

@ssnell6 has this been resolved? If so, we can close the issue.