zoonproject / zoon

The zoon R package

How should preset crossvalidation be dealt with? #407

Closed: timcdlucas closed this issue 6 years ago

timcdlucas commented 6 years ago

In the CarolinaWren data there's a whole separate module for the validation dataset. This seems likely to spawn a lot of additional modules. Is it easy for an occurrence module to just load the fold information in one go?

This is prompted by looking at marinespeed, which has four predefined crossvalidation schemes. I certainly don't want four modules, one for each. I guess `marinespeed_validation(type = 'disc')` isn't a terrible solution.

Finally, I guess LocalOccurrenceDataFrame should be able to accept a data frame with a folds column? And perhaps Crossvalidate should even be able to accept a vector of folds as well. I'm not exactly sure.
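
To make the "accept a vector of folds" idea concrete, here is a minimal sketch of a Crossvalidate-like step that uses a supplied fold vector when one is given and only creates random folds otherwise. `assign_folds()` is a hypothetical helper written for illustration, not part of zoon's API; the toy data frame and its column names are made up.

```r
# Hypothetical helper, NOT zoon's API: a Crossvalidate-like step that
# either uses a user-supplied vector of folds or creates random ones.
assign_folds <- function(df, k = 5, folds = NULL) {
  if (!is.null(folds)) {
    # Predefined folds: one entry per row, used as-is
    if (length(folds) != nrow(df)) {
      stop("`folds` must have one entry per row of `df`")
    }
    df$fold <- as.integer(folds)
  } else {
    # No predefined folds: spread rows evenly over k folds, then shuffle
    df$fold <- sample(rep_len(seq_len(k), nrow(df)))
  }
  df
}

# Usage on a made-up occurrence table
occ <- data.frame(longitude = 1:6, latitude = 1:6,
                  value = c(1, 0, 1, 1, 0, 1))
assign_folds(occ, folds = c(1, 1, 2, 2, 3, 3))$fold  # predefined folds kept
assign_folds(occ, k = 3)$fold                        # random, two rows per fold
```

Keeping both paths in one function mirrors the point made later in the thread: creating folds is a processing step, while predefined folds are simply passed through.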

Sorry for the rambling and lack of action...

Doi90 commented 6 years ago

I don't see an issue with the Carolina Wren stuff having an additional validation module, as it is mainly there for use as a test dataset.

Both LocalOccurrenceData and LocalOccurrenceDataFrame did (and will again, once my pull request gets through Travis) have the ability to define the folds before loading the data. They retain any additional columns supplied (beyond lat, long, value, type, etc.), so if the .csv/data.frame has a column called fold, the module won't create one and fill it with 0 or 1 (depending on the externalValidation argument).
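
The fold-column behaviour described above can be sketched roughly as follows. This is not zoon's source code: `ensure_fold_column()` is a hypothetical name, and the mapping of `externalValidation = TRUE` to fold 0 (and `FALSE` to fold 1) is an assumption about which value means what.

```r
# Hypothetical sketch, NOT zoon's source: keep a user-supplied `fold`
# column; otherwise add one filled with a single default value.
# ASSUMPTION: externalValidation = TRUE marks rows as external
# validation data (fold 0); otherwise rows are training data (fold 1).
ensure_fold_column <- function(df, externalValidation = FALSE) {
  if (!"fold" %in% names(df)) {
    default_fold <- if (externalValidation) 0L else 1L
    df$fold <- rep(default_fold, nrow(df))
  }
  df
}

# Usage on made-up data
with_folds <- data.frame(value = c(1, 0, 1), fold = c(1, 2, 3))
ensure_fold_column(with_folds)$fold              # existing folds untouched
without_folds <- data.frame(value = c(1, 0, 1))
ensure_fold_column(without_folds, TRUE)$fold     # all 0 (validation)
ensure_fold_column(without_folds)$fold           # all 1 (training)
```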

If/when marinespeed gets adapted as an occurrence module, I think `marinespeed_validation(type = 'disc')` is a perfectly acceptable usage. If we want to keep the crossvalidation steps in the Process modules and out of Occurrence, I assume we could split it into two modules, marinespeedData and marinespeedCV (but I say this without knowing how marinespeed works under the hood). If we keep it all in one marinespeed module, it should have the option type = NULL so that the other crossvalidation modules can be used if desired.
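
A single-module design with a `type = NULL` escape hatch might look something like the sketch below. Everything here is hypothetical: `select_predefined_folds()` is an illustrative name, and the assumption that each predefined scheme arrives as a column named `fold_<type>` (e.g. `fold_disc`) is made up and does not describe how marinespeed actually stores its folds.

```r
# Hypothetical sketch of a marinespeed-style module with a `type`
# argument. ASSUMPTION: predefined schemes arrive as extra columns named
# "fold_<type>" (e.g. "fold_disc"); this naming is invented for
# illustration only.
select_predefined_folds <- function(df, type = NULL) {
  scheme_cols <- grep("^fold_", names(df), value = TRUE)
  if (is.null(type)) {
    # No predefined scheme: treat everything as training data (fold 1),
    # leaving a separate Process module free to build its own folds.
    df$fold <- rep(1L, nrow(df))
  } else {
    col <- paste0("fold_", type)
    if (!col %in% scheme_cols) {
      stop("unknown crossvalidation type: ", type)
    }
    df$fold <- df[[col]]
  }
  # Drop the per-scheme columns so only the chosen `fold` column remains
  df[setdiff(names(df), scheme_cols)]
}

# Usage on made-up data
occ <- data.frame(value = c(1, 0, 1, 1),
                  fold_disc = c(1, 1, 2, 2),
                  fold_random = c(1, 2, 1, 2))
select_predefined_folds(occ, type = "disc")$fold  # 1 1 2 2
names(select_predefined_folds(occ))               # "value" "fold"
```

With `type = NULL` the module behaves like a plain data loader, which is also what fitting to the full dataset at the end would need.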

AugustT commented 6 years ago

My understanding was that the occurrence modules, including the local data one, can specify the crossvalidation in a column called fold (or similar). I agree with @doi90; I think `marinespeed_validation(type = 'disc')` is fine.

timcdlucas commented 6 years ago

"it should have the option for type = NULL": Absolutely. Fitting to the full dataset at the end would require this as well.

So this is all fine as is, then. The only untidiness is that crossvalidation can be defined in either an occurrence module or a process module. But it still makes sense: creating the crossvalidation folds is a process, whereas loading predefined folds is not. So that's fine.

Cheers for the clarification, both.