Blocking strategy with equal number of occupied sites per block? #34

bpetitpi commented 1 year ago

Hello @rvalavi and thank you for this very useful package.

I am currently running SDMs on many species with very different types of distributions (rare, common, clustered, sparse...), and I am looking for a blocking strategy that keeps the same prevalence in every fold (as in Figures 4e and 4f of Roberts et al. 2017). To me, this seems the best way to compare models, and it also avoids "empty partitions" (i.e. partitions without any presences).

If I am correct, such a strategy isn't (yet?) implemented in blockCV, is it?

If it is not implemented, are you aware of alternative tools that could split my data into spatial folds while keeping the prevalence (the ratio of presences to background points) the same in each fold?

Many thanks in advance for any help with this.

Blaise

rvalavi commented 1 year ago

Hi @bpetitpi

Thank you for your interest in blockCV. The strategy you are looking for is not yet implemented. I will look into it for the next version, but it won't be very soon. An alternative is to choose a suitable spatial block size in the blockCV::cv_spatial function and use random fold selection, which tries to find the most balanced folds possible.
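To make the idea of random fold selection concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not blockCV's actual implementation: the function names, the square-grid blocking, and the balance score (the gap between the largest and smallest presence count per fold) are all hypothetical choices made for this example. Points are grouped into square blocks, blocks are assigned to folds at random many times, and the assignment with the most even spread of presences is kept.

```python
import random
from collections import Counter


def assign_points_to_blocks(points, block_size):
    """Map each (x, y) point to the (column, row) index of its square block."""
    return [(int(x // block_size), int(y // block_size)) for x, y in points]


def balanced_random_folds(points, labels, block_size, k=5, iterations=100, seed=0):
    """Randomly assign whole blocks to k folds and keep the most balanced try.

    labels: 1 for a presence, 0 for a background point (hypothetical encoding).
    Returns (fold index per point, presence-count gap of the best assignment).
    """
    rng = random.Random(seed)
    block_of = assign_points_to_blocks(points, block_size)
    blocks = sorted(set(block_of))
    # Number of presences falling inside each block.
    presences = Counter(b for b, lab in zip(block_of, labels) if lab == 1)

    best_score, best_assignment = None, None
    for _ in range(iterations):
        shuffled = blocks[:]
        rng.shuffle(shuffled)
        # Deal the shuffled blocks out to the folds in turn.
        fold_of_block = {b: i % k for i, b in enumerate(shuffled)}
        per_fold = [0] * k
        for b, n in presences.items():
            per_fold[fold_of_block[b]] += n
        # Smaller gap means presences are spread more evenly across folds.
        score = max(per_fold) - min(per_fold)
        if best_score is None or score < best_score:
            best_score, best_assignment = score, fold_of_block

    folds = [best_assignment[b] for b in block_of]
    return folds, best_score
```

Because whole blocks are moved between folds, this search can only approximate equal prevalence; a smaller block size or more iterations generally gives it more room to balance, at the cost of less spatial separation or more computation.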

Alternatively, I recommend looking at the ENMeval 2.0 package, which is also designed for SDM evaluation. There are also mlr3spatiotempcv, for evaluating models with the mlr3 package, and CAST, for evaluating models with the caret package.

I hope these are helpful.

Cheers, Roozbeh

bpetitpi commented 1 year ago

Thank you very much for your quick reply and the insightful tips. Until the next version is available, I will work with a customized workaround.

Cheers, Blaise
