paigemiller / culex-SDM

Paige Miller, Robert Richards, Gio Righi -- ECOL 8910 class project (2016)
Bias vs. Lower Sample Size issues #4

Open rlrichards opened 8 years ago

rlrichards commented 8 years ago

It seems that (at least with the cluster-style sampling we discussed last time) biasing the data set definitely lowers the AUC on testing data, but it doesn't lower it any more (and sometimes lowers it less) than simply randomly subsetting the data. So the AUC decrease seems to be largely due to decreased sample size. Currently I'm working with LOBAG-OC, and we know from John's and my work that sample size has a substantial effect on its performance, so I guess this was to be expected. Not sure exactly where to go from here with that. I'll try MaxEnt tomorrow (when fitting it with the swiss-veg data, just feed it all the presence and absence points of the training set as background), and we can hope that it works a little better.

P.S. I'm already working with only the 8 most abundant species.
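
For anyone following along, here is a minimal sketch of the comparison described above, in Python rather than the R used in spatialbias.R. It assumes the biased sample is every record within a fixed radius of one cluster center, and that the random subset is drawn to the same size as the biased one so that only the spatial pattern differs. The function names, the toy records, and the center and radius values are all hypothetical, not taken from the project code.

```python
import math
import random

def cluster_biased_subset(points, center, radius):
    """Keep only records within `radius` of a cluster center (a spatially biased sample)."""
    cx, cy = center
    return [p for p in points if math.hypot(p[0] - cx, p[1] - cy) <= radius]

def size_matched_random_subset(points, n, rng):
    """Draw n records uniformly at random without replacement (the sample-size control)."""
    return rng.sample(points, n)

rng = random.Random(42)

# Hypothetical training records: (x, y, presence) scattered over a unit square.
records = [(rng.random(), rng.random(), rng.randint(0, 1)) for _ in range(500)]

# Biased sample: everything near one cluster center (assumed sampling scheme).
biased = cluster_biased_subset(records, center=(0.25, 0.25), radius=0.2)

# Control: a random subset with exactly the same number of records.
control = size_matched_random_subset(records, len(biased), rng)

print(len(records), len(biased), len(control))
```

Fitting the same model to `biased` and to `control` and scoring both on the same held-out test data would separate the effect of the spatial bias from the effect of simply having fewer records.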

rlrichards commented 8 years ago

MaxEnt seems to have the same problem :-(

paigemiller commented 8 years ago

http://onlinelibrary.wiley.com/doi/10.1111/ddi.12096/abstract

What is this idea of background manipulation? Could we apply it to the Alps data? Or should we search for data over a larger geographic area?

rlrichards commented 8 years ago

It's tested in the Fourcade paper in the wiki. It generally performs very poorly in that paper. Basically you restrict your background points in a similar way to the bias in your presence points.
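
A rough sketch of what restricting the background can look like, in Python rather than R. It assumes one simple way of mimicking the bias: keep only candidate background points that fall within a fixed distance of some presence record. The paper may define the restriction differently, and `restrict_background`, `max_dist`, and the toy coordinates are hypothetical.

```python
import math
import random

def restrict_background(candidates, presences, max_dist):
    """Keep candidate background points lying within max_dist of any presence record."""
    kept = []
    for bx, by in candidates:
        if any(math.hypot(bx - px, by - py) <= max_dist for px, py in presences):
            kept.append((bx, by))
    return kept

rng = random.Random(1)

# Hypothetical presences clustered in one corner of a unit square (a biased sample).
presences = [(rng.uniform(0.0, 0.3), rng.uniform(0.0, 0.3)) for _ in range(30)]

# Candidate background points spread over the whole study area.
candidates = [(rng.random(), rng.random()) for _ in range(1000)]

# Background restricted to the neighborhood of the (biased) presence records.
background = restrict_background(candidates, presences, max_dist=0.1)

print(len(candidates), len(background))
```

The restricted background then shares the spatial footprint of the presence points, so the model compares presences against locations that were about as likely to be sampled.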

It's another option for modeling biased distributions, but it's ultimately still a victim of the same problem: I'm getting AUCs for biased data that are similar to the AUCs for randomly subsampled data. If our decrease in AUC is mostly due to sample size, then that's not something we can expect any of our methods to address.
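
To make the comparison concrete, here is a small Python sketch (not the project's R code) of a rank-based AUC and of the paired gap between biased and randomly subsampled runs. It assumes labels are coded 1 for presence and 0 for absence and that each biased replicate is paired with one random-subsample replicate; `auc` and `mean_auc_gap` are hypothetical helper names, and the scores are toy values.

```python
def auc(scores, labels):
    """Rank-based AUC: chance that a random presence outscores a random absence.

    Ties between a presence and an absence count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one presence and one absence")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def mean_auc_gap(biased_aucs, control_aucs):
    """Mean paired difference (biased minus random-subsample AUC) across replicates."""
    if len(biased_aucs) != len(control_aucs):
        raise ValueError("need one control AUC per biased AUC")
    diffs = [b - c for b, c in zip(biased_aucs, control_aucs)]
    return sum(diffs) / len(diffs)

# Toy predictions on five test points, for illustration only.
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.2]
toy_labels = [1, 0, 1, 0, 0]
print(auc(toy_scores, toy_labels))
```

A mean gap near zero across replicates would be consistent with the AUC loss coming from sample size; a clearly negative gap would point to the bias itself.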

If you guys are having similar issues to mine then we might have to look elsewhere... but I could easily be doing something dumb (my code is on my branch spatialbias.R).

paigemiller commented 8 years ago

Could the issues you're having be because the geographic region for the Alps species is too small?

rlrichards commented 8 years ago

Possibly? I can kind of argue that both ways in my head.