marlonecobos / kuenm

kuenm: An R package for detailed calibration and construction of Maxent Ecological Niche Models.

kuenm_cal_swd does not produce omission rate statistics #17

Open taprs opened 3 years ago

taprs commented 3 years ago

Hi, I am once again struggling to make friends with kuenm_cal_swd. This time I could not retrieve any omission rate statistics with it.

My SWD files look like this:

Occurrence dataset (all of them look like this):

species,lon,lat,bio02,bio03,bio10,bio11,bio15,bio18,bio19
Vaccinium_myrtillus,37.633,54.867,70,202,185,-77,31,235,98
Vaccinium_myrtillus,-115.39306,40.5925,114,309,141,-96,32,58,141
Vaccinium_myrtillus,37.563163,55.574375,68,199,182,-79,29,229,106
Vaccinium_myrtillus,16.561161,67.96668,47,184,137,-59,28,247,251

Bias files (two of them in the folder):

background,lon,lat,bio02,bio03,bio10,bio11,bio15,bio18,bio19
background,-101.67930586895,47.01236056295,101,239,208,-95,68,162,31
background,-101.68763920225,38.68736059625,122,313,245,-11,65,194,27
background,-117.76263913795,49.62902721915,93,285,158,-68,28,151,313
background,-93.60430590125,39.71236059215,92,245,253,-16,39,276,98

My command is the following: kuenm_cal_swd('Vaccinium_myrtillus_joint_swd.csv', 'Vaccinium_myrtillus_train_swd.csv', 'Vaccinium_myrtillus_test_swd.csv', './background', 'kuenm_cal_swd.sh', 'vm_mod', c(seq(0.1, 1, 0.1), seq(2, 6, 1), 8, 10), c('lqpth', 'lq'), 2000, maxent.path = '.', out.dir.eval = 'vm_mod/eval')

And this is the output:

If asked, RUN as administrator

A total of 68 candidate models will be created

Starting evaluation process
bash: /home/tapirus/miniconda3/lib/libtinfo.so.6: no version information available (required by bash)
Evaluation using partial ROC, omission rates, and AICc
  |==========================================================================================================| 100%
None of the significant candidate models met the AICc criterion,
delta AICc will be recalculated for significant models

Writing calibration results
Error in plot.window(...) : need finite 'xlim' values
In addition: Warning messages:
1: In min(ku_enm_best[, 5], na.rm = TRUE) :
  no non-missing arguments to min; returning Inf
2: In min(x) : no non-missing arguments to min; returning Inf
3: In max(x) : no non-missing arguments to max; returning -Inf
4: In min(x) : no non-missing arguments to min; returning Inf
5: In max(x) : no non-missing arguments to max; returning -Inf

calibration_results.csv looks like this:

"Model","Mean_AUC_ratio","pval_pROC","Omission_rate_at_5%","AICc","delta_AICc","W_AICc","N_parameters"
"M_0.1_F_lqpth_bias_kde_xy_swd",NA,NA,NA,672.317380352645,660.179222457908,4.14263618252283e-145,217
"M_0.1_F_lqpth_bias_lat_xy_swd",NA,NA,NA,639.356435643564,627.218277748828,5.95189137881709e-138,210
...
"M_8_F_lq_bias_kde_xy_swd",NA,NA,NA,12.1381578947368,0,0.0940531801013416,6
"M_8_F_lq_bias_lat_xy_swd",NA,NA,NA,12.1381578947368,0,0.0940531801013416,6
"M_10_F_lqpth_bias_kde_xy_swd",NA,NA,NA,24.5182724252492,12.3801145305123,0.000192781684793773,12
"M_10_F_lqpth_bias_lat_xy_swd",NA,NA,NA,45.7094594594595,33.5713015647226,4.82456291758231e-09,22
"M_10_F_lq_bias_kde_xy_swd",NA,NA,NA,12.1381578947368,0,0.0940531801013416,6
"M_10_F_lq_bias_lat_xy_swd",NA,NA,NA,12.1381578947368,0,0.0940531801013416,6

Am I doing something wrong? Why does the output lack some statistics? (By the way, is it really possible that two dissimilar background samples result in identical AICc values?) These results differ a lot from what I got for similar data with kuenm_cal.
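The `need finite 'xlim' values` error in the output above looks like a consequence rather than the cause: once every omission rate is NA, the warnings show `min()`/`max()` returning Inf and -Inf, and an axis range built from those values is not finite. The base-R sketch below reproduces that chain with made-up values; it does not use kuenm's own plotting code, and the guard at the end is only an illustration.

```r
# Hypothetical omission-rate column in which every value is NA,
# as in the calibration_results.csv shown above
omission <- c(NA_real_, NA_real_, NA_real_)

# With na.rm = TRUE nothing is left, so min() and max() warn
# ("no non-missing arguments") and return Inf and -Inf
lo <- suppressWarnings(min(omission, na.rm = TRUE))
hi <- suppressWarnings(max(omission, na.rm = TRUE))

# An axis range such as c(Inf, -Inf) is not finite, which plot.window()
# reports as "need finite 'xlim' values"; checking for usable values first
# avoids the crash
if (all(is.na(omission))) {
  message("No omission rates were computed; skipping the plot.")
} else {
  plot(omission, ylim = range(omission, na.rm = TRUE))
}
```

In other words, the plotting error disappears only when the omission rates themselves are computed, which is what the rest of the thread is about.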

marlonecobos commented 3 years ago

Can you tell me if the data (complete set, training, and testing sets, as well as background folder) for running the analysis were prepared using the function prepare_swd?

taprs commented 3 years ago

No, I prepared the SWD files myself (I used my own background points and test dataset, which, to my knowledge, is currently impossible with prepare_swd; in addition, extracting raster values without randomly selecting rows is orders of magnitude faster). However, I tried producing the '_joint.csv' file and dummy background data with prepare_swd to check whether they had an identical structure, and yes, the column set, names, and values extracted from the rasters (for occurrences) were exactly the same as what my script produced.

marlonecobos commented 3 years ago

OK, that is the problem.

You are right, for now you cannot use your own set of background points. The problem with preparing your occurrences yourself is that one required step is missing. When you prepare the data with prepare_swd, longitude and latitude are slightly changed so that they coincide exactly with the closest background point. That is required to measure omission rates and pROC. I should do something about it in the future, but right now preparing your data with that function is the only way to guarantee that the analyses will run correctly.
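Before calibrating, one way to see whether hand-made SWD files meet that requirement is to count occurrences whose coordinate pair has no exact match among the background points. A minimal base-R sketch, assuming `occ` and `bg` are data frames with the column layout shown in the first post (label, lon, lat, then variables); the helper `count_unmatched` is hypothetical, not part of kuenm, and this is not necessarily how kuenm matches points internally.

```r
# Count occurrence records whose (lon, lat) pair has no exact match in the
# background. Assumes column 2 = longitude and column 3 = latitude.
# Note: this is an exact comparison, so values that were rounded or re-read
# with different precision will not match.
count_unmatched <- function(occ, bg) {
  occ_key <- paste(occ[, 2], occ[, 3])  # coordinate pair as a text key
  bg_key  <- paste(bg[, 2], bg[, 3])
  sum(!(occ_key %in% bg_key))
}

# Small made-up example: one occurrence sits on a background point, one does not
occ <- data.frame(species = "sp", lon = c(10.5, 11.2), lat = c(50.1, 50.3))
bg  <- data.frame(background = "background", lon = c(10.5, 12.0), lat = c(50.1, 49.9))
count_unmatched(occ, bg)  # 1
```

A result of 0 means every occurrence coincides with a background point.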

The other thing you can do is modify the coordinates of your occurrences yourself so that they coincide with the closest background point. I haven't done that before, but it should not be that hard: measure the geographic distance from each record to all your background points, select the closest one, and replace the record's longitude and latitude with those of that background point.
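The procedure described above can be sketched in base R with great-circle (haversine) distances, under the same column-layout assumption as before. The helpers `haversine_km` and `snap_to_background` are hypothetical names for illustration; they also return how far each record was moved, which helps judge whether the displacement matters.

```r
# Great-circle (haversine) distance in km from one point to many points
haversine_km <- function(lon1, lat1, lon2, lat2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Replace each occurrence's lon/lat (columns 2:3) with those of its nearest
# background point; also return the distance each record was moved
snap_to_background <- function(occ, bg) {
  moved_km <- numeric(nrow(occ))
  for (i in seq_len(nrow(occ))) {
    d <- haversine_km(occ[i, 2], occ[i, 3], bg[, 2], bg[, 3])
    j <- which.min(d)
    occ[i, 2:3] <- bg[j, 2:3]
    moved_km[i] <- d[j]
  }
  list(occ = occ, moved_km = moved_km)
}

# Made-up example
occ <- data.frame(species = "sp", lon = c(10.52, 11.9), lat = c(50.11, 49.95))
bg  <- data.frame(background = "background", lon = c(10.5, 12.0), lat = c(50.1, 49.9))
res <- snap_to_background(occ, bg)
res$occ                # lon/lat now equal to background coordinates
summary(res$moved_km)  # displacement in km
```

Using kilometres rather than squared differences in degrees matters mostly at high latitudes, where a degree of longitude is much shorter than a degree of latitude. Note that the environmental values of the moved records are left unchanged here; if they should match the background point's values, they would also need to be copied or re-extracted.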

Sorry I cannot help more now.

Best,

taprs commented 3 years ago

Isn't simply adding the occurrence points to the background the best solution? I think I remember an always-ticked option in Java Maxent that sounded like that.

marlonecobos commented 3 years ago

Simple things are not always better. Adding samples to the background will be problematic because you will have duplicated information in terms of environmental conditions but not in terms of geographic coordinates. I think that introduces bias to the background. You can try, but I am not totally sure how Maxent will deal with that.
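For reference, the "add occurrences to the background" option discussed above can be sketched as a simple table operation. This is a minimal base-R illustration, assuming both SWD tables have the same columns in the same order; `add_occ_to_background` is a hypothetical helper, and dropping rows with repeated coordinates is an assumption made here to limit the duplication Marlon mentions, not a kuenm or Maxent rule.

```r
# Append occurrence records to a background table, relabelled as "background".
# Assumes both tables share the same columns in the same order
# (label, lon, lat, environmental variables).
add_occ_to_background <- function(occ, bg, drop_duplicates = TRUE) {
  occ_as_bg <- occ
  occ_as_bg[, 1] <- "background"  # first column holds the species/background label
  names(occ_as_bg) <- names(bg)   # align headers (e.g. "species" vs "background")
  out <- rbind(bg, occ_as_bg)
  if (drop_duplicates) {
    out <- out[!duplicated(out[, 2:3]), ]  # keep one row per coordinate pair
  }
  rownames(out) <- NULL
  out
}

# Made-up example: the first occurrence already coincides with a background point
occ <- data.frame(species = "Vaccinium_myrtillus", lon = c(10.5, 11.2),
                  lat = c(50.1, 50.3), bio02 = c(70, 68))
bg  <- data.frame(background = "background", lon = c(10.5, 12.0),
                  lat = c(50.1, 49.9), bio02 = c(70, 101))
new_bg <- add_occ_to_background(occ, bg)
nrow(new_bg)  # 3
```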

taprs commented 3 years ago

The following three lines of code did the job for me. Adding samples' locations to the background influenced the selection of optimal model parameters, so I am finally disenchanted with this option.

Assuming f1 is the SWD table of occurrences and f2 is the SWD table of background points (in both, columns 2 and 3 are longitude and latitude):

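nearpt <- function(coords) which.min(colSums((t(f2[, 2:3]) - coords)^2))  # row of the nearest background point (squared distance in degrees)
nearest.ids <- apply(f1[, 2:3], 1, nearpt)  # nearest background row for each occurrence
f1[, 2:3] <- f2[nearest.ids, 2:3]  # replace occurrence lon/lat with the background point's lon/lat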

I finally obtained final models with (hopefully) a reasonable parametrization. Thank you for this package!

marlonecobos commented 3 years ago

Glad you found a workaround. I am leaving this issue open so I remember to work on this part later on. Thanks for sharing your question and solution.

taprs commented 3 years ago

Hi Marlon, it's me once again.

I found a sort of justification for adding the sample points to the background. Considering Appendix 1b in Guillera-Arroita et al., 2014, I think Maxent is intended to work fine that way. (I did not look into the formulas; I hope the authors know what they are saying.)

Another concern is that at the continental scale (with the same 10,000 background points) and/or with small grid cells, the slight displacement of the presence locations may have a notable effect on the model, since sparser background points mean presences are moved farther, possibly across several cells.

I tried both 'distorting' and 'adding' the presence locations and finally got better models for Eurasia with the latter approach (at least, they are closer to the known species ranges).

marlonecobos commented 3 years ago

Happy to hear that your results got better.

You are right about the number of background points and I am glad you played with that and experienced the effects on models.

I have to make significant improvements to kuenm regarding the SWD format, probably this month. I will add a comment on all relevant issues to let you know when that happens.

jmburgos commented 3 years ago

Hi Marlon, I am having a similar issue, in my case not getting AICc values. I am also making my own SWD files, and I have included my presence locations in the background points. When I run kuenm_cal_swd, I get this:

A total of 35 candidate models will be created

Starting evaluation process
Evaluation using partial ROC, omission rates, and AICc
  |======================================================================| 100%
None of the significant candidate models met the omission rate criterion,
models with the lowest omission rate and lowest AICc will be presented

Writing calibration results
Error in plot.window(...) : need finite 'xlim' values
In addition: There were 43 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: In aicc(oc, mod, par_num) :
  AICc not valid: too many parameters, or likelihood = Inf... returning NA.
2: `mutate_()` was deprecated in dplyr 0.7.0.
Please use `mutate()` instead.
See vignette('programming') for more help
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated.
3: In aicc(oc, mod, par_num) :
  AICc not valid: too many parameters, or likelihood = Inf... returning NA.
4: In aicc(oc, mod, par_num) :
  AICc not valid: too many parameters, or likelihood = Inf... returning NA
.... etc.

The calibration_results.csv file looks like this:

"Model","Mean_AUC_ratio","pval_pROC","Omission_rate_at_10%","AICc","delta_AICc","W_AICc","N_parameters"
"M_0.1_F_l_Set_1",1.89739428190759,0,0.096551724137931,NA,NA,NA,16
"M_0.1_F_lq_Set_1",1.92706060526074,0,0.113793103448276,NA,NA,NA,32
"M_0.1_F_lqp_Set_1",1.92316380061621,0,0.1,NA,NA,NA,89
"M_0.1_F_lqpt_Set_1",3.81794578017398,0,0.244827586206897,NA,NA,NA,344
"M_0.1_F_lqpth_Set_1",7.06266209131053,0,0.262068965517241,NA,NA,NA,339
...etc.

Do you have any idea what could be happening? Many thanks!
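For context, the warning `AICc not valid: too many parameters, or likelihood = Inf` names two conditions. The sketch below uses the usual Maxent AICc formula, AICc = 2k - 2 logLik + 2k(k + 1) / (n - k - 1), where logLik sums the log of each occurrence's raw prediction divided by the sum of raw predictions over all points. It is a hedged illustration, not kuenm's `aicc` code: `aicc_sketch` and its inputs are hypothetical, and treating a missing prediction at an occurrence as the trigger is an assumption.

```r
# AICc for a Maxent-style model:
#   AICc = 2k - 2 * logLik + 2k(k + 1) / (n - k - 1)
# raw_occ: raw (unnormalised) predictions at the occurrence points
# raw_all: raw predictions over all background/prediction points
# k: number of model parameters; n is the number of occurrences
aicc_sketch <- function(raw_occ, raw_all, k) {
  n <- length(raw_occ)
  loglik <- sum(log(raw_occ / sum(raw_all)))
  # NA when there are too many parameters (denominator <= 0) or when the
  # log-likelihood is not finite (e.g. an occurrence with a zero or missing value)
  if (k >= n - 1 || !is.finite(loglik)) {
    return(NA_real_)
  }
  2 * k - 2 * loglik + (2 * k * (k + 1)) / (n - k - 1)
}

raw_all <- c(0.2, 0.1, 0.4, 0.3)
aicc_sketch(c(0.2, 0.4, 0.3), raw_all, k = 1)  # finite value
aicc_sketch(c(0.2, NA, 0.3), raw_all, k = 1)   # NA: occurrence without a prediction
aicc_sketch(c(0.2, 0.4, 0.3), raw_all, k = 2)  # NA: k >= n - 1
```

If predictions at occurrences are looked up by coordinates (again an assumption here), an occurrence that does not coincide with any prediction point would produce the second case.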

jmburgos commented 3 years ago

I tried running the model after using prepare_swd(), and everything works. So there must be something missing from my "hand-made" SWD files. I will dig into the prepare_swd code to understand what it is doing.

marlonecobos commented 3 years ago

Hi @jmburgos, Yes, something must be different in your data. There should not be a problem if you add your occurrences in the background. That may be an issue on my side. I am working on a major update to kuenm and I think that would solve this kind of issue. By the end of July, it should be ready. I hope you can wait.

jmburgos commented 3 years ago

Thanks Marlon, I am looking forward to the updated kuenm. I think it is important to allow users to provide their own background points, for example to account for sampling bias in the occurrence data.

SDMENM commented 2 years ago

Hello jmburgos, did you find any solution to the AICc problem? I made the SWD background files myself. When I do not add all occurrences to the background, it produces AICc values but not the omission rate, pROC, and mean AUC ratio, which I understand. Interestingly, though, when I add all my occurrences to the background sets, it produces the omission rate, pROC, and mean AUC ratio but not AICc. Does anyone have any idea?

jmburgos commented 2 years ago

No Arif, I have not found a solution. I get the same results as you: if I add occurrences to the background points, I get omission rates and the other statistics but no AICc. Hopefully Marlon will have some way around this.

SDMENM commented 2 years ago

Thanks Julian. I changed the coordinates of the occurrence data to exactly match the background sets, and it worked. However, in my case the independent evaluation data are on a different continent, and I can't match their coordinates with the background (which is on another continent). Also, if I add them to the background instead of matching coordinates, I don't get AICc. If I don't, the kuenm_feval_swd function won't work, because, like kuenm_mod_swd, it also requires occurrences whose coordinates are exactly the same as in the background. Every solution leaves me with at least one thing that is not possible in the end. By the way, I know this is a simple question, but it is driving me crazy: I set args = "togglelayertype=grid_code", and for some reason it still plots grid_code as continuous. Is there anything wrong with this args code?

fbocean commented 2 years ago

Hi, just a minor note, also on the kuenm_cal_swd function.

I found this thread when I was trying to find out why, in one case, the function returned "incorrect" npar and AICc values for me and, as a result, selected unsuitable models.

I now understand that this happened because I ran the function a second time with exactly the same settings after modifying the input data (one of my predictor variables had faulty values, which I corrected for the second run). My mistake was that I used kept = TRUE and did not delete the models from the first run before starting the second run, because I knew they would be overwritten. However, I didn't consider that the function runs the Maxent batch and the R-based evaluation in parallel, and that the latter is quicker. Since all of my model names remained the same, the evaluation process did not wait for the new models to be created; it simply evaluated a mix of the old models and the new ones that had already been created, which led to my confusing results. This may not happen to many users, and the function already gives a warning when the directories already exist, but I wonder if it would be possible to make a modification along the lines of either

Anyway, I thought I'd document it in case someone ever runs into the same issue.
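A simple way to avoid that situation is to delete the previous run's output folders (for example, the folders given as out.dir.models and out.dir.eval, 'vm_mod' and 'vm_mod/eval' in the first post) before calling the function again. A minimal base-R sketch; `clean_previous_run` is a hypothetical helper, and the example uses a throwaway temporary directory rather than real kuenm output.

```r
# Remove output folders from a previous run so that a new calibration cannot
# pick up stale model files that have the same names
clean_previous_run <- function(dirs) {
  for (d in dirs) {
    if (dir.exists(d)) {
      unlink(d, recursive = TRUE)
    }
  }
  invisible(!dir.exists(dirs))  # TRUE for each folder that is now gone
}

# Example with a throwaway directory standing in for an old model folder
old_run <- file.path(tempdir(), "vm_mod")
dir.create(old_run, showWarnings = FALSE)
file.create(file.path(old_run, "M_0.1_F_lq_Set_1.csv"))
clean_previous_run(old_run)
dir.exists(old_run)  # FALSE
```

Deleting is irreversible, so it is worth double-checking the paths before running this on real results.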

@SDMENM I am not sure this will solve it for you, and I realise it has been a while since you asked; however, I remember also having trouble with this before, and for me the args parameter works when I store it as a character value first, i.e. args <- "togglelayertype=grid_code", and then run the function: kuenm_cal_swd( [...] , args = args, [...])

jocelynvelazquezmaira commented 1 year ago

Hello, good day. Sorry, could you help me? I have this problem with the kuenm package; it reports the following:

Writing kuenmceval results...
Warning messages:
1: package ‘dplyr’ was built under R version 4.2.3
2: `mutate_()` was deprecated in dplyr 0.7.0.
ℹ Please use `mutate()` instead.
ℹ See vignette('programming') for more help
ℹ The deprecated feature was likely used in the kuenm package.
  Please report the issue to the authors.
This warning is displayed once every 8 hours.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.

How can I fix it?

jmburgos commented 1 year ago

Jocelyn, that is simply a warning telling you that kuenm is using a deprecated function, mutate_(). That is something Marlon should eventually fix, but it should not affect your use of the package.

jocelynvelazquezmaira commented 1 year ago

Hello, thank you very much for your answer.

What happens is that the final model is not created, and no other error appears.

My question is how I can fix that so I can continue with my work.

jmburgos commented 1 year ago

I couldn't tell you, but the problem is not the warning; it is something else.

--

Julian Mariano Burgos, PhD Hafrannsóknastofnun, rannsókna- og ráðgjafarstofnun hafs og vatna/ Marine and Freshwater Research Institute Botnsjávarsviðs / Demersal Division Fornubúðir 5, IS-220 Hafnarfjörður, Iceland http://www.hafogvatn.is/ Sími/Telephone : +354-5752037

Netfang/Email: @.***

