strengejacke / ggeffects

Estimated Marginal Means and Marginal Effects from Regression Models for ggplot2
https://strengejacke.github.io/ggeffects
Other
556 stars 35 forks source link

Predictions extracted with predict_response are not consistent when changing the number of extracts #601

Closed VickTaxiarchi closed 17 minutes ago

VickTaxiarchi commented 4 hours ago

Hi,

I run a simple model:

fit <- svyglm(outc ~ ns(year, df = 4) + cov1 + cov2, family=binomial(link="logit"), design=design_em)

with year ranging from 2010 to 2020.

My aim is to extract 1000 marginal predictions of the outcome within the range year. Ie marginal prediction of outcome when year = 2010, year = 2010.013, year=2010.26, etc. That is to create a plot of a smoother line of the trend of the outcome.

To do that:

  1. I created a small loop to predict each of these values and save them in a data.frame. So I had 1000 predictions of outcome over time.

    year_seq <- seq(from = 2010, to = 2020, length.out = 1000)

foreach(value = year_seq) %do% {

Generate the prediction for the current year

prediction <- predict_response(fit , terms = c(paste0("year [", value, "]")))

Assign the prediction to a dynamically named variable

assign(paste0("pred", value), prediction, envir = .GlobalEnv)

Add the current prediction and year to the data frame

predictions_df <- rbind(predictions_df, data.frame(Year = value, Prediction = prediction)) }

  1. To validate the results, I also extracted the predictions of the exact years 2010, 2011, 2012, ... , 2020, as is the original variable in the data.

pred2020<-predict_response(fit, terms=c("year [2010:2020]"))

But the results differed.

  1. Also also extracted up to 2019:

pred2019<-predict_response(fit, terms=c("year [2010:2019]"))

and these again differed to the data frame pred2020.

Therefore my questions are: (i) why the predictions differ (ii) how can I now which are the correct ones (iii) how can I extract the predictions in 1) correctly, since that is my main aim.

Many thanks Vicky

strengejacke commented 3 hours ago

Do you have a reproducible example? It's hard to find out where potential bug are without some more details and a toy example to investigate further.

VickTaxiarchi commented 2 hours ago

Fair enough. Please see attached test data and my code. I simplified it a bit and instead of length.out=1000 changed it to 20 for faster, but essentially should be the same.

Thank you in advance.

Best wishes Vicky

Vicky (Paraskevi) Taxiarchi Research Fellow Centre for Women’s Mental Health | Division of Psychology and Mental Health | Faculty of Biology, Medicine and Health | University of Manchester | Manchester | UK

From: Daniel @.> Sent: 08 November 2024 11:18 To: strengejacke/ggeffects @.> Cc: Vicky Taxiarchi @.>; Author @.> Subject: Re: [strengejacke/ggeffects] Predictions extracted with predict_response are not consistent when changing the number of extracts (Issue #601)

Do you have a reproducible example? It's hard to find out where potential bug are without some more details and a toy example to investigate further.

— Reply to this email directly, view it on GitHub [github.com]https://urldefense.com/v3/__https:/github.com/strengejacke/ggeffects/issues/601*issuecomment-2464446786__;Iw!!PDiH4ENfjr2_Jw!F3Ie-529vdNUevVmfCOJg-X-KPI1b2wrhcWxboMRnWr4yqZSnS7f7XRpwxOekZYneUFEY-B3t_RvtZ1aDeaGWPqF53PhoX3yhF_Ijtc$, or unsubscribe [github.com]https://urldefense.com/v3/__https:/github.com/notifications/unsubscribe-auth/AZXMCFYXH2EZG4EX7ZI4LDTZ7SMVBAVCNFSM6AAAAABRNE4EU6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDINRUGQ2DMNZYGY__;!!PDiH4ENfjr2_Jw!F3Ie-529vdNUevVmfCOJg-X-KPI1b2wrhcWxboMRnWr4yqZSnS7f7XRpwxOekZYneUFEY-B3t_RvtZ1aDeaGWPqF53PhoX3yaCJ8SSw$. You are receiving this because you authored the thread.Message ID: @.**@.>>

strengejacke commented 1 hour ago

I think if you sent attachments via mail as reply to GitHub comments, it's not attached on the GitHub site :-)

Can you attach your files directly via the website?

VickTaxiarchi commented 1 hour ago

No prob. Code in the word file. my_data.csv Code for GitHub.docx

strengejacke commented 56 minutes ago

I see what you would like to achieve, more values for a smoother plot. There must be something wrong in your for-loop, but you can avoid this and add the sequence directly:

library(survey)
library(splines)
library(ggeffects)

my_data <- datawizard::data_read("~/../Downloads/my_data.csv")

design <- svydesign(ids = ~1, data = my_data, weights = ~weights)

fit <- svyglm(
  outc ~ ns(year, df = 4) + cov1 + cov2 + cov3, 
  family = binomial(), 
  design = design
)
#> Warning in eval(family$initialize): non-integer #successes in a binomial glm!

pred_df <- predict_response(fit, terms = "year [2010:2020 by=0.1]")
plot(pred_df)

or you create a vector and add that one to the terms argument:

values <- seq(2010, 2020, by = 0.1)
pred_df <- predict_response(fit, terms = "year [values]")
plot(pred_df)

strengejacke commented 55 minutes ago

To get proper axis breaks/labels, simply add a ggplot-layer:

plot(pred_df) + ggplot2::scale_x_continuous(breaks = seq(2010, 2020, 2))

VickTaxiarchi commented 33 minutes ago

Oh gee thank you. I was trying to find if there was such an option, could see it anywhere. Glad it was so simple.

Many thanks

Best wishes Vicky

Vicky (Paraskevi) Taxiarchi Research Fellow Centre for Women’s Mental Health | Division of Psychology and Mental Health | Faculty of Biology, Medicine and Health | University of Manchester | Manchester | UK

From: Daniel @.> Sent: 08 November 2024 13:56 To: strengejacke/ggeffects @.> Cc: Vicky Taxiarchi @.>; Author @.> Subject: Re: [strengejacke/ggeffects] Predictions extracted with predict_response are not consistent when changing the number of extracts (Issue #601)

I see what you would like to achieve, more values for a smoother plot. There must be something wrong in your for-loop, but you can avoid this and add the sequence directly:

library(survey)

library(splines)

library(ggeffects)

my_data <- datawizard::data_read("~/../Downloads/my_data.csv")

design <- svydesign(ids = ~1, data = my_data, weights = ~weights)

fit <- svyglm(

outc ~ ns(year, df = 4) + cov1 + cov2 + cov3,

family = binomial(),

design = design

)

> Warning in eval(family$initialize): non-integer #successes in a binomial glm!

pred_df <- predict_response(fit, terms = "year [2010:2020 by=0.1]")

plot(pred_df)

[Image removed by sender.][camo.githubusercontent.com]https://urldefense.com/v3/__https:/camo.githubusercontent.com/12e0f6fd87634c598736162afd14f15207b19a46f7d3203dc36daa34925b83fd/68747470733a2f2f692e696d6775722e636f6d2f474f644a5647652e706e67__;!!PDiH4ENfjr2_Jw!Ale4UQFnS2_JsyIDPbZip0nu1rcImKZSZPsdvL3YU_7m_mS5YiC7vv6UrjDEqvTLYbZgPC_9wlksmJkqsakwxod3Rl07Y2dPmKObay4$

or you create a vector and add that one to the terms argument:

values <- seq(2010, 2020, by = 0.1)

pred_df <- predict_response(fit, terms = "year [values]")

plot(pred_df)

[Image removed by sender.][camo.githubusercontent.com]https://urldefense.com/v3/__https:/camo.githubusercontent.com/4efcc5da1a362ca4f05c421539c5353913a76fea44455ab73c38e06777e3487f/68747470733a2f2f692e696d6775722e636f6d2f376d534448336d2e706e67__;!!PDiH4ENfjr2_Jw!Ale4UQFnS2_JsyIDPbZip0nu1rcImKZSZPsdvL3YU_7m_mS5YiC7vv6UrjDEqvTLYbZgPC_9wlksmJkqsakwxod3Rl07Y2dPtshg40I$

— Reply to this email directly, view it on GitHub [github.com]https://urldefense.com/v3/__https:/github.com/strengejacke/ggeffects/issues/601*issuecomment-2464827986__;Iw!!PDiH4ENfjr2_Jw!Ale4UQFnS2_JsyIDPbZip0nu1rcImKZSZPsdvL3YU_7m_mS5YiC7vv6UrjDEqvTLYbZgPC_9wlksmJkqsakwxod3Rl07Y2dPO8mzLEc$, or unsubscribe [github.com]https://urldefense.com/v3/__https:/github.com/notifications/unsubscribe-auth/AZXMCF6D5AVT4LRLDZ575BLZ7S7HBAVCNFSM6AAAAABRNE4EU6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDINRUHAZDOOJYGY__;!!PDiH4ENfjr2_Jw!Ale4UQFnS2_JsyIDPbZip0nu1rcImKZSZPsdvL3YU_7m_mS5YiC7vv6UrjDEqvTLYbZgPC_9wlksmJkqsakwxod3Rl07Y2dP_kourH0$. You are receiving this because you authored the thread.Message ID: @.***>

strengejacke commented 17 minutes ago

Great, good to see this solves your request!