jslefche / piecewiseSEM

Piecewise Structural Equation Modeling in R

model implementation Fisher's C = NA #291

Open hugoeira opened 9 months ago

hugoeira commented 9 months ago

Hi,

Sorry to bother you with what might be a basic question.

I am implementing a SEM model, which seems straightforward to me. Nevertheless, the results are a bit weird.

library(lme4)
library(piecewiseSEM)

model_ha <- psem(
  lmer(bci ~ shannon + cort + ha + (1 | ID), data = metadata),
  lmer(ha ~ shannon + cort + (1 | ID), data = metadata),
  lmer(cort ~ shannon + (1 | ID), data = metadata),
  data = metadata
)

I have two repeated measures per individual, so I control for individual ID with a random intercept. I have 43 individuals, each sampled twice (I am guessing the errors I am encountering come from an overfitted model).

summary(model_ha)

First error:

boundary (singular) fit: see help('isSingular')
Warning message:
NAs detected in the dataset. Consider removing all rows with NAs to prevent fitting to different subsets of data

This I understand: there is almost no variance coming from the random effects. I also have some missing data in 3 or 4 samples, which I think could be a major problem. Is there a way to tell piecewiseSEM how to deal with missing data?

Second error:

Call:
  bci ~ shannon + cort + ha
  ha ~ shannon + cort
  cort ~ shannon

    AIC
 648.655

---
Tests of directed separation:

 No independence claims present. Tests of directed separation not possible.

--
Global goodness-of-fit:

Chi-Squared = 0 with P-value = 1 and on 0 degrees of freedom
Fisher's C = NA with P-value = NA and on 0 degrees of freedom

I am guessing there are no independence claims because I fitted all possible paths in the model, right?

What I really do not understand is why Fisher's C cannot be calculated. Is it model overfitting? Low sample size? Or is it something in the model specification itself?

Thanks in advance. Kind regards, Hugo

jslefche commented 9 months ago

Hello,

Very good questions.

Yes, the boundary (singular fit) message has to do with the small variance components of the random effects. You could respecify the random effects or remove them.

NAs can be problematic if you end up fitting regressions to vastly different subsets of the data. However, this is just a warning and, as you can see, the model still produces output. If the missing data are minimal (as in your case), you may feel comfortable ignoring them, or you could impute them using some other method (e.g., random forests).
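A minimal base-R sketch of the complete-case route suggested by the warning (this is not a piecewiseSEM feature): keep only rows with no NA in any variable used by the sub-models, so every regression is fit to the same subset. The toy `metadata` values and the `model_vars` vector are hypothetical; only the column names mirror the model above. With two samples per individual, it is also worth checking how many individuals still have both.

```r
# Hypothetical toy data standing in for the real `metadata` (values are made up)
metadata <- data.frame(
  ID      = rep(1:3, each = 2),
  bci     = c(0.1, 0.3, NA, -0.2, 0.5, 0.0),
  shannon = c(2.1, 2.4, 1.9, NA, 2.2, 2.0),
  cort    = c(10, 12, 9, 11, 14, 13),
  ha      = c(0.5, 0.4, 0.6, 0.7, 0.3, 0.2)
)

# Every variable that appears in any of the psem() sub-models
model_vars <- c("ID", "bci", "shannon", "cort", "ha")

# Keep only rows with no NA in those columns, so all sub-models share one subset
metadata_cc <- metadata[complete.cases(metadata[, model_vars]), model_vars]

nrow(metadata_cc)      # rows left after dropping incomplete cases
table(metadata_cc$ID)  # samples remaining per individual

# Then pass data = metadata_cc to each lmer() and to psem().
```

Here individual 2 loses both samples, which is the kind of imbalance worth reporting before deciding between dropping rows and imputing.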

You are correct that you cannot recover a Fisher's C or Chi-squared when the model contains all paths (is "fully saturated"). In this case, you could respecify the model structure to remove paths or rely on other, model-specific indicators of fit, such as R^2s.

HTH,

Jon

Jonathan Lefcheck, Ph.D.

Research Scientist

Integration and Application Network

University of Maryland Center for Environmental Science

www.jonlefcheck.net


From: Hugo Eira @.> Sent: Tuesday, September 26, 2023 9:20 AM To: jslefche/piecewiseSEM @.> Cc: Subscribed @.***> Subject: [jslefche/piecewiseSEM] model inplementation Fisher's C = NA (Issue #291)


hugoeira commented 9 months ago

Hi, thanks so much for the quick response.

"If the missing data are minimal (as in your case), you may feel comfortable ignoring them, or you could impute them using some other method (e.g., random forests)."

Can this be done within piecewiseSEM, or do I need to do it before passing the data in?

"You are correct that you cannot recover a Fisher's C or Chi-squared when the model contains all paths (is "fully saturated"). In this case, you could respecify the model structure to remove paths or rely on other, model-specific indicators of fit, such as R^2s."

I did remove some of the paths just to test it out, but the results are the same:

Structural Equation Model of model_ha 

Call:
  bci ~ shannon + cort + ha
  ha ~ shannon + cort

    AIC
 402.976

---
Tests of directed separation:

 No independence claims present. Tests of directed separation not possible.

--
Global goodness-of-fit:

Chi-Squared = 0 with P-value = 1 and on 0 degrees of freedom
Fisher's C = NA with P-value = NA and on 0 degrees of freedom

Individual R-squared:

     Response method Marginal Conditional
  std_bci_two   none     0.18        0.42
       std_ha   none     0.38        0.38

Any ideas why?

Cheers, Hugo

jslefche commented 9 months ago

Hi Hugo,

piecewiseSEM does not perform imputation, but other packages can help (e.g., rfImpute in the randomForest package).

Hmm, I'm not sure why you do not get a goodness-of-fit statistic after removing paths, but if you share your code I can take a look.
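One plausible explanation, sketched under stated assumptions: an independence claim exists only for a pair of variables with no direct path between them, and Fisher's C gets 2 degrees of freedom per claim. In the saturated model every pair is linked. In the reduced model the only unlinked pair is cort and shannon, and both are now exogenous (neither is a response). If piecewiseSEM does not test claims between two exogenous variables, which is an assumption inferred from the "No independence claims present" output rather than from the package source, the basis set is still empty. The `missing_pairs()` helper is hypothetical and only mimics this counting; it is not piecewiseSEM's own basis-set code. The model refit later in this thread with `shannon ~ ha` makes shannon endogenous, so the cort/shannon claim becomes testable.

```r
# Hypothetical helper (not piecewiseSEM's own code): list pairs of variables
# with no direct path between them, i.e. candidate independence claims.
missing_pairs <- function(edges, drop_exogenous_pairs = TRUE) {
  vars <- sort(unique(c(edges$from, edges$to)))
  # Exogenous variables never appear as a response
  exogenous <- setdiff(vars, edges$to)
  pairs <- t(combn(vars, 2))
  linked <- apply(pairs, 1, function(p) {
    any((edges$from == p[1] & edges$to == p[2]) |
        (edges$from == p[2] & edges$to == p[1]))
  })
  out <- pairs[!linked, , drop = FALSE]
  if (drop_exogenous_pairs) {
    # Assumption: pairs of two exogenous variables are not tested
    both_exo <- out[, 1] %in% exogenous & out[, 2] %in% exogenous
    out <- out[!both_exo, , drop = FALSE]
  }
  out
}

# Saturated model: bci ~ shannon + cort + ha; ha ~ shannon + cort; cort ~ shannon
saturated <- data.frame(
  from = c("shannon", "cort", "ha",  "shannon", "cort", "shannon"),
  to   = c("bci",     "bci",  "bci", "ha",      "ha",   "cort"),
  stringsAsFactors = FALSE
)

# Reduced model: the cort ~ shannon equation is dropped
reduced <- saturated[saturated$to != "cort", ]

# Later model: bci ~ shannon + cort + ha; ha ~ cort; shannon ~ ha
third <- data.frame(
  from = c("shannon", "cort", "ha",  "cort", "ha"),
  to   = c("bci",     "bci",  "bci", "ha",   "shannon"),
  stringsAsFactors = FALSE
)

nrow(missing_pairs(saturated))                              # 0 claims
nrow(missing_pairs(reduced))                                # 0 claims
nrow(missing_pairs(reduced, drop_exogenous_pairs = FALSE))  # 1 (cort, shannon)
missing_pairs(third)                                        # cort vs shannon
```

Under this reading, one claim gives Fisher's C 2 degrees of freedom, which matches the later output.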

Cheers,

Jon

From: Hugo Eira @.> Sent: Tuesday, September 26, 2023 9:37 AM To: jslefche/piecewiseSEM @.> Cc: Jon Lefcheck @.>; Comment @.> Subject: Re: [jslefche/piecewiseSEM] model inplementation Fisher's C = NA (Issue #291)


hugoeira commented 9 months ago

Hi,

Here it is.

Thanks, I really appreciate it.

sem_hugo.zip

hugoeira commented 9 months ago

Hi Jonathan,

Maybe this can be a way to work around it.

Since piecewiseSEM does not compute Fisher's C for saturated models, I removed one of the paths:

model_ha <- psem(
  lmer(bci ~ shannon + cort + ha + (1 | ID), data = metadata),
  lmer(ha ~ cort + (1 | ID), data = metadata),
  lmer(shannon ~ ha + (1 | ID), data = metadata),
  data = metadata
)

summary(model_ha)
Structural Equation Model of model_ha 

Call:
  bci ~ shannon + cort + std_ha
  ha ~ cort
  shannon ~ ha

    AIC
 583.891

---
Tests of directed separation:

                    Independ.Claim Test.Type      DF Crit.Value P.Value 
              shannon  ~ cort + ...      coef 74.6271     0.0225  0.8813 

--
Global goodness-of-fit:

Chi-Squared = 2.137 with P-value = 0.144 and on 1 degrees of freedom
Fisher's C = 0.253 with P-value = 0.881 and on 2 degrees of freedom

Now the significance of the missing path is evaluated in the test of directed separation, and Fisher's C can be calculated as well.
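As a check, the global statistics above can be reproduced by hand. Fisher's C combines the k d-separation p-values as C = -2 Σ ln(p_i) and compares it to a chi-squared distribution with 2k degrees of freedom; with the single claim here (P = 0.8813) that gives C ≈ 0.253 on 2 df with P ≈ 0.881, matching the summary. The hypothetical `fisher_c()` returns NA when there are no claims, mirroring the saturated-model output rather than any documented piecewiseSEM behavior. The reported Chi-Squared p-value can be checked the same way with `pchisq()`.

```r
# Fisher's C for k independence claims: C = -2 * sum(log(p)), df = 2k.
# Returning NA for k = 0 mirrors the "Fisher's C = NA" printed for saturated models.
fisher_c <- function(p) {
  k <- length(p)
  if (k == 0) {
    return(c(C = NA_real_, df = 0, p.value = NA_real_))
  }
  C <- -2 * sum(log(p))
  df <- 2 * k
  c(C = C, df = df, p.value = pchisq(C, df = df, lower.tail = FALSE))
}

# One claim (shannon ~ cort + ...) with P.Value = 0.8813 from the table above
round(fisher_c(0.8813), 3)  # C ~ 0.253, df = 2, p.value ~ 0.881

# Saturated model: no claims, so no test
fisher_c(numeric(0))

# Chi-Squared = 2.137 on 1 df, as reported above
round(pchisq(2.137, df = 1, lower.tail = FALSE), 3)  # ~ 0.144
```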

Is this a valid option? Then, for visualization, can I just add the missing path back to the diagram?

I also found this cool R package, semEff, which can bootstrap the calculation of effects for structural equation models:

library(semEff)

system.time(
  model_ha_boot <- bootEff(model_ha, R = 1000, seed = 123, ran.eff = "ID", parallel = "multicore")
)

model_ha_eff <- semEff(model_ha_boot)

summary(model_ha_eff)
SEM direct, summed indirect, total, and mediator effects:

bci (1/3):

                          Effect     Bias   Std. Err.   Lower CI Upper CI
                          ------   ------   ---------   -------- --------
 DIRECT       cort      | -0.375 |  0.049 |     0.091 |   -0.578   -0.251 | *
              ha        | -0.168 |  0.039 |     0.099 |   -0.368   -0.004 | *
              shannon   |  0.189 |  0.009 |     0.089 |    0.011    0.354 | *

 INDIRECT     cort      |  0.120 | -0.014 |     0.073 |   -0.002    0.294 |
              ha        | -0.025 |  0.000 |     0.035 |   -0.107    0.038 |

 TOTAL        cort      | -0.255 |  0.035 |     0.068 |   -0.478   -0.169 | *
              ha        | -0.193 |  0.038 |     0.105 |   -0.451   -0.021 | *
              shannon   |  0.189 |  0.009 |     0.089 |    0.011    0.354 | *

 MEDIATORS    ha        |  0.120 | -0.014 |     0.073 |   -0.002    0.294 |
              shannon   | -0.009 |  0.001 |     0.011 |   -0.042    0.006 |

ha (2/3):

                          Effect     Bias   Std. Err.   Lower CI Upper CI
                          ------   ------   ---------   -------- --------
 DIRECT       cort      | -0.619 | -0.065 |     0.092 |   -0.726   -0.358 | *

 INDIRECT     n/a       |      - |      - |         - |        -        - |

 TOTAL        cort      | -0.619 | -0.065 |     0.092 |   -0.726   -0.358 | *

 MEDIATORS    n/a       |      - |      - |         - |        -        - |

shannon.entropy (3/3):

                          Effect     Bias   Std. Err.   Lower CI Upper CI
                          ------   ------   ---------   -------- --------
 DIRECT       ha        | -0.131 | -0.003 |     0.158 |   -0.402    0.238 |

 INDIRECT     cort      |  0.081 |  0.009 |     0.109 |   -0.189    0.258 |

 TOTAL        cort      |  0.081 |  0.009 |     0.109 |   -0.189    0.258 |
              ha        | -0.131 | -0.003 |     0.158 |   -0.402    0.238 |

 MEDIATORS    ha        |  0.081 |  0.009 |     0.109 |   -0.189    0.258 |

This also allows me to estimate the effect of the missing path (cort on shannon), from the TOTAL row of the shannon (3/3) block: cort | 0.081 | 0.009 | 0.109 | -0.189 0.258 |
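Assuming semEff computes indirect effects as products of the standardized direct effects along each path (the usual path-tracing rule for linear SEMs), the indirect and total effects in the summary can be approximately reproduced from the rounded direct effects. In this model, the cort value in the shannon block is the product of the cort-to-ha and ha-to-shannon paths. The variable names below are hypothetical; the numbers are copied from the tables above.

```r
# Standardized direct effects, copied (rounded) from the semEff summary above
cort_ha     <- -0.619  # ha ~ cort
ha_shannon  <- -0.131  # shannon ~ ha
shannon_bci <-  0.189  # bci ~ shannon
ha_bci      <- -0.168  # bci ~ ha
cort_bci    <- -0.375  # bci ~ cort

# Indirect effect = product of coefficients along a path, summed over paths
ind_cort_shannon <- cort_ha * ha_shannon                # cort -> ha -> shannon
ind_ha_bci       <- ha_shannon * shannon_bci            # ha -> shannon -> bci
ind_cort_bci     <- cort_ha * ha_bci +                  # cort -> ha -> bci
                    cort_ha * ha_shannon * shannon_bci  # cort -> ha -> shannon -> bci

# Total effect = direct effect + summed indirect effects
total_cort_bci <- cort_bci + ind_cort_bci
total_ha_bci   <- ha_bci + ind_ha_bci

round(c(ind_cort_shannon = ind_cort_shannon, ind_ha_bci = ind_ha_bci,
        ind_cort_bci = ind_cort_bci, total_cort_bci = total_cort_bci,
        total_ha_bci = total_ha_bci), 3)
# About 0.081, -0.025, 0.119, -0.256, -0.193; the small gaps from the table
# (0.120, -0.255) come from rounding the inputs.
```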

I can also fit a different model, with a different path removed, to double-check the results (they are the same).

Is this a valid approach?

Cheers, Hugo

jslefche commented 9 months ago

Hi Hugo, removing the path frees up information to calculate Fisher's C (and Chi-squared), but these reflect the goodness-of-fit of the reduced model. It would not be appropriate to assign those values to a graph with that path included. If the path you removed is not theoretically relevant to your questions, you can consider keeping it removed (including in the path diagram) and reporting the output of the reduced model.

HTH,

Jon

From: Hugo Eira @.> Sent: Wednesday, September 27, 2023 7:16 AM To: jslefche/piecewiseSEM @.> Cc: Jon Lefcheck @.>; Comment @.> Subject: Re: [jslefche/piecewiseSEM] model inplementation Fisher's C = NA (Issue #291)


hugoeira commented 9 months ago

Hi Jonathan, thanks so much for guiding me through this; I have been stuck on this for a couple of months now :(

Indeed, the fit measures reflect the reduced model; I wasn't thinking about this.

The most relevant model is the saturated model, and I can't find a proper reason to exclude any of the paths. Can I present the results without a fit measure?

"or rely on other, model-specific indicators of fit, such as R^2s" You mentioned this in one of your previous answers. Do you mean computing confidence intervals for each of the R^2s and presenting those as support for the model?

Sorry about all the questions.

Kind regards, Hugo

jslefche commented 9 months ago

Hi Hugo, it's perfectly fine to present the saturated model. In that case, you can acknowledge that you have no degrees of freedom left over with which to calculate goodness-of-fit indices.

In that case, yes, I would rely on the strength and significance of individual pathways (examine the standard errors and P-values) and the variance explained (R^2) to build a qualitative argument for why the entirety of the path diagram adequately captures correlations indicated in the data.
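To read the R² table alongside this advice, here is a sketch of marginal and conditional R² for a Gaussian random-intercept model in the Nakagawa and Schielzeth style, assuming this is the decomposition piecewiseSEM's `rsquared()` applies to `lmer` fits. The function name `r2_lmm` and all variance components are hypothetical. When the ID variance is estimated at zero (the singular fit flagged earlier), the two values coincide, as they do for `std_ha` (0.38 and 0.38) in the output above.

```r
# Hypothetical helper: marginal and conditional R^2 for a Gaussian
# random-intercept model, from its three variance components.
r2_lmm <- function(var_fixed, var_random, var_resid) {
  total <- var_fixed + var_random + var_resid
  # Marginal: fixed effects only; conditional: fixed + random intercept
  c(marginal    = var_fixed / total,
    conditional = (var_fixed + var_random) / total)
}

# Singular fit: random-intercept variance estimated at zero (made-up values)
r2_singular <- r2_lmm(var_fixed = 0.38, var_random = 0, var_resid = 0.62)

# Non-zero ID variance (made-up values): conditional R^2 exceeds marginal
r2_with_id <- r2_lmm(var_fixed = 0.18, var_random = 0.24, var_resid = 0.58)

round(r2_singular, 2)  # marginal 0.38, conditional 0.38
round(r2_with_id, 2)   # marginal 0.18, conditional 0.42
```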

HTH,

Jon

From: Hugo Eira @.> Sent: Thursday, September 28, 2023 1:55 AM To: jslefche/piecewiseSEM @.> Cc: Jon Lefcheck @.>; Comment @.> Subject: Re: [jslefche/piecewiseSEM] model inplementation Fisher's C = NA (Issue #291)


hugoeira commented 9 months ago

Hi Jonathan,

Thanks so much for such detailed feedback; this was really helpful.

I have been struggling to figure out how/if I can support my results.

By the way, I found this preprint that tests and summarizes different scenarios of structural equation modeling, and how to assess model fit when fit measures cannot be computed or are uninformative.

This diagram sums it up: [image]

Link to the preprint: https://arxiv.org/abs/1803.06186

Thanks so much again.

Kind regards, Hugo