Closed phargarten2 closed 4 years ago
From: Paul M. Hargarten To: Ariane Guilbert, Université de Bordeaux Date: May 17, 2019 5:30 PM
Mr. Guilbert,
Thank you for using the miWQS package in your research and letting me know of some potential issues. First, I believe that the error message you are receiving
Error in if (max(c(tol + tt[1], tt[2] - tt[3])) <= 0) {
missing value where TRUE/FALSE needed}.
is from the solnp()
function in the Rsolnp package, which is used in estimating the weights and other WQS parameters in the training set. I suggest that you use traceback()
and contact the authors of the Rsolnp for more help: Alexios Ghalanos alexios@4dscape.com and Stefan Theussl. I can't seem to find the error in the Rsoln code upon searching the function. If the code is in an internal function of Rsolnp, use the getAnywhere()
function to access it.
Previous simulation studies[1, 2] have shown several results that might help your research:
A few more thoughts:
offset
argument of estimate.wqs()
function is on the normal scale, so please do not take a logarithm. Thank you again for using the package and best of luck to your research! I am working on many updates to the package, and I hope to release it to CRAN soon. Let me know if this helps or if you have more questions.
References:
From: Ariane Guilbert, Université de Bordeaux Date: Tue, May 21, 2019 at 4:02 AM
Dear Mr Hargarten,
Thank you very much for your very helpful answer and sorry for my late reply (I had a meeting with my manager yesterday afternoon and this morning to discuss the topic). Following your remarks:
solnp()
function in the Rsolnp package : I feel the same way knowing I have a warning message referring to this function when I use the gWQS package on my data (after categorization of the dependent variable). I will contact Alexios Ghalanos and Stefan Theussl. offset
argument of estimate.wqs()
function is on the normal scale, so please do not take a logarithm". Ariane.
Mr. Guilbert: Thank you for your prompt response and some notes.
To your second question, the small sample size, the weak association of the compounds of interest with the outcome, and a mixture of both protective and at-risk chemicals could explain the instability of the WQS index. You are on the right track by doing two different WQS analyses: (1) the protective chemicals (compounds showing negative beta) and the at-risk chemicals (compounds showing positive beta). There should be no need to combine your chemicals; from what you have told me, your correlation pattern may match those already in the literature.
Our function, estimate.wqs()
, is not designed for individual analyses; the WQS regression approach assesses a mixture of correlated components with the outcome. A reminder that the missing values in the endocrine disruptors are assumed to all be under some lower bound and are by default placed into the first quantile of the weighted sum. Any missing values in the outcome and covariates are removed before WQS analysis.
My last remark is that estimate.wqs()
can model rates using an offset in Poisson regression. All I meant was that in traditional Poisson regression with the offset, the logarithm is taken. In estimate.wqs()
, it is not. The likelihood that is maximized consists in part of:
lambdai <- offset * exp(eta)
, where eta is the linear predictor and lambdai is the mean count for the ith subject.
Dear Mr Hargarten,
When I wrote "I will also try your function for individual analysis" I was referring to the function "analyze_individually.R" you sent me by email (not "estimate.wqs()).
Regarding your last remark, my dependent variable is a test score so I am not affected by the offset term?
Best regards.
Mr. Guilbert:
Thank you for the clarification -- I follow you now.
No, by default, there is no offset. The default value, NULL, is converted to a vector of 1's. It all depends if your test score is a rate or not.
Best wishes.
@ArianeGui wrote this:
Dear Mr. Hargarten,
I contacted you some weeks ago about your miWQS package. I was encountering some issues when running the estimate.wqs
function with the Poisson argument (I was frequently getting the following error message, which aborted the calculations ...
Error in if (mac(tol + tt[1], tt[2] - tt[3])) <= 0) { missing value where TRUE/FALSE needed)}
The problem has been solved by refining the definition of my covariates (and so decreasing the number of missing values), grouping some of the exposures by summing their molar concentrations, using tertiles instead of quartiles and by running the function separately, for compounds showing positive beta and compounds showing negative beta in the uni-pollutant regressions (based on negative binomial regression).
However, I am still a bit troubled by one point: when I run the estimate.wqs function with the compounds showing a negative beta in the uni-pollutant regressions and so using the b1.pos = FALSE
argument, I still very frequently get a wqs beta index positive. Is it normal?
Thank you very much in advance for your reply.
Best regards.
Ariane Guilbert.
@ArianeGui :
Thank you for faithfully using the miWQS package.
When setting b1.pos = FALSE
, did you find a positive WQS beta index in the training set or in the validation set? The beta1 estimate in the training dataset, by looking at element, train.estimates
, of an estimate.wqs()
object should all be below 0.
Also, have you heard back about the Rsolnp issue or did you no longer need to since you avoided the error by taking several steps as described above?
Thank you, Paul Hargarten
When setting b1.pos = FALSE, did you find a positive WQS beta index in the training set or in the validation set?
I mean the final WQS beta index you get after running the estimate.wqs() function.
Also, have you heard back about the Rsolnp issue or did you no longer need to since you avoided the error by taking several steps as described above?
I sent an email to the author of the function but did not get a feedback. I should contact him again.
Thank you for your reply.
Ariane Guilbert.
Sorry for the delay. If your beta1 log WQS estimates for protective estimates are really high, I think that you are right to question the results. But...the high estimate may be due to small sample size or a big variability of log WQS estimates (is your "positive" effect within 1 standard error?). If there is truly no mixture effect, the log WQS estimate may be positive by chance, so small positive effects are not alarming.
A proxy simulation study using a binary response shows a protective effect that one would expect. The main results are below. The "est.odds" are the Odds Ratios estimated from WQS regression; the "true.odds" are the true odd ratios simulated. The covariates (Z.sim
) and the chemicals (X
) came from the example dataset in the WQS package, simdata87
. Using this data, the outcome had to be re-simulated to show a protective effect.
est.odds | truth.odds | |
---|---|---|
(Intercept) | 0.482 | 0.287 |
Age | 0.960 | 1.033 |
Female | 0.818 | 0.972 |
Hispanic | 1.571 | 1.716 |
Non.Hispanic_Others | 0.905 | 1.127 |
WQS | 0.699 | 0.750 |
Function calls and main results using print()
.
> fit.protective <- estimate.wqs(
+ y = y, X = X.true, Z = Z.sim,
+ proportion.train = 0.5, n.quantiles = 4,
+ place.bdls.in.Q1 = FALSE,
+ B = 100,
+ b1.pos = FALSE, #*********
+ signal.fn = "signal.converge.only",
+ family = "binomial",
+ verbose = FALSE
+ )
#> No missing values in matrix detected. Regular quantiles computed.
> fit.protective
Odd Ratios & 95% CI (N.valid = 500)
Odds Ratio SE.OR 95% CI P-value
(Intercept) 0.482 1.48 0.482 (0.223, 1.041) 0.063
Age 0.960 1.07 0.960 (0.842, 1.094) 0.538
Female 0.818 1.26 0.818 (0.518, 1.291) 0.388
Hispanic 1.570 1.26 1.571 (0.996, 2.476) 0.052
Non.Hispanic_Others 0.905 1.32 0.905 (0.527, 1.554) 0.718
WQS 0.699 1.18 0.699 (0.503, 0.971) 0.033
AIC: 511.0522
All (100) bootstraps have converged.
Weights Adjusted by signal.converge.only using N.train = 500 observations:
pentachlorophenol pcb_118 alpha.chlordane ddt methoxychlor pcb_170
0.3561 0.1527 0.1126 0.0715 0.0678 0.0657
pcb_105 dieldrin pcb_180 pcb_138 dde pcb_153
0.0478 0.0295 0.0295 0.0247 0.0203 0.0094
lindane gamma.chlordane
0.0083 0.0040
Important chemicals defined as mean weights > 1/14~0.071.
Thank you very much for your detailed reply. It is my turn to apologize for the delayed answer. I took me time to make some complementary analyses. I confirm that my positive effects are small when I set Beta1 to be protective.
I have another (and I hope last :-s ) question for you. I have some missing data in my exposure variables (the real values could be below or above the LOD). If I understand well the estimate.wqs function and the default parameter "place.bdls.in.Q1 = if (anyNA(X)) {TRUE } else {FALSE }", these missing values are automatically placed in the first quantile of the weighted sum. Instead, how can I exclude these missing values from the calculations (without excluding the whole rows that may contain data for other pollutants). I tried : "place.bdls.in.Q1 = if (anyNA(X)) {FALSE } else {FALSE }" but I get exactly the same result as with the default statement. Thank you very much in advance for your reply. Best regards.
@ArianeGui
The WQS procedure does not handle missing data. If you set place.bdls.in.Q1 = TRUE, the missing values are automatically placed in the first quantile of the weighted sum, as you have stated. Setting it to FALSE implies that the data is complete; you need to remove all individuals with missing pollutants before running estimate.wqs()
.
Additionally, I am submitting a new version to CRAN that has included new functions and fixed some bugs found in the package since its debut late last year.
Thank you for your answer regarding the missing data. I will look at the new version of the package very soon.
Have a nice summer.
From: Ariane Guilbert, Université de Bordeaux Date: May 10, 2019, 11:40 AM
Dear Mr. Hargarten,
I use your miWQS package and its
estimate.wqs()
function to study the relationships between several endocrine disruptors and child neurodevelopment (test score). Based on the distribution of my dependent variable, I apply the family option “Poisson” (as an approximation of the negative binomial distribution). However, I face some problems. If I run several times theestimate.wqs()
function:I get wqs index values very unstable, sometimes highly significant, sometimes not (I did not set a seed to check the stability of the results);
I frequently get the following error message which aborts the calculations
What might be the reasons for such issues: the approximation of the negative binomial distribution of the outcome by a Poisson distribution? Exposure variables weakly related to the outcome? Exposure variables having opposite effects on the outcome (while beta1 is expected to be either positive or negative)?
I thank you very much for your answer and for having developed this tool.