phargarten2 / miWQS

Handles the uncertainty due to below the detection limit in a correlated component mixture problem.
GNU General Public License v3.0

Handling multiple LODs per chemical in miWQS #4

Open phargarten2 opened 4 months ago

phargarten2 commented 4 months ago

From: Janice Sent: Tuesday, April 16, 2024 11:11

Hi Dr. Hargarten,

Hope you’re doing well. I’m Janice ... working on a project examining the relationship between exposure to chemicals and disease. I am looking for a method to handle observations that fall below the limit of detection. Below is an example of what my dataset looks like. “<” indicates that the concentration is below LOD and the number after < represents the LOD value.

[Image: example of the dataset, not reproduced here]

I believe the reason why there are multiple LODs in my chemicals is because they were analyzed in batches even though the same method was used (LC-MS). I have up to 5 LODs for some of my chemicals 😅. Also, the lab doesn’t provide batch numbers, so I don’t have enough information to separate them by batch and impute them separately.
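A minimal base-R sketch of how columns in this "<LOD" format could be split into a numeric value and a per-observation LOD. The data frame `raw`, its values, and the helper `split_lod()` are hypothetical and only mimic the format described above.

```r
# Hypothetical data in the "<LOD" format: "<5" means below an LOD of 5.
raw <- data.frame(
  id    = 1:6,
  chem1 = c("7.2", "5.4", "9.8", "<5", "<3", "6.0"),
  chem2 = c("<0.2", "1.3", "<0.5", "0.9", "2.2", "<0.2"),
  stringsAsFactors = FALSE
)

# Split one character column into measured values and LODs.
# Censored entries get value = NA and lod = the number after "<".
split_lod <- function(x) {
  censored <- startsWith(x, "<")
  num <- as.numeric(sub("^<", "", x))
  list(
    value = ifelse(censored, NA_real_, num),
    lod   = ifelse(censored, num, NA_real_)
  )
}

parsed <- lapply(raw[c("chem1", "chem2")], split_lod)

# Distinct LODs reported for each chemical.
lods_per_chem <- lapply(parsed, function(p) sort(unique(p$lod[!is.na(p$lod)])))
print(lods_per_chem)
```

Listing the distinct LODs per chemical makes it easy to see which chemicals have more than one detection limit before deciding how to handle them.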

I know of one R package that can handle multiple LODs – lodi package clmi function -- but it imputes one chemical only and it’s meant for use with single pollutant models not mixtures. Google led me to miWQS and I have to say that I like your package! I went through the tutorial document and found it easy to follow and the method straightforward to implement. It’s almost perfect for my project 😊…

I’m wondering whether miWQS can accommodate multiple LODs for each chemical?

phargarten2 commented 4 months ago

From: Paul Hargarten Sent: Wednesday, May 1, 2024 11:30 AM

Janice,

Thank you for considering the miWQS package in analyzing your data and your kind words. Your dataset is very interesting in that the detection limit is dependent on the batch. I built the package with the mindset that one chemical has one detection limit…thinking that this would reflect the real world. I didn’t consider your scenario when formulating the package, so it can’t handle multiple LODs in the chemicals. I was assuming that the detection limit is a fixed constant for each chemical.

If you are considering using the Lubin method for alpha-chlordane, you will get the following using one detection limit.

library(miWQS)
data(simdata87)
set.seed(202)
results_Lubin <- impute.Lubin(chemcol = simdata87$X.bdl[, 1], dlcol = simdata87$DL[1], K = 5, verbose = TRUE)

If somehow alpha-chlordane had two detection limits, the code would be this:

set.seed(202)
results_Lubin <- impute.Lubin(chemcol = simdata87$X.bdl[, 1], dlcol = simdata87$DL[1:2],
  K = 5, verbose = TRUE)
Error in if (is.na(dlcol)) stop("The detection limit has missing values so chemical is not imputed.",  : 
  the condition has length > 1
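
The error arises because R's `if` expects a single TRUE or FALSE, while `is.na()` applied to a two-element `dlcol` returns two values. A base-R sketch of the same failure, independent of miWQS (the vector `dl` is made up; on R versions before 4.2.0 this was a warning rather than an error):

```r
# Two hypothetical detection limits for one chemical.
dl <- c(0.5, 1.2)

# `if` needs a length-one condition; is.na(dl) has length two.
msg <- tryCatch(
  {
    if (is.na(dl)) "missing" else "ok"
  },
  error   = function(e) conditionMessage(e),  # R >= 4.2.0
  warning = function(w) conditionMessage(w)   # older R versions
)
print(msg)

# A length-one detection limit passes the same check without complaint.
single <- if (is.na(dl[1])) "missing" else "ok"
print(single)
```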

Could you investigate the source of different detection limits? Is it different labs? If you can, having more information would make it easier to impute this dataset.

These possible analyses come to mind:

  1. Assume the detection limit is the maximum and let the others equal the DL. For chem1, the max DL is 5, and let ID 5 be value 3. You can impute the missing BDLs.
  2. Assume the detection limit is the minimum. For chem1, the min DL is 3, ID4 will be missing. You can impute the missing BDLs. A word of caution is that the functions will treat missing BDLs as under 3.
  3. You may want to play with leaving the other values missing (NA) in the data in either case. The missing values will be ignored in the WQS model, but the proportion of missingness should decrease. This would require some note-keeping, as the function will impute values that are missing, which may or may not be what you want.
  4. You may want to place all the BDLs in the first quartile of the WQS regression, assuming the detection limit is the maximum for each chemical. You can compare these inferences.
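
One possible reading of options 1 and 2, sketched in base R on made-up values that mirror the chem1 example (ID 4 reported as <5, ID 5 as <3). The vectors and variable names are hypothetical. The resulting vector and single detection limit would then go to an imputation function such as `impute.Lubin` as `chemcol` and `dlcol`.

```r
# Hypothetical chem1: measured values (NA = below LOD) and the reported LODs.
value <- c(7.2, 5.4, 9.8, NA, NA, 6.0)   # IDs 1..6
lod   <- c(NA,  NA,  NA,  5,  3,  NA)    # ID 4 is "<5", ID 5 is "<3"
censored <- !is.na(lod)

# Option 1: treat the maximum LOD as the detection limit.
# Observations censored at a smaller LOD are set equal to that LOD.
dl_max <- max(lod, na.rm = TRUE)
opt1 <- value
smaller <- censored & lod < dl_max
opt1[smaller] <- lod[smaller]            # ID 5 becomes 3; ID 4 stays NA

# Option 2: treat the minimum LOD as the detection limit.
# All censored observations stay NA, so ID 4 will be imputed as if below 3.
dl_min <- min(lod, na.rm = TRUE)
opt2 <- value

print(list(dl = dl_max, chemcol = opt1))
print(list(dl = dl_min, chemcol = opt2))
```

Running both versions through the same imputation and WQS steps gives a simple sensitivity check on how much the choice of detection limit matters.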

In short, you would need to wrangle the data somehow to use the package. Alternatively, you would need to get more information to separate out the multiple detection limits into different batches and impute by batch.

Sincerely,

Paul

phargarten2 commented 4 months ago

From: Janice Sent: Monday, May 6, 2024 11:45

Hi Paul,

Thank you very much for your thoughtful suggestions! The chemicals were analyzed in the same lab and the lab told us that they’ve provided us all the info available, so that’s no bueno for us.

I really appreciate your thoughts on this...I will explore your suggestions. Thank you...

If (a big if!) a new imputation method that can handle multiple chemicals and LODs becomes available in the near future, would it be possible to somehow import the imputed datasets and use your package to do the WQS analysis?

Wishing you a great Monday!

Janice

phargarten2 commented 4 months ago

From: Paul M Hargarten, Ph.D. Date: Mon, May 6, 2024 at 6:12 PM

Janice,

If a new imputation method becomes available, yes, you can still use the package. Each of the three steps (imputation, estimation, and pooling) is distinct. Therefore, even if you impute using another package, you can still analyze using WQS regression and pool the statistics. However, you may need to adjust the imputation output to be an array, following the documentation and the vignette.
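
A sketch of reshaping imputations produced elsewhere into a single array. It assumes the array is laid out as subjects by chemicals by imputations. That layout, the simulated values, and the names below are assumptions to check against the miWQS documentation and vignette.

```r
set.seed(1)
n <- 4   # subjects
C <- 3   # chemicals
K <- 5   # imputed datasets

# Stand-in for output from another imputation package:
# a list of K completed n x C matrices (values simulated here).
imputed_list <- lapply(seq_len(K), function(k) {
  matrix(rexp(n * C), nrow = n, ncol = C,
         dimnames = list(NULL, paste0("chem", seq_len(C))))
})

# Stack the K matrices along a third dimension: n x C x K.
X.imputed <- array(
  unlist(imputed_list),
  dim = c(n, C, K),
  dimnames = list(NULL, colnames(imputed_list[[1]]), paste0("Imputed.", seq_len(K)))
)

print(dim(X.imputed))
```

Each slice `X.imputed[, , k]` is then one completed dataset, which is the unit the estimation step works on before the results are pooled.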

Thanks, Paul

phargarten2 commented 4 months ago

PS - If the two detection limits come from different apparatuses, like NMR and GC-MS, you can impute the chemical separately for each approach. Although the same approach was used with Janice's data, this suggestion may be helpful for someone else.
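
A sketch of imputing separately by apparatus when the source of each measurement is known. The data, the instrument labels, and the detection limits are hypothetical. `impute_fun()` is only a placeholder that substitutes DL / sqrt(2), not the Lubin method; in practice it would be swapped for a real imputation call with a single detection limit.

```r
# Hypothetical data: one chemical measured on two instruments with different DLs.
dat <- data.frame(
  id         = 1:8,
  instrument = rep(c("A", "B"), each = 4),
  conc       = c(2.4, NA, 3.1, NA, 0.9, NA, 1.5, 0.7)  # NA = below detection limit
)
dl_by_instrument <- c(A = 1.0, B = 0.5)

# Placeholder imputer (simple substitution), standing in for a real method.
impute_fun <- function(x, dl) {
  x[is.na(x)] <- dl / sqrt(2)
  x
}

# Impute within each instrument using that instrument's own detection limit.
dat$conc_imputed <- NA_real_
for (inst in names(dl_by_instrument)) {
  rows <- dat$instrument == inst
  dat$conc_imputed[rows] <- impute_fun(dat$conc[rows], dl_by_instrument[[inst]])
}

print(dat)
```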