Closed camilafarias112 closed 3 years ago
Hello Camila,
To look into this issue, we would appreciate a reproducible example so that we can solve it quickly. If possible, please anonymize your data, alter all the values, and send it along with the pipeline you posted earlier.
If that is not possible, I will need some collaboration from your side. I have extracted the code that is run inside the standardize() function.
It looks like the error you are getting comes from the last line of the chunk below: the cbind() function expects two matrices with the same number of rows, and that appears not to be the case. Please report back the results of dim(dd), dim(t(assayData(object)[["exp"]][select.no, ])) and length(select.no).
Hopefully this information will shed some light on what is causing the problem.
# Note: `warnings` and `na.rm` come from the enclosing standardize() call;
# define them first if you run this chunk on its own.
object <- exp_std_down
select <- exposureNames(object)
select.no <- exposureNames(object)[!exposureNames(object) %in% select]
# Move categorical exposures from `select` to `select.no`
if(sum(fData(object)[select, ".type"] == "factor") != 0) {
    if(warnings) {
        warning("Given categorical exposures.")
    }
    select.no <- c(select.no,
        select[fData(object)[select, ".type"] == "factor"]
    )
    select <- select[
        fData(object)[select, ".type"] != "factor"
    ]
}
# Standardize the numeric exposures (samples in rows)
dd <- expos(object)[ , select, drop=FALSE]
center <- apply(dd, 2, mean, na.rm = na.rm)
vari <- apply(dd, 2, sd, na.rm = na.rm)
dd <- apply(dd, 2, function(x) as.numeric(as.character(x)))
dd <- scale(dd, center = center, scale = vari)
# Bind the untouched exposures back on; this is the line that fails
dd <- cbind(dd,
    t(assayData(object)[["exp"]][select.no, ]))
Xavier Escribà Montagut
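For readers following along, here is a minimal sketch of what the scale() step in the chunk above does on its own. It uses a made-up matrix (the exposure names expo_a and expo_b are hypothetical) and only base R, so it is an illustration of the arithmetic and not the package's code path.

```r
# Toy samples-by-exposures matrix; the column names are made up.
set.seed(1)
dd <- cbind(expo_a = rnorm(10, mean = 50, sd = 5),
            expo_b = rnorm(10, mean = 2, sd = 0.3))

# Column-wise mean and standard deviation, as in the chunk above.
center <- apply(dd, 2, mean, na.rm = TRUE)
vari   <- apply(dd, 2, sd, na.rm = TRUE)

# Subtract the mean and divide by the standard deviation, column by column.
dd_std <- scale(dd, center = center, scale = vari)

# Each standardized column should now have mean ~0 and standard deviation 1.
round(colMeans(dd_std), 10)
apply(dd_std, 2, sd)
```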
Hello Xavier,
Thank you for the quick reply. Before I send you a reproducible example, I believe I found what differs between my previous successful analysis and this one. In the exp_down object, 1 out of 761 exposures was classified as a categorical variable (see the picture attached previously). I can also see this in the "exposures description" when inspecting the ExposomeSet object.
Would you know a way to extract which one of my 761 exposures was classified as categorical? I'm having a hard time finding this abnormality in my exposures input. I know I should not have any categorical exposures; they should all be continuous.
I see in the error message that categorical variables in the exposure object won't be standardized, so maybe this one exposure does not match the description matrix.
Again: thank you!
Camila
Hi Xavier,
I found the issue. One of my exposures, "KRT28", was treated as a factor instead of numeric when the ExposomeSet object was created. Here you will see its values for each of my samples (rows).
I found this by extracting the types with: cbind(exp_down@featureData@data[[".type"]],rownames(exp_down@assayData[["exp"]]))
I was expecting this exposure to be continuous, even though its distribution was poor. Interestingly, when I checked normality with nm_down <- normalityTest(exp_down), only the 760 other exposures were tested. That's why I couldn't figure out which exposure was the problem.
The standardize() function could not succeed because KRT28 was in my phenotype input but was not tested for normality. This exposure had no TRUE/FALSE result, so the function didn't work: the normality results covered a total of 760 exposures, not 761.
When I removed KRT28 completely from the pipeline, everything worked.
I hope I explained it clearly enough; please let me know if that helps. Thank you so much for sending the description of the function!
All the best,
Camila
Hello Camila,
Some remarks regarding your inputs and questions:
To see which exposure was classified as categorical, you can use Biobase::fData(exp_std_down). This will yield a table such as:
         Family          Name                                           .fct  .trn  .std  .imp  .type
AbsPM25  Air Pollutants  Measurement of the blackness of PM2.5 filters                          numeric
As       Metals          Asenic                                                                 numeric
BDE100   PBDEs           Polybrominated diphenyl ether -100                                     numeric
In this table you will be able to see the type of each of your exposures.
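As a hedged sketch of how that table could be filtered to list only the categorical exposures: the data frame below is a hypothetical stand-in for the fData() output (the row toy_categorical and its family are made up), so the snippet runs without the package. With a real ExposomeSet, the first step would instead be fd <- Biobase::fData(exp_down).

```r
# Hypothetical stand-in for the table returned by Biobase::fData(exp_down).
# Only the Family and .type columns are mimicked here.
fd <- data.frame(
  Family = c("Air Pollutants", "Metals", "PBDEs", "Made-up family"),
  .type  = c("numeric", "numeric", "numeric", "factor"),
  row.names = c("AbsPM25", "As", "BDE100", "toy_categorical"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Names of the exposures whose type is "factor".
categorical <- rownames(fd)[fd[[".type"]] == "factor"]
categorical

# Quick count of exposures per type.
table(fd[[".type"]])
```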
Whether an exposure is treated as categorical is controlled by the argument exposures.asFactor of the function rexposome::loadExposome (or rexposome::readExposome). The definition of this argument (extracted from the documentation) is: (default 5) The exposures with more than this number of unique items will be considered as "continuous" while the exposures with less or equal number of items will be considered as "factor". Maybe tuning this argument could also solve your problem without the need to remove this exposure.
If you pass categorical exposures to rexposome::standardize, it should work perfectly but only modify the numerical exposures (as one may expect!). An example using the test data bundled with the package (4 categorical exposures and 84 continuous exposures) is the following:
library(rexposome)
path <- file.path(path.package("rexposome"), "extdata")
description <- file.path(path, "description.csv")
phenotype <- file.path(path, "phenotypes.csv")
exposures <- file.path(path, "exposures.csv")
exp <- readExposome(exposures = exposures, description = description, phenotype = phenotype,
    exposures.samCol = "idnum", description.expCol = "Exposure", description.famCol = "Family",
    phenotype.samCol = "idnum")
standardize(exp, method = "normal")
For that reason, I would really appreciate it if you could take the time to fully anonymize your data and send it to me so I can run a test. There may be a bug I'm missing, as you should not have any problem running the pipeline with a categorical variable.
Nevertheless, I will close the issue for now as I see you are able to continue with your analysis.
Thanks for your report,
Xavier.
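The exposures.asFactor rule quoted in the reply above can be illustrated with a small base-R sketch. This is only an illustration of the documented wording, not the package's implementation; the function name classify_exposure, its argument as_factor, and the toy vectors are all hypothetical.

```r
# Illustration of the documented rule only; this is NOT the package's code.
# More than `as_factor` unique values -> "numeric", otherwise -> "factor".
classify_exposure <- function(x, as_factor = 5) {
  n_unique <- length(unique(x))
  if (n_unique > as_factor) "numeric" else "factor"
}

# A sparse continuous measurement with only three distinct values.
sparse <- c(0, 0, 0, 0, 1.2, 0, 0, 3.4, 0, 0)
# A measurement with seven distinct values.
dense <- c(0.1, 0.5, 1.2, 2.3, 3.4, 4.8, 5.1)

classify_exposure(sparse)                 # 3 unique values, threshold 5
classify_exposure(dense)                  # 7 unique values, threshold 5
classify_exposure(sparse, as_factor = 2)  # same data, lower threshold
```

Under this rule, a continuous exposure with very few distinct values would be labeled "factor" at the default threshold, and lowering the threshold would make it "numeric" again.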
@camilafarias112 There is no longer any need for you to send data for testing purposes. I found out that the actual issue is that there is only ONE categorical exposure; it works perfectly when more than one is present. I already solved that in the latest commit, 1f5d1ac. I will upload this fix to Bioconductor tomorrow.
Thanks for pointing out the issue and helping to solve it!
Xavier.
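One plausible mechanism for a failure that appears only with a single categorical exposure is base R's default of dropping a one-row matrix subset to a plain vector. The thread does not say what commit 1f5d1ac changed, so the sketch below is an assumption about the cause, shown on a made-up matrix with hypothetical names.

```r
# Toy exposures-by-samples matrix, laid out like assayData(object)[["exp"]].
# All row and column names are made up.
expo <- matrix(1:12, nrow = 3,
               dimnames = list(c("num_1", "cat_1", "cat_2"),
                               paste0("sample_", 1:4)))

# Stand-in for the standardized part: samples in rows, exposures in columns.
dd <- t(expo["num_1", , drop = FALSE])

# Two categorical exposures: the subset stays a matrix, so t() gives 4 x 2.
two <- t(expo[c("cat_1", "cat_2"), ])
dim(cbind(dd, two))

# One categorical exposure: the subset collapses to a vector and t() gives
# a 1 x 4 matrix, so cbind() stops with a row-mismatch error.
one <- t(expo["cat_1", ])
dim(one)
failed <- inherits(try(cbind(dd, one), silent = TRUE), "try-error")
failed

# Keeping the matrix shape with drop = FALSE avoids the collapse.
one_ok <- t(expo["cat_1", , drop = FALSE])
dim(cbind(dd, one_ok))
```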
That is absolutely great! Happy to help, I'm a big fan and user of your package.
Thank you for all the help. All the best, Camila.
Hello developers,
I have been using rexposome a lot in my projects, so first of all I would like to say how useful your workflow is. However, I recently got stuck at the standardize() step in one of my analyses, and I haven't had this issue before. Would you have any idea what might be happening?
I have the R object exp_down
With inputs of dimensions:
The exp_down object comes from the post-imputation step, which I've already investigated.
When I run:
exp_std_down <- standardize(exp_down, method = "normal")
I get the following error message:
What are the matrices that the error is referring to?
I would very much appreciate the help. Thank you!
Camila Farias Amorim