rgcca-factory / RGCCA

https://rgcca-factory.github.io/RGCCA/

Bug with function rgcca() #9

Closed Ombel88 closed 2 years ago

Ombel88 commented 3 years ago

Hi,

I work on metabolomics data (328 metabolites, which are included in 21 pathways) and the antibody response after the yellow fever vaccine. I want to explore the association between the metabolomics data and the antibody response, so I would like to see the relationships between the metabolic pathways and the antibody response using the RGCCA method.

I have a list of 22 blocks (21 metabolic pathways/blocks and one block with a single variable corresponding to the antibody response). Note that some pathways include only one variable.

I just tried to run the function rgcca() with some "default" parameters (0 < tau < 1, no superblock, centroid scheme) as an example, and an error message appeared: [image: error message]

This message disappears and the function works when I remove all the blocks that include only one variable. I thought the problem could come from the functions scaling() and/or scale2() used in the rgcca() function. [image]

Let me know if you need any further information or if something is not clear.

I hope you can help me.

Thanks in advance,

NB: When I used the old version (2017), the function rgcca() worked on the same data.

Tenenhaus commented 3 years ago

Dear Ombel88,

Thank you very much for your interest in RGCCA! Could you please use the latest version of the RGCCA package, located on the 'CRAN' branch (not on master, sorry)? Also, be aware that you have missing values in your dataset and that you should use the nipals method instead of "complete". Indeed, with the "complete" method it seems you have fewer than 3 observations in the intersection...

Keep in touch,
Arthur
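To illustrate the point about the "complete" method: keeping only observations that are complete in every block means taking the intersection of the complete rows across all blocks, and with many blocks this intersection can become very small even when each block on its own has few missing values. The following is a minimal Python sketch with made-up data and a hypothetical helper `complete_rows`; it is not the RGCCA package's own code.

```python
def complete_rows(blocks):
    """Return indices of observations with no missing value (None) in any block.

    Each block is a list of rows, and all blocks share the same observations.
    """
    n = len(blocks[0])
    return [
        i for i in range(n)
        if all(value is not None for block in blocks for value in block[i])
    ]


# Three hypothetical blocks with 5 observations each; None marks a missing value.
block_a = [[1.0, 2.0], [None, 1.5], [0.3, 0.8], [1.1, None], [0.9, 1.2]]
block_b = [[0.5], [0.7], [None], [0.2], [0.4]]
block_c = [[3.0], [2.5], [2.8], [3.1], [None]]

# Complete rows per block, then across all blocks together.
per_block = [len(complete_rows([b])) for b in (block_a, block_b, block_c)]
kept = complete_rows([block_a, block_b, block_c])

print("complete rows per block:", per_block)
print("rows complete in every block:", kept)
```

In this toy example each block keeps at least 3 of its 5 observations when considered alone, but only one observation survives the intersection, which is the kind of situation described above.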

Ombel88 commented 3 years ago

Dear Mr Tenenhaus,

Thank you so much for your quick reply. It was very helpful, and I am glad to tell you that the function runs well now.

I would like to ask you one question about the application of the method to my study data (maybe this is not the best place for it, sorry).

I have read the articles you wrote about these methods in order to improve my understanding of the RGCCA and SGCCA methods. I also read Garali's article, where the SGCCA method is applied to identify biomarkers in spinocerebellar ataxia. I thought the CPCA structure used in this article could answer my question, which is to explore the association between metabolomics data and the antibody response after the yellow fever vaccination. So, my superblock is the concatenation of the 20 metabolic pathways/blocks, and it is connected to the antibody response. With this design, I chose the following shrinkage parameters: for the metabolic blocks and the superblock, I chose Mode A (tau = 1) and Mode B (tau = 0) respectively. But I am wondering what value the "Antibody response" block should take. First, I tried Mode A because I wanted to emphasize the between-block relation (antibody response well connected to the superblock), then Mode B, but this does not seem to change the results. Could this be explained by the fact that there is only one variable in the antibody response block? Or did I make a mistake somewhere?

Hope I am clear,

Best regards,

GFabien commented 3 years ago

Hi @Ombel88, sorry for the very late response. In case you are still wondering about this, let me try to give you an explanation.

If I understood correctly, you have a block "Antibody response" that has only one variable and you see the same results whether you set the value of tau to 0 or 1.

This is expected, as the constraint on the weight vector a is t(a) %*% M %*% a = 1, where M = tau * diag(ncol(X)) + (1 - tau) * crossprod(X) / nrow(X) and X is your data block. When tau = 0, M is simply the empirical covariance matrix of X, which in your case is a scalar equal to 1 if X has been scaled (the default behaviour). When tau = 1, M is the identity matrix, of dimension 1 x 1 in your case, so it is again the scalar 1. Both cases give the same constraint, hence the same result.
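As a quick numerical check of this argument, here is a minimal Python sketch (not RGCCA package code) that computes M = tau * I + (1 - tau) * X'X / n for a block with a single standardized column. The data values and the helper names `standardize` and `constraint_scalar` are made up for illustration, and the column is scaled with the 1/n divisor so that it matches the division by the number of rows in the formula.

```python
from statistics import fmean


def standardize(column):
    """Center a column and scale it to unit variance (1/n divisor, an assumption)."""
    n = len(column)
    mean = fmean(column)
    sd = (sum((x - mean) ** 2 for x in column) / n) ** 0.5
    return [(x - mean) / sd for x in column]


def constraint_scalar(column, tau):
    """M = tau * I + (1 - tau) * X'X / n for a one-column block X.

    With a single variable, both I and X'X / n are 1 x 1, so M is a scalar.
    """
    n = len(column)
    return tau * 1.0 + (1.0 - tau) * sum(x * x for x in column) / n


# Hypothetical antibody-response values for five subjects.
antibody = standardize([2.1, 3.4, 1.8, 4.0, 2.7])

# M for Mode B (tau = 0), an intermediate value, and Mode A (tau = 1).
m_values = {tau: constraint_scalar(antibody, tau) for tau in (0.0, 0.5, 1.0)}
print(m_values)
```

Up to floating-point rounding, M equals 1 for every tau, so the constraint on the scalar weight a is the same in all cases. If the column were instead scaled with the n - 1 divisor, M at tau = 0 would be (n - 1) / n rather than 1, which only changes the weight of a one-variable block by a constant factor.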

I hope this explanation is clear,

Best regards,