statisticalbiotechnology / diffacto

Imputation Algorithm Not Documented #23

Closed DarioS closed 3 years ago

DarioS commented 3 years ago

Neither the journal article nor this website gives a formula for Diffacto's imputation process. When I set -impute_threshold 1.1 to effectively turn off imputation, it makes a large difference to the result matrix. By default, I get a 100% complete matrix, but with imputation turned off, I get an average of 17% missing values per protein. I wouldn't think that there's much range between 99% and 100%, but it seems there is a lot. A formula would help users understand it.

percolator commented 3 years ago

Hi Dario,

I am not sure a formula would help you much. The fraction of missing values is calculated per sample group, which might explain some of the differences you observe. Could it be that you have several sample groups, and that among your 17% missing values there is a sample group with only missing values?

--Lukas
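
The per-group calculation described above can be illustrated with a small sketch. This is not Diffacto's code: the function names, the example row, and the rule "a group qualifies when its missing fraction exceeds the threshold" are assumptions made for illustration only. It shows why the grouping matters: if every sample is its own group, a group's missing fraction can only be 0 or 1, so any threshold below 1 flags every missing value, while a threshold of 1.1 flags none.

```python
import math


def missing_fraction_per_group(values, groups):
    """Return {group: fraction of missing (NaN) values} for one row.

    values: abundance per sample, with NaN marking a missing value
    groups: group label per sample, same length as values
    """
    counts = {}
    for value, group in zip(values, groups):
        total, missing = counts.get(group, (0, 0))
        counts[group] = (total + 1, missing + (1 if math.isnan(value) else 0))
    return {g: missing / total for g, (total, missing) in counts.items()}


def groups_over_threshold(values, groups, threshold):
    """List groups whose missing fraction exceeds the threshold.

    Assumption: 'exceeds' is the trigger condition; the tool's exact
    rule is not documented in this thread.
    """
    fractions = missing_fraction_per_group(values, groups)
    return sorted(g for g, f in fractions.items() if f > threshold)


nan = math.nan
row = [10.2, nan, nan, nan, 9.8, 11.0]  # made-up abundances for one peptide

# Hypothetical grouping 1: every sample is its own group.
singleton = ["s1", "s2", "s3", "s4", "s5", "s6"]
# Hypothetical grouping 2: two technical replicates per patient.
per_patient = ["p1", "p1", "p2", "p2", "p3", "p3"]

print(groups_over_threshold(row, singleton, 0.99))
print(groups_over_threshold(row, singleton, 1.1))
print(groups_over_threshold(row, per_patient, 0.99))
```

With singleton groups, the three samples with missing values are all flagged at a threshold of 0.99 and none at 1.1. With per-patient groups, only the patient whose replicates are both missing is flagged.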

DarioS commented 3 years ago

I have 70 patients and 140 samples (two technical replicates each). I should use -samples to group them. It makes sense that some patients don't express some proteins. I was wrongly thinking a group was a set of different samples for a particular peptide.
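
A sample list for this layout could be generated along the following lines. This is only a sketch: it assumes the file passed to -samples is tab-separated with one run name and its group label per line, and the run names (`patient01_rep1`, ...) are hypothetical placeholders. Real run names must match the column names in the abundance file, and the expected format should be checked against the Diffacto README.

```python
import csv
import io


def build_sample_rows(n_patients, replicates):
    """Return (run_name, group_label) pairs, one per run.

    The naming scheme is hypothetical; substitute the actual run names.
    """
    rows = []
    for patient in range(1, n_patients + 1):
        group = f"patient{patient:02d}"
        for rep in range(1, replicates + 1):
            rows.append((f"{group}_rep{rep}", group))
    return rows


rows = build_sample_rows(70, 2)

# Write to an in-memory buffer; swap in open("samples.lst", "w", newline="")
# to produce a file on disk.
buffer = io.StringIO()
csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(rows)
sample_list_text = buffer.getvalue()

print(sample_list_text.splitlines()[:4])
```

Grouping the two technical replicates of each patient this way gives 70 groups of two runs each, so a group's missing fraction can be 0, 0.5, or 1.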

percolator commented 3 years ago

That makes sense. Thanks! --Lukas
