swolock / scrublet

Detect doublets in single-cell RNA-seq data
MIT License

SVD instead of PCA #6

Open DanSchnell opened 5 years ago

DanSchnell commented 5 years ago

Hi Stuart, The Default Preprocessing section of the Cell Systems paper notes that SVD was used in place of PCA for the Demuxlet example. Is there an option in scrub_doublets or another function that makes that change? If not, is any workaround code available from the run(s) done for the paper? Thanks much, Dan

swolock commented 5 years ago

Hi Dan,

TruncatedSVD is automatically used in place of PCA when you supply mean_center=False for scrub_doublets().

Here's the relevant section of the docs:

        log_transform : bool, optional (default: False)
            If True, log-transform the counts matrix (log10(1+TPM)). 
            `sklearn.decomposition.TruncatedSVD` will be used for dimensionality
            reduction, unless `mean_center` is True.
        mean_center : bool, optional (default: True)
            If True, center the data such that each gene has a mean of 0.
            `sklearn.decomposition.PCA` will be used for dimensionality
            reduction.
        normalize_variance : bool, optional (default: True)
            If True, normalize the data such that each gene has a variance of 1.
            `sklearn.decomposition.TruncatedSVD` will be used for dimensionality
            reduction, unless `mean_center` is True.

Sam
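
To make the rule above concrete, here is a minimal sketch rather than code taken from the paper. The helper names `reducer_for` and `scrub_with_truncated_svd` are hypothetical, and `counts_matrix` is assumed to be a cells-by-genes matrix of raw counts. Only `Scrublet(...)` and `scrub_doublets(mean_center=..., normalize_variance=...)` are Scrublet calls; the other arguments are left at their defaults.

```python
def reducer_for(mean_center: bool) -> str:
    """Hypothetical helper: name the scikit-learn reducer that the quoted
    docs say is used for a given `mean_center` setting."""
    return "PCA" if mean_center else "TruncatedSVD"


def scrub_with_truncated_svd(counts_matrix):
    """Hypothetical wrapper: run Scrublet without mean-centering so that
    TruncatedSVD is used for dimensionality reduction instead of PCA."""
    # Imported lazily so reducer_for() can be used without scrublet installed.
    import scrublet as scr

    scrub = scr.Scrublet(counts_matrix)
    doublet_scores, predicted_doublets = scrub.scrub_doublets(
        mean_center=False,        # switches the reducer to TruncatedSVD
        normalize_variance=True,  # variance scaling is still applied
    )
    return doublet_scores, predicted_doublets


print(reducer_for(False))  # TruncatedSVD
print(reducer_for(True))   # PCA
```

Skipping mean-centering should also keep a sparse counts matrix sparse, which is one common reason to prefer TruncatedSVD on large datasets.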

DanSchnell commented 5 years ago

Thanks very much Sam! Dan

(Sent by email in reply to Sam Wolock's message of Tuesday, May 21, 2019 at 4:37 PM.)