stephenslab / flashr

R package for Empirical Bayes Factor Analysis.
https://stephenslab.github.io/flashr
BSD 3-Clause "New" or "Revised" License

Sparse input data #102

Open ml3958 opened 3 years ago

ml3958 commented 3 years ago

Hi,

I have a sparse input matrix and tried running flashr on it (with non-negative constraints on both the F and L matrices), but I only get very few factors. I wonder if this has something to do with the sparsity of my input data (53% zeros), and what the best practice would be?

Thanks so much in advance!

stephens999 commented 3 years ago

First, for sparse matrices, I recommend flashier, our improved version of flashr, at https://github.com/willwerscheid/flashier. It should be faster for sparse matrices. It is also a bit more "in development" right now, but we are actively working on it, so we can provide advice there.

However, although in principle flashr and flashier can do EBMF with non-negative priors, I would not really recommend them (whether the data are sparse or not). In particular, we know that convergence can be an issue with non-negative priors.

My recommendation, if this is a sparse count matrix and you want a non-negative factorization, would be to use Poisson non-negative matrix factorization, as in https://github.com/stephenslab/fastTopics, where we have worked much harder on good convergence and where the count nature of the data is better modeled.

We are also working on semi-nonnegative approaches in flashier (where loadings are non-negative and factors are not), and these might also be of interest. If you are interested, please continue the conversation on that in the flashier repo.

ml3958 commented 3 years ago

Thanks so much for your reply. I will look into flashier.

In terms of what you mentioned:

"However, although in principle flashr and flashier can do EBMF with non-negative priors, i would not really recommend them (whether the data are sparse or not). In particular we know that convergence can be an issue with non-negative priors."

I wonder whether such convergence difficulties would lead to fewer factors than expected? I ran flashr with non-negative priors multiple times and the results are pretty consistent (few factors, but very reproducible), so I suspect that convergence was not an issue in my case. However, I do have a very complex dataset and I expect many factors.

I didn't consider fastTopics because my input is not a count readout, but rather normalized statistics.

Thank you!

stephens999 commented 3 years ago

Yes, the bottom line is that convergence difficulties could lead to underestimating the number of factors. For comparison, maybe try another package for non-negative matrix factorization, such as NNLM?
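To illustrate the kind of comparison suggested here, below is a minimal sketch of a plain NMF baseline fitted at several ranks. It is not what flashr, flashier, or NNLM do internally: it uses the classic multiplicative updates for squared error, written in pure Python so that it runs without extra packages. The toy matrix, the helper names (`toy_matrix`, `nmf`, `frob_error`), and the iteration count are all assumptions made for illustration. If the reconstruction error keeps dropping sharply as the rank grows past the number of factors an EBMF fit returned, that may hint at underfitting.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frob_error(v, w, h):
    """Squared Frobenius norm of v - w h."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Multiplicative-update NMF for squared error (illustrative baseline only)."""
    rng = random.Random(seed)
    n, p = len(v), len(v[0])
    # Strictly positive random start; the updates keep every entry non-negative.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(p)] for _ in range(rank)]
    for _ in range(n_iter):
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(p)]
             for k in range(rank)]
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h

def toy_matrix():
    """Hypothetical 12 x 9 matrix built from 3 non-negative factors with
    disjoint support, so most entries are exactly zero."""
    loadings = [[1.0 + 0.5 * (i % 4) if i // 4 == k else 0.0 for k in range(3)]
                for i in range(12)]
    factors = [[1.0 + 0.25 * (j % 3) if j // 3 == k else 0.0 for j in range(9)]
               for k in range(3)]
    return matmul(loadings, factors)

v = toy_matrix()
zero_fraction = sum(x == 0.0 for row in v for x in row) / (len(v) * len(v[0]))
print(f"fraction of zeros: {zero_fraction:.2f}")

# Fit the baseline at several ranks and record the reconstruction error.
errors = {}
for rank in (1, 2, 3, 4):
    w, h = nmf(v, rank)
    errors[rank] = frob_error(v, w, h)
    print(f"rank {rank}: squared error {errors[rank]:.4f}")
```

On real data, the same comparison would be run with a maintained NMF package rather than this toy loop, and with several random starts per rank, since multiplicative updates can also stall in local optima.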
