openproblems-bio / openproblems

Formalizing and benchmarking open problems in single-cell genomics
MIT License

scanvi runs over an hour in test mode #718

Closed by scottgigante-immunai 1 year ago

scottgigante-immunai commented 1 year ago

Per the execution timeline, scanvi takes a long time to run even when test=True. This is costing us money. @danielStrobl can you look into this?

rcannood commented 1 year ago

In my opinion, this is an issue inherent in openproblems-v1 and not so much scanvi.

A much more important optimisation that could be made for openproblems is not having to rerun all tasks when wanting to update the results from just one task. That would save us a lot of money.
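One way to picture the optimisation suggested here is a fingerprint check per task: hash each task's inputs and rerun only the tasks whose hash changed. The sketch below is a minimal illustration under stated assumptions, not how openproblems schedules its runs; `fingerprint`, `tasks_to_rerun`, and the task and file names are all hypothetical.

```python
import hashlib


def fingerprint(sources):
    """Hash a task's inputs (hypothetical mapping of file name -> file contents)."""
    digest = hashlib.sha256()
    for name in sorted(sources):  # sort so the hash does not depend on dict order
        digest.update(name.encode("utf-8"))
        digest.update(b"\0")
        digest.update(sources[name].encode("utf-8"))
        digest.update(b"\0")
    return digest.hexdigest()


def tasks_to_rerun(current, cached):
    """Return the tasks whose fingerprint is new or differs from the cached one."""
    return sorted(task for task, fp in current.items() if cached.get(task) != fp)


# Hypothetical example: only the task whose code changed is selected.
cached = {
    "label_projection": fingerprint({"scanvi.py": "v1"}),
    "denoising": fingerprint({"magic.py": "v1"}),
}
current = {
    "label_projection": fingerprint({"scanvi.py": "v2"}),  # edited method
    "denoising": fingerprint({"magic.py": "v1"}),          # unchanged
}
print(tasks_to_rerun(current, cached))
```

With a cache like this, updating one task's results would leave the other tasks' previous results in place.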

For reference, in openproblems-v2, a small data set is used to test methods. The scanvi unit test, for instance, only takes 7 minutes to run and costs us no money: https://github.com/openproblems-bio/openproblems-v2/actions/runs/3583819208/jobs/6029726080.

scottgigante-immunai commented 1 year ago

Sure, but right now v2 isn't ready (: and the current spec requires a test mode that runs on 500 cells, 500 genes, and in the case of scanvi, 2 epochs. This shouldn't take an hour.
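As a rough illustration of the limits described above (500 cells, 500 genes), a test-mode helper could subsample the expression matrix before any method runs. This is a sketch only: `subsample_for_test` and the constants are hypothetical names, the matrix is a plain list of lists (cells by genes) rather than the data structure openproblems actually uses, and random subsampling is an assumption about how the limits might be applied.

```python
import random

# Limits taken from the test-mode description in this thread.
TEST_MAX_CELLS = 500
TEST_MAX_GENES = 500


def subsample_for_test(matrix, max_cells=TEST_MAX_CELLS, max_genes=TEST_MAX_GENES, seed=0):
    """Return a copy of `matrix` (cells x genes) with at most the given dimensions."""
    rng = random.Random(seed)  # fixed seed keeps test runs reproducible
    n_cells = len(matrix)
    n_genes = len(matrix[0]) if matrix else 0
    # Sample indices without replacement, then sort to preserve the original order.
    cell_idx = sorted(rng.sample(range(n_cells), min(n_cells, max_cells)))
    gene_idx = sorted(rng.sample(range(n_genes), min(n_genes, max_genes)))
    return [[matrix[i][j] for j in gene_idx] for i in cell_idx]


# Hypothetical example: a 600 x 700 matrix is reduced to 500 x 500.
full = [[cell * 700 + gene for gene in range(700)] for cell in range(600)]
reduced = subsample_for_test(full)
print(len(reduced), len(reduced[0]))
```

On data this small, two epochs of training would be expected to finish in minutes, which is why an hour-long test run suggests that some stage is not honouring the test settings.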

(Sent by email in reply to Robrecht Cannoodt's comment of Thu, 1 Dec 2022, 2:51 am.)

LuckyMD commented 1 year ago

Could it be that the scANVI function uses scVI pretraining and doesn't adapt those parameters in test mode, @danielStrobl?
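The failure mode suggested here can be sketched as an epoch budget that covers both training stages. In the sketch below, `EpochBudget`, `resolve_epochs`, and the default epoch counts (400 and 20) are hypothetical and are not taken from the openproblems or scvi-tools code; the only value from this thread is the 2-epoch test setting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EpochBudget:
    pretrain_epochs: int  # scVI-style unsupervised pretraining stage
    finetune_epochs: int  # scANVI-style semi-supervised stage


def resolve_epochs(test, pretrain_epochs=400, finetune_epochs=20, test_epochs=2):
    """Pick epoch counts for both stages; the defaults are illustrative only."""
    if test:
        # Test mode must shrink BOTH stages. Capping only the fine-tuning stage
        # would leave pretraining at its full length.
        return EpochBudget(test_epochs, test_epochs)
    return EpochBudget(pretrain_epochs, finetune_epochs)


print(resolve_epochs(test=True))
print(resolve_epochs(test=False))
```

If only the second stage received the test setting, the pretraining stage would still run its full number of epochs, which would be consistent with the long test runtime reported in this issue.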

scottgigante-immunai commented 1 year ago

This appears to be resolved in the latest version.