openproblems-bio / openproblems

Formalizing and benchmarking open problems in single-cell genomics
MIT License

cell2location runs over an hour in test mode #717

Closed: scottgigante-immunai closed this issue 1 year ago

scottgigante-immunai commented 1 year ago

Per the execution timeline, cell2location takes a long time to run even when test=True. This is costing us money. @vitkl can you look into this?

vitkl commented 1 year ago

Is the number of test epochs determined here? https://github.com/openproblems-bio/openproblems/blob/4ac6faf34507ac9105f02ef7f83eceea556881ea/openproblems/tasks/spatial_decomposition/methods/cell2location.py#L38-L41
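A minimal sketch of how a `test` flag could cap both the number of training epochs and the number of posterior samples in one place. The helper name `cell2location_run_params` and all default values below are hypothetical; they are not the values used in the linked `cell2location.py`.

```python
from typing import Optional


def cell2location_run_params(
    test: bool,
    max_epochs: Optional[int] = None,
    num_samples: Optional[int] = None,
) -> dict:
    """Hypothetical helper: pick cheap settings when test=True.

    Explicitly passed values always win; otherwise fall back to
    test-mode or full-run defaults (the numbers are illustrative only).
    """
    if test:
        defaults = {"max_epochs": 2, "num_samples": 2}
    else:
        defaults = {"max_epochs": 30000, "num_samples": 1000}
    return {
        "max_epochs": max_epochs if max_epochs is not None else defaults["max_epochs"],
        "num_samples": num_samples if num_samples is not None else defaults["num_samples"],
    }


print(cell2location_run_params(test=True))  # {'max_epochs': 2, 'num_samples': 2}
```

Keeping both knobs behind the same flag makes it harder for one expensive step (training or posterior sampling) to slip through in test mode.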

vitkl commented 1 year ago

What does it mean that the same name appears multiple times in the report? For example, searching for "spatial_decomposition:cell2location_detection_alpha_20_nb-tabula_muris_senis_alpha_1:openproblems-python-extras" shows one process that took 14 min and another run that took 1 h. How are these different?

vitkl commented 1 year ago

One reason it takes 15 minutes instead of 2 could be that computing num_samples=10 on the full data on CPU takes time. I suggest reducing it to num_samples=2. I don't know why this test would take 1 h, though; I need to know more about why there are many processes with the same name.
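A back-of-the-envelope sketch of why `num_samples` matters: every posterior draw materializes the sampled variables, so memory and work grow linearly with the number of draws. The function below assumes float32 values and a single location-by-cell-type variable; the dataset sizes are made up for illustration.

```python
def posterior_sample_bytes(
    num_samples: int,
    n_locations: int,
    n_cell_types: int,
    bytes_per_value: int = 4,  # assumes float32 values
) -> int:
    """Size in bytes of one (num_samples, n_locations, n_cell_types) array of draws."""
    return num_samples * n_locations * n_cell_types * bytes_per_value


# Hypothetical sizes, for illustration only.
full = posterior_sample_bytes(num_samples=10, n_locations=5000, n_cell_types=20)
reduced = posterior_sample_bytes(num_samples=2, n_locations=5000, n_cell_types=20)
print(full // reduced)  # 5
```

Larger variables (for example, anything indexed by genes as well as locations) scale the same way, which is why checking which variables are sampled is worthwhile.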

I will look into which variables are sampled and if any of them are large (re issue below).

At the moment the amortised version uses too much memory (https://github.com/scverse/scvi-tools/issues/1801), and I am working on a solution. However, this doesn't explain this issue, because the amortised version is not slower here.

scottgigante-immunai commented 1 year ago

The many processes with the same name are due to failures. A process fails either because it runs out of memory (in which case it is restarted with more memory) or because it hits the one-hour limit (in which case it is restarted with more CPUs). num_samples=2 sounds good to me.
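The retry behaviour described above can be modelled as a simple escalation rule, where every attempt appears in the report under the same process name. This is a hypothetical Python model of the policy, not the project's actual Nextflow configuration, and the doubling factors are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    memory_gb: int
    cpus: int


def escalate(resources: Resources, failure: str) -> Resources:
    """Hypothetical rule: double memory after OOM, double CPUs after a timeout."""
    if failure == "out_of_memory":
        return Resources(resources.memory_gb * 2, resources.cpus)
    if failure == "timeout":
        return Resources(resources.memory_gb, resources.cpus * 2)
    raise ValueError(f"unknown failure type: {failure!r}")


# Each attempt below would show up under one process name in the timeline.
attempts = [Resources(memory_gb=16, cpus=2)]
for failure in ["out_of_memory", "timeout"]:
    attempts.append(escalate(attempts[-1], failure))
print(attempts[-1])  # Resources(memory_gb=32, cpus=4)
```

Under a rule like this, a job that is slow for reasons extra CPUs do not fix can keep reaching the time limit, with each attempt billed separately.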

(Sent by email in reply to Vitalii Kleshchevnikov's comment of Wed, Nov 30, 2022, 4:02 AM.)

vitkl commented 1 year ago

Maybe the test could subset the data to fewer genes/cells/locations, or simulate fewer locations? That would speed up the tests for all methods. The cell2location package runs a very similar test on random data that finishes in 30 seconds (https://github.com/BayraktarLab/cell2location/actions/runs/3554709736/jobs/5970971306).
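A minimal sketch of subsetting a test dataset to fewer locations and genes, using a seeded random choice so that every method sees the same subset. It works on a plain NumPy count matrix; `subset_counts` is a hypothetical helper. With an AnnData object, the same index arrays could be applied as `adata[obs_idx, var_idx].copy()`.

```python
import numpy as np


def subset_counts(X: np.ndarray, n_obs: int, n_vars: int, seed: int = 0):
    """Hypothetical helper: randomly keep n_obs rows (locations) and n_vars columns (genes).

    Returns the subset matrix plus the sorted row/column indices used,
    so the same subset can be applied to matching metadata.
    """
    rng = np.random.default_rng(seed)
    obs_idx = np.sort(rng.choice(X.shape[0], size=min(n_obs, X.shape[0]), replace=False))
    var_idx = np.sort(rng.choice(X.shape[1], size=min(n_vars, X.shape[1]), replace=False))
    return X[np.ix_(obs_idx, var_idx)], obs_idx, var_idx


# Random Poisson counts stand in for real data (illustrative sizes).
counts = np.random.default_rng(1).poisson(1.0, size=(2000, 1000))
small, obs_idx, var_idx = subset_counts(counts, n_obs=100, n_vars=200)
print(small.shape)  # (100, 200)
```

Fixing the seed keeps the subset identical across runs, so runtime differences between methods are not confounded by different data.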

vitkl commented 1 year ago

Did num_samples=2 actually help?

scottgigante-immunai commented 1 year ago

The test data is 500 cells by definition, though I wonder if we should also have fewer spots in the test dataset?
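If spots are simulated from the 500 test cells, the number of spots is a separate knob from the number of cells. The sketch below builds each synthetic spot by summing the counts of a few cells drawn at random (with replacement); `simulate_spots` and its parameters are hypothetical and do not reproduce the benchmark's actual simulation.

```python
import numpy as np


def simulate_spots(cell_counts: np.ndarray, n_spots: int, cells_per_spot: int = 5, seed: int = 0):
    """Hypothetical simulator: each spot sums `cells_per_spot` randomly drawn cells.

    Cells are drawn with replacement. Returns the (n_spots, n_genes) spot matrix
    and the (n_spots, cells_per_spot) indices of the cells in each spot.
    """
    rng = np.random.default_rng(seed)
    members = rng.integers(0, cell_counts.shape[0], size=(n_spots, cells_per_spot))
    spots = cell_counts[members].sum(axis=1)
    return spots, members


cells = np.random.default_rng(2).poisson(1.0, size=(500, 100))  # 500 test cells
spots, members = simulate_spots(cells, n_spots=50)
print(spots.shape)  # (50, 100)
```

Because `n_spots` does not depend on the number of input cells, the spatial side of the test dataset can shrink without changing the 500-cell definition.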

It looks like reducing num_samples did cut the runtime of the successful run, but a number of runs still went the full hour. This might be a problem with Nextflow.

(Sent by email in reply to Vitalii Kleshchevnikov's comment of Fri, 2 Dec 2022, 3:54 AM.)