Closed scottgigante-immunai closed 1 year ago
Is the number of test epochs determined here? https://github.com/openproblems-bio/openproblems/blob/4ac6faf34507ac9105f02ef7f83eceea556881ea/openproblems/tasks/spatial_decomposition/methods/cell2location.py#L38-L41
What does it mean that the same name appears multiple times in the report? E.g. search "spatial_decomposition:cell2location_detection_alpha_20_nb-tabula_muris_senis_alpha_1:openproblems-python-extras": One process took 14 min while another run took 1h. How are these different?
A reason for 15 minutes instead of 2 minutes could be that computing num_samples=10 on full data on CPU takes time. I can suggest reducing this to num_samples=2. I don't know why this test would take 1h; I need to know more about why there are many processes with the same name.
I will look into which variables are sampled and if any of them are large (re issue below).
At the moment amortised version uses too much memory (https://github.com/scverse/scvi-tools/issues/1801) and I am working on a solution. However, this doesn't explain this issue because amortised version is not slower here.
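To make the num_samples suggestion concrete, here is a minimal sketch of how a method wrapper could pick a smaller posterior sample count in test mode. The helper name and its parameters are hypothetical and are not taken from the openproblems or cell2location code; only the value 2 for test runs comes from this thread.

```python
def choose_num_samples(test: bool, full_num_samples: int, test_num_samples: int = 2) -> int:
    """Hypothetical helper: return the posterior sample count for this run.

    In test mode the count is capped at test_num_samples (2, as suggested in
    this thread); otherwise the caller-supplied full value is used unchanged.
    """
    if full_num_samples < 1 or test_num_samples < 1:
        raise ValueError("sample counts must be positive")
    if test:
        # Never exceed the full-run value, even in test mode.
        return min(test_num_samples, full_num_samples)
    return full_num_samples


# Example: a test run that previously used 10 samples now uses 2.
print(choose_num_samples(test=True, full_num_samples=10))
```

The returned value would then be passed wherever the method currently hard-codes num_samples.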
The many processes with the same name are due to failures. A run fails either because it runs out of memory (in which case it gets rebooted with more memory) or because it hits an hour (in which case it gets rebooted with more CPUs). num_samples=2 sounds good to me.
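The retry behaviour described above can be sketched roughly as follows. This is an illustration only, not the actual pipeline configuration: the class, the function, the failure labels, and the doubling factor are all assumptions; the thread only says that a failed run is restarted with more memory or more CPUs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    memory_gb: int
    cpus: int


def next_attempt(resources: Resources, failure: str) -> Resources:
    """Hypothetical retry policy: escalate the resource that caused the failure."""
    if failure == "out_of_memory":
        # Assumption: memory is doubled; the real increment is not stated here.
        return Resources(resources.memory_gb * 2, resources.cpus)
    if failure == "timeout":
        # Assumption: CPUs are doubled after hitting the one-hour limit.
        return Resources(resources.memory_gb, resources.cpus * 2)
    raise ValueError(f"unknown failure type: {failure}")


# Each retry keeps the same process name, so one name shows up several times
# in the report, once per attempt.
attempts = [Resources(memory_gb=8, cpus=2)]
for failure in ["timeout", "out_of_memory"]:
    attempts.append(next_attempt(attempts[-1], failure))
print(attempts)
```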
Maybe the test can subset the data to a smaller number of genes/cells/locations or simulate fewer locations? That would speed up tests for all methods. The cell2location package runs a very similar test on random data, which finishes in 30 seconds (https://github.com/BayraktarLab/cell2location/actions/runs/3554709736/jobs/5970971306).
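As a rough sketch of the subsetting idea, a test could draw a fixed-size, reproducible subset of location (or gene) indices and slice the dataset with it. The function name and the sizes in the example are hypothetical; how the indices would be applied to the actual test dataset is not shown.

```python
import random


def subsample_indices(n_total: int, n_keep: int, seed: int = 0) -> list:
    """Hypothetical helper: pick n_keep of n_total indices without replacement."""
    if n_keep >= n_total:
        # Nothing to drop: keep every index.
        return list(range(n_total))
    rng = random.Random(seed)  # fixed seed keeps the test deterministic
    return sorted(rng.sample(range(n_total), n_keep))


# Example: keep 100 of 1000 simulated locations.
keep = subsample_indices(n_total=1000, n_keep=100)
print(len(keep))
```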
Did num_samples=2 actually help?
The test data is by definition 500 cells, though I wonder if we should also have fewer spots in the test dataset?
Looks like reducing num_samples did cut the runtime of the successful run, but there are still a number of runs that went the full hour. This might be a problem with Nextflow.
Per the execution timeline, cell2location takes a long time to run even when test=True. This is costing us money. @vitkl can you look into this?