Investigate apparent differences in the last component of the SIT workflow when using multiprocessing and when not

ncats / multiplex-analysis-web-apps


This is probably low priority for now, since the fix below is reasonable and seems to work. Eventually, though, we need to understand precisely what's going on.

When you don't use multiprocessing, the code seems to work, but when you do, it dies with:

[Screenshot of the error: MicrosoftTeams-image]

This can be replicated using these settings (using the Poisson significance method is fine):

I spent a decent part of 2/9/24 looking into this, though it probably requires deeper investigation. Here are some of my notes:

It looked like it worked on Windows but not on Linux, but I think that's because we can't run multiprocessing on Windows (I haven't fixed forkserver yet). That is, the error occurs in that part of the workflow because multiprocessing is selected, which it wasn't on Windows, and that is probably why we don't see it there. I'm pretty sure the error went away on re-run because the code saw that the image directory had been (at least partially) created, treated that as a checkpoint, and didn't try to run anything more. But if you go to the last tab in the SIT workflow to visualize the results, I'd bet there would generally be missing images. This is the temporary "fix":

Just saw this note to myself in my notes:

12/15/23, 3:35 AM: Note that plotting the density P values for each ROI over the slide spatial plot fails only on NIDAP and, I believe, only when multiprocessing is used with the unassigned phenotypes as centers or neighbors. When the unassigned phenotypes aren't involved, it works on NIDAP, even with multiprocessing. On my laptop, unassigned centers or neighbors work even when multiprocessing is used. I plan to avoid this by removing unassigned phenotypes when the SIT generates the phenotype assignments file from the phenotyper. Otherwise it's a strange issue. NIDAP Python is 3.9.18 whereas my laptop Python is 3.10.8, so maybe that's the difference. Or maybe it's a start method thing? Not sure, but it's strange.
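
One way to test the version and start-method guesses in that note is to run the same small report in both environments (NIDAP and a laptop) and compare the output. This is only a sketch and is not code from the app: `environment_report` is a hypothetical helper name, and it uses only standard-library calls.

```python
import multiprocessing as mp
import platform
import sys


def environment_report():
    """Collect interpreter and multiprocessing details worth comparing.

    Hypothetical helper for illustration; not part of the app.
    """
    return {
        "python_version": platform.python_version(),
        "platform": sys.platform,
        # Note: if no start method has been chosen yet, this call also
        # fixes the platform default as the global start method.
        "default_start_method": mp.get_start_method(),
        "available_start_methods": mp.get_all_start_methods(),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If the two environments report different default start methods, forcing the same one in both (for example through `multiprocessing.get_context`) would help separate a start-method effect from a Python-version effect.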

It appears we have reproduced this, using the input file in the screenshot above, even after we stopped processing unassigned phenotypes in the SIT. However, there seemed to be a difference in packages on 12/15/23, as I saw the problem on NIDAP but not on my laptop. In our tests, I believe we used basically the same environment to reproduce the issue, so I think this might be a strange dependency issue (which should still be nailed down) rather than an algorithmic or scientific issue.
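
If this really is a dependency issue, diffing the installed package versions of the two environments is a cheap first step toward nailing it down. A minimal sketch, assuming a short script can be run in each environment: `installed_versions` and `diff_versions` are hypothetical helpers, not functions from the app.

```python
from importlib.metadata import distributions


def installed_versions():
    """Map each installed distribution name (lowercased) to its version."""
    versions = {}
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that have no metadata name
            versions[name.lower()] = dist.version
    return versions


def diff_versions(env_a, env_b):
    """Return {package: (version_in_a, version_in_b)} where the two differ.

    A package missing from one environment is reported with None.
    """
    diffs = {}
    for name in sorted(set(env_a) | set(env_b)):
        a, b = env_a.get(name), env_b.get(name)
        if a != b:
            diffs[name] = (a, b)
    return diffs
```

In practice, the output of `installed_versions()` could be saved as JSON in each environment and the two files loaded and passed to `diff_versions`. Any package that shows up in the diff and is involved in plotting or multiprocessing would be a candidate to pin and retest.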


Investigate apparent differences in the last component of the SIT workflow when using multiprocessing and when not #76