zamanianlab / wrmXpress

MIT License

Deletion of input images that aren't selected to be analyzed. #18

Open: wheelern opened this issue 9 months ago


Describe the bug

The algorithm allows specific wells to be selected for analysis instead of the entire dataset (i.e., by using the wells option in the config YAML):

https://github.com/zamanianlab/wrmXpress/blob/810989c45ce7fe0f2b2722b2634a39af98a7626f/local_env/parameters_template.yml#L92-L94

This feature is mainly used in development to speed up testing times; I'm not aware of it being used at all in production settings.

There are two issues with this feature:

  1. The steps to run Cellpose include renaming all the images in the input folder using a glob pattern and then running the command on the entire directory (side note: the renaming won't be necessary if Cellpose is updated in the Docker image, as the IO bug was fixed as of v2.1.1; see here for details):

    https://github.com/zamanianlab/wrmXpress/blob/810989c45ce7fe0f2b2722b2634a39af98a7626f/wrapper.py#L91-L103

    In order to not run Cellpose on images that the user doesn't want analyzed, the non-selected input images are deleted:

    https://github.com/zamanianlab/wrmXpress/blob/810989c45ce7fe0f2b2722b2634a39af98a7626f/wrapper.py#L63-L76

    This is a hacky workaround for using Cellpose. It's fine when the analysis runs remotely, where deleting input files isn't a problem. When run locally, however, the input folder might hold the only copy of the raw image data, so this is a serious problem that could result in data loss.

  2. There is a bug that throws an "image not found" error if only a single well is selected. It is caused by this line:

    https://github.com/zamanianlab/wrmXpress/blob/810989c45ce7fe0f2b2722b2634a39af98a7626f/wrapper.py#L139

    The zip() function pairs up the elements of two iterables, so this approach works if wells and plate_paths are both lists of the same length. If they are strings (which is the case when only one well is selected), zip() iterates over each string character by character, e.g., ['A', '0', '1'] for well A01, so the resulting well/path pairs no longer point to real images.

    A quick solution to this would be to make sure wells and plate_paths are lists, and convert them if not.

    The error is thrown as soon as the selected modules load the first image, for example here when analyzing images of microfilariae:
    https://github.com/zamanianlab/wrmXpress/blob/810989c45ce7fe0f2b2722b2634a39af98a7626f/modules/segment_worms.py#L118
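The quick fix suggested in item 2 (make sure wells and plate_paths are lists before zipping them) could look roughly like the sketch below. This is not the wrapper.py code: the helpers as_list and pair_wells and the example path are hypothetical, and the sketch assumes the parsed YAML yields either a single string or a list of strings for each value.

```python
def as_list(value):
    """Wrap a single string (one selected well or path) in a list; pass sequences through.

    Hypothetical helper: assumes the parsed config gives either a str or a list of str.
    """
    if isinstance(value, str):
        return [value]
    return list(value)


def pair_wells(wells, plate_paths):
    """Pair each well with its image path after normalizing both inputs to lists."""
    wells, plate_paths = as_list(wells), as_list(plate_paths)
    # zip() silently truncates to the shorter input, so check lengths explicitly.
    if len(wells) != len(plate_paths):
        raise ValueError(f"{len(wells)} wells but {len(plate_paths)} image paths")
    return list(zip(wells, plate_paths))


wells = "A01"                        # what a single selected well looks like
plate_paths = "input/plate/A01.TIF"  # hypothetical path

# Without normalization, zip() walks both strings character by character.
print(list(zip(wells, plate_paths)))      # [('A', 'i'), ('0', 'n'), ('1', 'p')]

# With normalization, the well is paired with its whole path.
print(pair_wells(wells, plate_paths))     # [('A01', 'input/plate/A01.TIF')]
```

The explicit length check is optional, but without it a mismatch between wells and paths would be silently truncated by zip() instead of surfacing as an error.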

To Reproduce

Run wrmXpress locally using the attached YAML and sample data.

20210819-p01-NJW_753.txt 20210819-p01-NJW_753.zip
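For the first issue, one non-destructive alternative to deleting the non-selected inputs is to copy only the selected images into a temporary staging directory and point the existing Cellpose step at that directory instead of the input folder. The sketch below is a hypothetical helper, not part of wrmXpress: it assumes image filenames follow a `<plate>_<well>.TIF` pattern (the real naming scheme may differ) and only ever reads from the input folder.

```python
import shutil
import tempfile
from pathlib import Path


def stage_selected_images(input_dir, wells, suffix=".TIF"):
    """Copy images for the selected wells into a fresh temporary directory.

    Hypothetical helper: assumes filenames look like '<plate>_<well>.TIF'.
    The input directory is only read from; nothing in it is renamed or deleted.
    """
    # Accept a single well as a string as well as a list of wells.
    wanted = {wells} if isinstance(wells, str) else set(wells)
    staging_dir = Path(tempfile.mkdtemp(prefix="wrmxpress_cellpose_"))
    for image in sorted(Path(input_dir).glob(f"*{suffix}")):
        # The well ID is taken as the text after the last underscore in the stem.
        if image.stem.rsplit("_", 1)[-1] in wanted:
            shutil.copy2(image, staging_dir / image.name)
    return staging_dir
```

Any renaming Cellpose needs can then happen inside the staging directory, which can be removed with shutil.rmtree once the run finishes. Copying costs extra disk space for large plates; symbolic links would avoid that but may not be followed the same way on every platform or inside Docker volume mounts.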