So the two key flags are --preprocess and --patches. --preprocess creates a zarr file in place of the PNG, and --patches constructs or appends to a SQL database. Can you rename your .npy file so that it ends in _mask.npy instead of just .npy?
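A quick way to catch this naming problem before running the pipeline is to list every image in the input directory that has no matching mask. This is a minimal sketch assuming the convention described in this thread: the mask sits in the same directory as the image, with the same basename and the extension replaced by `_mask.npy`. The helper `find_missing_masks` and the set of image extensions are hypothetical, not part of PathFlowAI.

```python
from pathlib import Path

# Hypothetical set of image extensions to check; adjust to your inputs.
IMAGE_EXTS = {".png", ".ndpi", ".svs", ".tif", ".tiff"}


def find_missing_masks(input_dir):
    """Return basenames of images in input_dir that lack a <basename>_mask.npy file."""
    input_dir = Path(input_dir)
    missing = []
    for path in sorted(input_dir.iterdir()):
        # Skip anything that is not an image (including .npy masks and .zarr outputs).
        if path.suffix.lower() not in IMAGE_EXTS:
            continue
        if not (input_dir / f"{path.stem}_mask.npy").exists():
            missing.append(path.stem)
    return missing
```

For example, `print(find_missing_masks("PFAI_inputs"))` should print an empty list once every image has a correctly named mask.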
I get the same output after changing the file name:
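import os

stainID = 'Li63NDCLAMP'   # slide basename (value as printed below)
PFAI_dir = 'PFAI_inputs'  # input directory (value as printed below)

command = 'pathflowai-preprocess preprocess_pipeline \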
-odb patch_information.db \
--preprocess \
--patches \
--basename ' + stainID +'/ \
--input_dir ' + PFAI_dir + ' \
--patch_size 512 \
--intensity_threshold 45. \
-tc 7 \
-t 0.05'
print(command)
os.system(command)
pathflowai-preprocess preprocess_pipeline -odb patch_information.db --preprocess --patches --basename Li63NDCLAMP/ --input_dir PFAI_inputs --patch_size 512 --intensity_threshold 45. -tc 7 -t 0.05
512
I don't see a zarr file or anything new in the directory, unless it's hidden somehow. I think this function would benefit from some text output telling the user what has been done and what was saved (it currently prints just '512').
And what is #26, which you mentioned here? Is it a problem with the mask orientation that I'm using here?
It's a reference to a potential bug from a previous patch that we may need to work out.
@jlevy44 See the pending question above about the missing outputs
Fair enough, we will add more progress updates. There should be some display, though, that indicates progress. You also have a forward slash after your stainID that should not be there.
Great. But again, what outputs should I see right now to evaluate whether it ran and completed correctly? For example, which files should be created? I don't see any new files, but I don't know what to look for. There isn't any zarr file in the directory. @jlevy44
You should at least see the outputs printed here: https://github.com/jlevy44/PathFlowAI/blob/master/pathflowai/cli_preprocessing.py#L92
For example:
Data dump took XXX
Adjust took XXX
Patches took XXX
Have you adjusted your command as previously discussed?
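To confirm from a script that a run actually completed, one option is to capture stdout and look for the three timing lines listed above. A sketch, assuming those messages appear verbatim at the start of their own lines (as they do in the logs later in this thread); `parse_timings`, `run_completed` and the regex are hypothetical helpers, not PathFlowAI functions.

```python
import re

# Matches the timing lines, e.g. "Data dump took 2.309" or "Adjust took 5.2e-05".
TIMING_RE = re.compile(r"^(Data dump|Adjust|Patches) took ([0-9.eE+-]+)$", re.MULTILINE)


def parse_timings(stdout):
    """Return {stage: seconds} for every timing line found in the output."""
    return {stage: float(secs) for stage, secs in TIMING_RE.findall(stdout)}


def run_completed(stdout):
    """True only if all three stages reported how long they took."""
    return {"Data dump", "Adjust", "Patches"}.issubset(parse_timings(stdout))
```

Usage: `result = subprocess.run(command, shell=True, capture_output=True, text=True)` followed by `run_completed(result.stdout)`. Passing `text=True` returns a `str` rather than the `b'...'` bytes object that `subprocess.check_output` gives by default.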
Yes, I removed the forward slash and it's still printing only 512. Was the package updated in the last week or two to do these things I don't see? Maybe I don't have the latest version.
It is possible that you do not have the latest software. As far as I am aware, this has been a long-standing feature of the package.
Your command syntax and print appear incorrect:
print(command)
os.system(command)
pathflowai-preprocess preprocess_pipeline -odb patch_information.db
Now I'm getting a "No such command" error for preprocess_pipeline:
(saturn) jovyan@jupyter-assafmagen-2dpathflowai:~/project$ ls
computational_imaging computational_imaging_old ndpi_images PFAI_inputs scikit-image tissue_masks training_segmentation_images
(saturn) jovyan@jupyter-assafmagen-2dpathflowai:~/project$ pathflowai-preprocess preprocess_pipeline -odb patch_information.db --preprocess --patches --basename Li63NDCLAMP --input_dir PFAI_inputs --patch_size 256 --intensity_threshold 45. -tc 7 -t 0.05
nonechucks may not work properly with this version of PyTorch (1.5.0). It has only been tested on PyTorch versions 1.0, 1.1, and 1.2
Usage: pathflowai-preprocess [OPTIONS] COMMAND [ARGS]...
Try 'pathflowai-preprocess -h' for help.
Error: No such command 'preprocess_pipeline'.
(saturn) jovyan@jupyter-assafmagen-2dpathflowai:~/project$ pathflowai-preprocess --version
nonechucks may not work properly with this version of PyTorch (1.5.0). It has only been tested on PyTorch versions 1.0, 1.1, and 1.2
pathflowai-preprocess, version 0.1
The package is clearly installed and loaded on a GPU instance, so what could the issue be?
Here's the image Saturn Cloud created for me to run the PFAI environment. Would it help for you to load it on your end and see what the issue is? I don't see any other way to resolve this.
Just from looking at the YAML files, you have an old version of pathflowai specified. For the latest version of pathflowai, we recommend running:
pip install git+https://github.com/jlevy44/PathFlowAI.git
Add this within the Docker container. Of course, if there are any bugs, I would strongly encourage having the flexibility to rebuild the Docker image within the HPC environment, or at least replacing that Docker image with one that contains the latest patch.
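After reinstalling, it can help to confirm which version is actually active in the same Python environment the notebook uses. A sketch using the standard library's `importlib.metadata` (Python 3.8+); it assumes the distribution is registered under the name `pathflowai`, which we have not verified. `installed_version` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("pathflowai:", installed_version("pathflowai"))
```

Running `pathflowai-preprocess -h` afterwards should also show whether `preprocess_pipeline` is now listed as a command.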
Great, thanks. It's working now. See the outputs below. Can you let me know if it looks okay? I'm basically running a loop that loads each slide together with both the region masks I'd like to segment and a background mask from HistoQC, which I use to mask the image. I compute the hematoxylin channel because I don't want the stain to drive the segmentation here. I mask the hematoxylin image and the segmentation mask matrix to remove the background, save them as PNG and NPY, and run the preprocessing per stain with the flags --preprocess --patches --patch_size 512 --intensity_threshold 45. -tc 7 -t 0.05, among others.
I just want to confirm that all of these are required both when I preprocess the first stain and for the subsequent ones, because for each stain I want to generate the patches separately and add them to the same database. In that context, is the database initialized only once, so that by now it contains all the patches I added in previous test runs and is therefore not set up correctly? If so, how do I clear it before running this process? To clarify, this process generates all the patches I need, and I don't want patches from previous analyses or test runs to be there. Do I just delete the db file, or is there anything else to do?
Also, the process crashed after about 4 slides, although I allocated 32 GB of memory, 40 GB of disk, and 1 GPU. I'm pushing it to 64 GB of memory. Does that sound reasonable just for preprocessing, or am I doing something wrong?
output:
Li63N2DCLAMP-labels.tif
(1848, 3248, 7)
ndpi_images/Li63N2DCLAMP.ndpi
ASMA01/data/imaging/liver/raw_ndpi/DCLAMP/Li63N2DCLAMP.ndpi
Created LRU Cache for 'tilesource' with 82 maximum size
Using python for large_image caching
{'levels': 9, 'sizeX': 51968, 'sizeY': 29568, 'tileWidth': 256, 'tileHeight': 256, 'magnification': 20.0, 'mm_x': 0.00044142314822989324, 'mm_y': 0.00044142314822989324}
(7392, 12992, 3)
tissue_masks/Li63N2DCLAMP.ndpi_mask_use.png
ASMA01/data/imaging/liver/tissue_masks/DCLAMP/Li63N2DCLAMP.ndpi/Li63N2DCLAMP.ndpi_mask_use.png
(1848, 3248)
(7392, 12992)
(5336, 10500, 3)
(5336, 10500, 7)
PFAI_inputs/Li63N2DCLAMP.png
pathflowai-preprocess preprocess_pipeline -odb patch_information.db --preprocess --patches --basename Li63N2DCLAMP --input_dir /home/jovyan/project/PFAI_inputs --patch_size 512 --intensity_threshold 45. -tc 7 -t 0.05
Data dump took 2.3092713356018066
Adjust took 5.245208740234375e-05
Valid Patches Complete
Area Info Complete
ID x y patch_size ... 3 4 5 6
0 Li63N2DCLAMP 0 0 512 ... 0.0 0.000000 0.0 0.0
1 Li63N2DCLAMP 0 512 512 ... 0.0 0.000000 0.0 0.0
2 Li63N2DCLAMP 0 1024 512 ... 0.0 0.000057 0.0 0.0
3 Li63N2DCLAMP 0 1536 512 ... 0.0 0.000141 0.0 0.0
4 Li63N2DCLAMP 0 2048 512 ... 0.0 0.000000 0.0 0.0
.. ... ... ... ... ... ... ... ... ...
166 Li63N2DCLAMP 9216 2048 512 ... 0.0 0.000000 0.0 0.0
167 Li63N2DCLAMP 9216 2560 512 ... 0.0 0.000000 0.0 0.0
168 Li63N2DCLAMP 9216 3072 512 ... 0.0 0.000000 0.0 0.0
169 Li63N2DCLAMP 9216 3584 512 ... 0.0 0.000000 0.0 0.0
170 Li63N2DCLAMP 9216 4096 512 ... 0.0 0.000000 0.0 0.0

[171 rows x 12 columns]
Patches took 2.634455919265747
Li59TDCLAMP-labels.tif
(1344, 1680, 7)
ndpi_images/Li59TDCLAMP.ndpi
ASMA01/data/imaging/liver/raw_ndpi/DCLAMP/Li59TDCLAMP.ndpi
{'levels': 8, 'sizeX': 26880, 'sizeY': 21504, 'tileWidth': 256, 'tileHeight': 256, 'magnification': 20.0, 'mm_x': 0.00044142314822989324, 'mm_y': 0.00044142314822989324}
(5376, 6720, 3)
tissue_masks/Li59TDCLAMP.ndpi_mask_use.png
ASMA01/data/imaging/liver/tissue_masks/DCLAMP/Li59TDCLAMP.ndpi/Li59TDCLAMP.ndpi_mask_use.png
(1344, 1680)
(5376, 6720)
(3736, 4412, 3)
(3736, 4412, 7)
PFAI_inputs/Li59TDCLAMP.png
pathflowai-preprocess preprocess_pipeline -odb patch_information.db --preprocess --patches --basename Li59TDCLAMP --input_dir /home/jovyan/project/PFAI_inputs --patch_size 512 --intensity_threshold 45. -tc 7 -t 0.05
Data dump took 0.6201791763305664
Adjust took 1.9073486328125e-05
Valid Patches Complete
Area Info Complete
ID x y patch_size ... 3 4 5 6
0 Li59TDCLAMP 0 0 512 ... 0.0 0.000000 0.0 0.0
1 Li59TDCLAMP 0 512 512 ... 0.0 0.000000 0.0 0.0
2 Li59TDCLAMP 0 1024 512 ... 0.0 0.000000 0.0 0.0
3 Li59TDCLAMP 0 1536 512 ... 0.0 0.000000 0.0 0.0
4 Li59TDCLAMP 0 2048 512 ... 0.0 0.000000 0.0 0.0
5 Li59TDCLAMP 512 0 512 ... 0.0 0.000000 0.0 0.0
6 Li59TDCLAMP 512 512 512 ... 0.0 0.000000 0.0 0.0
7 Li59TDCLAMP 512 1024 512 ... 0.0 0.000000 0.0 0.0
8 Li59TDCLAMP 512 1536 512 ... 0.0 0.000000 0.0 0.0
9 Li59TDCLAMP 512 2048 512 ... 0.0 0.000183 0.0 0.0
10 Li59TDCLAMP 1024 0 512 ... 0.0 0.000000 0.0 0.0
11 Li59TDCLAMP 1024 512 512 ... 0.0 0.000000 0.0 0.0
12 Li59TDCLAMP 1024 1024 512 ... 0.0 0.000000 0.0 0.0
13 Li59TDCLAMP 1024 1536 512 ... 0.0 0.000000 0.0 0.0
14 Li59TDCLAMP 1024 2048 512 ... 0.0 0.000275 0.0 0.0
15 Li59TDCLAMP 1536 0 512 ... 0.0 0.000069 0.0 0.0
16 Li59TDCLAMP 1536 512 512 ... 0.0 0.000134 0.0 0.0
17 Li59TDCLAMP 1536 1024 512 ... 0.0 0.000000 0.0 0.0
18 Li59TDCLAMP 1536 1536 512 ... 0.0 0.000160 0.0 0.0
19 Li59TDCLAMP 1536 2048 512 ... 0.0 0.000504 0.0 0.0
20 Li59TDCLAMP 2048 0 512 ... 0.0 0.000172 0.0 0.0
21 Li59TDCLAMP 2048 512 512 ... 0.0 0.000141 0.0 0.0
22 Li59TDCLAMP 2048 1024 512 ... 0.0 0.000286 0.0 0.0
23 Li59TDCLAMP 2048 1536 512 ... 0.0 0.000694 0.0 0.0
24 Li59TDCLAMP 2048 2048 512 ... 0.0 0.000305 0.0 0.0
25 Li59TDCLAMP 2560 0 512 ... 0.0 0.000042 0.0 0.0
26 Li59TDCLAMP 2560 512 512 ... 0.0 0.000191 0.0 0.0
27 Li59TDCLAMP 2560 1024 512 ... 0.0 0.000603 0.0 0.0
28 Li59TDCLAMP 2560 1536 512 ... 0.0 0.000336 0.0 0.0
29 Li59TDCLAMP 2560 2048 512 ... 0.0 0.000252 0.0 0.0
30 Li59TDCLAMP 3072 0 512 ... 0.0 0.000164 0.0 0.0
31 Li59TDCLAMP 3072 512 512 ... 0.0 0.000191 0.0 0.0
32 Li59TDCLAMP 3072 1024 512 ... 0.0 0.000263 0.0 0.0
33 Li59TDCLAMP 3072 1536 512 ... 0.0 0.000015 0.0 0.0
34 Li59TDCLAMP 3072 2048 512 ... 0.0 0.000000 0.0 0.0

[35 rows x 12 columns]
Patches took 0.8281879425048828
Li3NDCLAMP-labels.tif
Yeah, the memory usage is likely due to running the processes in a for loop through Jupyter, which is prone to memory leaks. Typically, I would deploy each of these processes across the HPC. I would also check the resulting SQL database and make sure this is the patch size that you want. You can also add other patch sizes to capture information at a different resolution. You also need the masks in the same directory as the WSI, with the same basename, just replacing the extension with _mask.npy. You don't want to preprocess the masks as if they were WSIs. Everything else looks OK.
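One way to apply this advice without leaving the notebook is to run each slide's in-memory preparation (hematoxylin deconvolution, masking, saving PNG/NPY) in a fresh Python process, so that everything it allocated is returned to the operating system when it exits. This is a generic Python technique, not a PathFlowAI feature; `prepare_slide.py` stands for a hypothetical script holding your current per-slide code and taking the slide ID as its only argument.

```python
import subprocess
import sys


def run_per_slide(script_path, slide_ids):
    """Run `python script_path <slide_id>` once per slide, each in a fresh process.

    Memory the script allocates is released when its process exits, so it
    cannot accumulate across iterations of the notebook loop.
    """
    for slide_id in slide_ids:
        # check=True raises CalledProcessError and stops at the first failing slide.
        subprocess.run([sys.executable, script_path, slide_id], check=True)
```

Usage: `run_per_slide("prepare_slide.py", ["Li63N2DCLAMP", "Li59TDCLAMP"])`. The same idea scales up to submitting one HPC job per slide.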
What do you mean by 'You don't want to preprocess the masks as if they were WSI'? And can you clarify how I initialize a new database once I change my image input strategy?
In addition to the above, I'm going to follow your advice regarding multiple patch sizes per image, so I would do this:
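import os
import subprocess

# stainID, base_path and PFAI_dir are set earlier in the per-slide loop
command = 'pathflowai-preprocess preprocess_pipeline \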
-odb patch_information.db \
--preprocess \
--patches \
--basename ' + stainID +' \
--input_dir ' + os.path.join(base_path,PFAI_dir) + ' \
--patch_size 512 \
--intensity_threshold 45. \
-tc 7 \
-t 0.05'
print(command)
result = subprocess.check_output(command, shell=True)
print(result)
command = 'pathflowai-preprocess preprocess_pipeline \
-odb patch_information.db \
--patches \
--basename ' + stainID +' \
--input_dir ' + os.path.join(base_path,PFAI_dir) + ' \
--patch_size 1024 \
--intensity_threshold 45. \
-tc 7 \
-t 0.05'
print(command)
result = subprocess.check_output(command, shell=True)
print(result)
I'm omitting the --preprocess flag from the second run per slide, using patch size 512 and then 1024. Other than that, how do I determine the intensity_threshold, tc, and t parameters?
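The two-call pattern above generalizes to any list of patch sizes: pass `--preprocess` only on the first call for a slide and `--patches` on every call. A sketch that only builds the argument lists, using the flags and values from the commands above; `commands_for_slide` is a hypothetical helper. Each list can be passed to `subprocess.run(args, check=True)` without `shell=True`, which also avoids the backslash line continuations inside the string.

```python
def commands_for_slide(stain_id, input_dir, patch_sizes=(512, 1024),
                       db="patch_information.db"):
    """Return one argv list per patch size; only the first includes --preprocess."""
    commands = []
    for i, size in enumerate(patch_sizes):
        args = ["pathflowai-preprocess", "preprocess_pipeline", "-odb", db]
        if i == 0:
            args.append("--preprocess")  # create the zarr once per slide
        args += [
            "--patches",
            "--basename", stain_id.rstrip("/"),  # basename must not end in a slash
            "--input_dir", input_dir,
            "--patch_size", str(size),
            "--intensity_threshold", "45.",
            "-tc", "7",
            "-t", "0.05",
        ]
        commands.append(args)
    return commands
```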
@jlevy44
That looks right to me. Do you have 7 output classes? -tc 7
Also, are you using a black background for the slides? Please convert the background to white if so.
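Since background is detected as white, a masked grayscale image with a black background can be converted before saving by setting every pixel outside the tissue mask to 255. A sketch assuming an 8-bit image and a boolean tissue mask with the same height and width (for example the HistoQC mask described earlier in the thread); `whiten_background` is a hypothetical helper.

```python
import numpy as np


def whiten_background(image, tissue_mask):
    """Return a copy of an 8-bit image with every non-tissue pixel set to white (255)."""
    if image.shape[:2] != tissue_mask.shape:
        raise ValueError("image and mask must have the same height and width")
    out = image.copy()
    # Boolean indexing with an (H, W) mask works for both (H, W) and (H, W, 3) images.
    out[~tissue_mask.astype(bool)] = 255
    return out
```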
I do have 7 segmentation (not classification, to be clear) classes, including background as the first channel: Background, Bile Ducts, Normal, Tumor, Stroma, Tissue Fold, Lymphoid Aggregate. So tc is supposed to match the number of mask channels, right? Wouldn't it be better not to require that parameter in that case?
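If each mask is saved as one channels-last array per slide (the log above shows masks of shape (5336, 10500, 7)), the value for `-tc` can be read from the mask file instead of being hard-coded. A sketch assuming that (H, W, C) layout and that `-tc` counts every channel including background, which is the interpretation discussed here rather than a documented rule; `n_classes_from_mask` is hypothetical.

```python
import numpy as np


def n_classes_from_mask(mask_path):
    """Return the number of class channels in a channels-last mask saved as .npy."""
    # mmap_mode="r" memory-maps the file, so the pixel data is not read into RAM
    # just to obtain the shape.
    mask = np.load(mask_path, mmap_mode="r")
    if mask.ndim != 3:
        raise ValueError(f"expected an (H, W, C) mask, got shape {mask.shape}")
    return mask.shape[-1]
```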
Why does the background need to be white? I'm using the deconvolved grayscale hematoxylin image, so most of the image is black. Actually, that might be the problem: maybe the patches are all being filtered out because they don't exceed the intensity threshold, which you may have optimized for RGB images? How do I determine that?
That information should be in the db file and can be visualized using any of our visualization functions (which we are going to update). We remove background based on whether it is white, which will be especially pertinent when we implement Otsu thresholding.
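To check what ended up in the database (for example, whether a slide's patches were all filtered out, or whether rows from earlier test runs are still there), the file can be opened with Python's built-in `sqlite3`. This sketch assumes the `.db` file is SQLite and that patch tables have `ID` and `patch_size` columns, as in the DataFrame printed in the logs; it discovers table names instead of assuming them. `patch_counts` is a hypothetical helper.

```python
import os
import sqlite3


def patch_counts(db_path):
    """Return {(table, ID, patch_size): n_patches} for every table that has
    ID and patch_size columns."""
    if not os.path.isfile(db_path):
        # sqlite3.connect would otherwise silently create an empty database.
        raise FileNotFoundError(db_path)
    counts = {}
    conn = sqlite3.connect(db_path)
    try:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'")]
        for table in tables:
            columns = {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
            if not {"ID", "patch_size"} <= columns:
                continue  # not a patch table
            query = f'SELECT ID, patch_size, COUNT(*) FROM "{table}" GROUP BY ID, patch_size'
            for slide_id, size, n in conn.execute(query):
                counts[(table, slide_id, size)] = n
    finally:
        conn.close()
    return counts
```

A slide that is missing from the result, or that has far fewer patches than expected, is a hint that the intensity threshold is discarding it.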
You can change the threshold intensity, but with a black background it will grab the entire background, which you will then have to filter out manually.
One way to set the intensity is to Otsu-threshold one of your images, then take 255 - otsu_threshold as the intensity threshold.
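The Otsu suggestion can be scripted. This sketch computes Otsu's threshold for an 8-bit grayscale image with NumPy by maximizing the between-class variance over all 256 levels, then applies the 255 - threshold rule of thumb given above. `otsu_threshold` and `suggest_intensity_threshold` are hypothetical helpers; if scikit-image is installed, `skimage.filters.threshold_otsu` is a ready-made alternative, although its result may differ by a gray level because of binning conventions.

```python
import numpy as np


def otsu_threshold(gray):
    """Otsu's threshold t for an 8-bit image (pixels <= t form the dark class)."""
    hist = np.bincount(np.asarray(gray, dtype=np.uint8).ravel(), minlength=256)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                # weight of the dark class for each t
    mu = np.cumsum(prob * np.arange(256))  # cumulative mean up to t
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    sigma_b = np.zeros(256)
    valid = denom > 0                      # both classes must be non-empty
    sigma_b[valid] = (mu_total * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b))


def suggest_intensity_threshold(gray):
    """Candidate --intensity_threshold following the 255 - Otsu rule of thumb."""
    return 255 - otsu_threshold(gray)
```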
Hi @jlevy44
How do I know what's the status of the preprocessing procedure during and after execution and what are the outputs I should see in terms of files being written?
The following ran for a couple of seconds and just printed '512' at the end. I don't see any output files in the directory specified here.
Output:
And these are the input files I have in that folder (just one sample for now to test if it works):
And to make sure my plan is compatible with this function: I plan to run it as the last step in a loop that processes each slide from NDPI separately and prepares the input for PFAI. That assumes the PFAI preprocess command will append the data when it is called on new slides. I'll just remove the '--preprocess' flag after the first iteration. In that context, every time I run the command with --preprocess, am I basically instructing it to redefine the database? Does it delete the old one? And where are the databases stored?
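On starting from a clean slate: the maintainers describe `--patches` as constructing or appending to the database given by `-odb`, so rows from earlier test runs will presumably remain unless that file is removed. A cautious sketch, assuming the database is the single file passed to `-odb`; since `patch_information.db` is a relative path, it is presumably resolved against the working directory the command runs in, which the sketch prints. `reset_database` is hypothetical and defaults to a dry run.

```python
from pathlib import Path


def reset_database(db_path, dry_run=True):
    """Delete the patch database so the next --patches run starts empty.

    Returns True if a file was (or, in dry-run mode, would be) removed.
    """
    db = Path(db_path).resolve()  # show exactly which file is affected
    if not db.is_file():
        return False
    if dry_run:
        print(f"would remove {db}")
    else:
        db.unlink()
    return True
```

Usage: call `reset_database("patch_information.db")` once to see which file would be removed, then `reset_database("patch_information.db", dry_run=False)` before the first slide of a fresh run.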