jlevy44 / PathFlowAI

A High-Throughput Workflow for Preprocessing, Deep Learning Analytics and Interpretation in Digital Pathology
https://jlevy44.github.io/PathFlowAI/
MIT License
39 stars 8 forks

Issues with segmentation training #27

Closed asmagen closed 4 years ago

asmagen commented 4 years ago

After successfully running the preprocessing (assuming, per your response in the other thread, that its output is all right), I ran into the following problem when running the training:

!CUDA_VISIBLE_DEVICES=0 pathflowai-train_model train_model --prediction --patch_size 512 -pr 224 --save_location outcomes_model.pkl -a resnet34 --input_dir /project/PFAI_inputs -nt 1 -t 10000 -lr 1e-4 -ne 10 -ss 0.5 -ssv 0.3 -tt 0.1 -bt 0.01 -imb -pi patch_information.db -bs 32 -ca
Error: mkl-service + Intel(R) MKL: MKL_THREADING_LAYER=INTEL is incompatible with libgomp.so.1 library.
    Try to import numpy first or set the threading layer accordingly. Set MKL_SERVICE_FORCE_INTEL to force it.

So, based on the following discussion, I ran export MKL_SERVICE_FORCE_INTEL=1 and then the command below, but got an error:

(saturn) jovyan@jupyter-assafmagen-2dpathflowai:~/project$ CUDA_VISIBLE_DEVICES=0 pathflowai-train_model train_model --prediction --patch_size 512 -pr 224 --save_location outcomes_model.pkl -a resnet34 --input_dir /project/PFAI_inputs/ -nt 1 -t 10000 -lr 1e-4 -ne 10 -ss 0.5 -ssv 0.3 -tt 0.1 -bt 0.01 -imb -pi patch_information.db -bs 32 -ca
nonechucks may not work properly with this version of PyTorch (1.5.0). It has only been tested on PyTorch versions 1.0, 1.1, and 1.2
/srv/conda/envs/saturn/lib/python3.6/site-packages/pathflowai/utils.py:605: SettingWithCopyWarning:

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

()
Traceback (most recent call last):
  File "/srv/conda/envs/saturn/lib/python3.6/site-packages/fsspec/mapping.py", line 76, in __getitem__
    result = self.fs.cat(k)
  File "/srv/conda/envs/saturn/lib/python3.6/site-packages/fsspec/spec.py", line 587, in cat
    return self.open(path, "rb").read()
  File "/srv/conda/envs/saturn/lib/python3.6/site-packages/fsspec/spec.py", line 774, in open
    **kwargs
  File "/srv/conda/envs/saturn/lib/python3.6/site-packages/fsspec/implementations/local.py", line 108, in _open
    return LocalFileOpener(path, mode, fs=self, **kwargs)
  File "/srv/conda/envs/saturn/lib/python3.6/site-packages/fsspec/implementations/local.py", line 175, in __init__
    self._open()
  File "/srv/conda/envs/saturn/lib/python3.6/site-packages/fsspec/implementations/local.py", line 180, in _open
    self.f = open(self.path, mode=self.mode)
FileNotFoundError: [Errno 2] No such file or directory: '/project/PFAI_inputs/Li63N2DCLAMP.zarr/.zarray'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/srv/conda/envs/saturn/lib/python3.6/site-packages/zarr/core.py", line 150, in _load_metadata_nosync
    meta_bytes = self._store[mkey]
  File "/srv/conda/envs/saturn/lib/python3.6/site-packages/fsspec/mapping.py", line 80, in __getitem__
    raise KeyError(key)
KeyError: '.zarray'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/srv/conda/envs/saturn/bin/pathflowai-train_model", line 8, in <module>
    sys.exit(train())
  File "/srv/conda/envs/saturn/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/srv/conda/envs/saturn/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/srv/conda/envs/saturn/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/srv/conda/envs/saturn/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/srv/conda/envs/saturn/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/srv/conda/envs/saturn/lib/python3.6/site-packages/pathflowai/model_training.py", line 309, in train_model
    train_model_(training_opts)
  File "/srv/conda/envs/saturn/lib/python3.6/site-packages/pathflowai/model_training.py", line 37, in train_model_
    norm_dict = get_normalizer(training_opts['normalization_file'], dataset_opts)
  File "/srv/conda/envs/saturn/lib/python3.6/site-packages/pathflowai/datasets.py", line 168, in get_normalizer
    dataset = DynamicImageDataset(**dataset_opts)#nc.SafeDataset(DynamicImageDataset(**dataset_opts))
  File "/srv/conda/envs/saturn/lib/python3.6/site-packages/pathflowai/datasets.py", line 320, in __init__
    self.slides = {slide:da.from_zarr(join(input_dir,'{}.zarr'.format(slide))) for slide in IDs}
  File "/srv/conda/envs/saturn/lib/python3.6/site-packages/pathflowai/datasets.py", line 320, in <dictcomp>
    self.slides = {slide:da.from_zarr(join(input_dir,'{}.zarr'.format(slide))) for slide in IDs}
  File "/srv/conda/envs/saturn/lib/python3.6/site-packages/dask/array/core.py", line 2842, in from_zarr
    z = zarr.Array(mapper, read_only=True, path=component, **kwargs)
  File "/srv/conda/envs/saturn/lib/python3.6/site-packages/zarr/core.py", line 124, in __init__
    self._load_metadata()
  File "/srv/conda/envs/saturn/lib/python3.6/site-packages/zarr/core.py", line 141, in _load_metadata
    self._load_metadata_nosync()
  File "/srv/conda/envs/saturn/lib/python3.6/site-packages/zarr/core.py", line 152, in _load_metadata_nosync
    err_array_not_found(self._path)
  File "/srv/conda/envs/saturn/lib/python3.6/site-packages/zarr/errors.py", line 25, in err_array_not_found
    raise ValueError('array not found at path %r' % path)
ValueError: array not found at path ''

FYI, it did add the file project/train_val_test.pkl

Thanks

jlevy44 commented 4 years ago

Can you list your input directory? Are you required to run all of this through jupyter notebook?

jlevy44 commented 4 years ago

Can you also check the contents of project/train_val_test.pkl ?

asmagen commented 4 years ago

I'm running it through the terminal, not the notebook.

Here are the directory contents. Printing the contents of that pkl file suggests it's empty, I think:

(saturn) jovyan@jupyter-assafmagen-2dpathflowai:~/project$ ls PFAI_inputs/
Li15TDCLAMP_mask.npy  Li3NDCLAMP.png        Li57TDCLAMP_mask.npy  Li59TDCLAMP.png        Li63NDCLAMP_mask.npy  Li74TDCLAMP.png       Li97TDCLAMP_mask.npy
Li15TDCLAMP_mask.pkl  Li3NDCLAMP.zarr       Li57TDCLAMP_mask.pkl  Li59TDCLAMP.zarr       Li63NDCLAMP_mask.pkl  Li74TDCLAMP.zarr      Li97TDCLAMP_mask.pkl
Li15TDCLAMP.png       Li3T2DCLAMP_mask.npy  Li57TDCLAMP.png       Li63N2DCLAMP_mask.npy  Li63NDCLAMP.png       Li88TDCLAMP_mask.npy  Li97TDCLAMP.png
Li15TDCLAMP.zarr      Li3T2DCLAMP_mask.pkl  Li57TDCLAMP.zarr      Li63N2DCLAMP_mask.pkl  Li63NDCLAMP.zarr      Li88TDCLAMP_mask.pkl  Li97TDCLAMP.zarr
Li3NDCLAMP_mask.npy   Li3T2DCLAMP.png       Li59TDCLAMP_mask.npy  Li63N2DCLAMP.png       Li74TDCLAMP_mask.npy  Li88TDCLAMP.png
Li3NDCLAMP_mask.pkl   Li3T2DCLAMP.zarr      Li59TDCLAMP_mask.pkl  Li63N2DCLAMP.zarr      Li74TDCLAMP_mask.pkl  Li88TDCLAMP.zarr
(saturn) jovyan@jupyter-assafmagen-2dpathflowai:~/project$ cat train_val_test.pkl

(nothing is printed out from the last command)

I guess the problem is mainly because of this FileNotFoundError: [Errno 2] No such file or directory: '/project/PFAI_inputs/Li63N2DCLAMP.zarr/.zarray'

@jlevy44

jlevy44 commented 4 years ago

Can you output the contents of the SQL database, as explained in the GitHub Wiki?

The error you're getting signifies that you did not generate zarr files to begin with in the --preprocess step. What command did you use for preprocessing?

project/train_val_test.pkl should not be empty, what makes you think it is?

Also:

ls /project/PFAI_inputs/*.zarr/*.zarray

How did you develop your image masks?

Please make sure you are following the Wiki guide if there is ambiguity. I am happy to update the guide if some of the steps are not clear.

asmagen commented 4 years ago

Where is the sql explanation? I see only:

sqlite> .headers on
sqlite> select * from "256" limit 5;

train_val_test.pkl appears empty because there's no output at all in the 'cat' command
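A pickle is a binary format, so `cat` is not a dependable way to judge what it contains. A minimal sketch of a more direct check follows: it reports the file size and, if the file is non-empty, loads it. `inspect_pickle` is a hypothetical helper written for this thread, not part of PathFlowAI, and it assumes the file was written with Python's pickle module (if it was saved through pandas, pandas must be installed for the load to succeed). Only unpickle files you trust.

```python
import os
import pickle


def inspect_pickle(path):
    """Return (size_in_bytes, loaded_object); the object is None for a 0-byte file."""
    size = os.path.getsize(path)
    if size == 0:
        # A truly empty file cannot be unpickled, so report it explicitly.
        return size, None
    with open(path, "rb") as handle:
        obj = pickle.load(handle)
    return size, obj


if __name__ == "__main__":
    # File name taken from the thread; adjust it to where the split file was written.
    target = "train_val_test.pkl"
    if os.path.exists(target):
        size, obj = inspect_pickle(target)
        print(size, type(obj))
```

A size of 0 would confirm that the file really is empty, while a non-zero size with a loadable object would mean `cat` was simply the wrong tool.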

I developed the masks using qupath

This is the output I get for the folder contents:

(saturn) jovyan@jupyter-assafmagen-2dpathflowai:~/project$ ls PFAI_inputs/*.zarr
PFAI_inputs/Li15TDCLAMP.zarr:
0.0.0  0.4.0   10.1.0  10.5.0  11.2.0  1.2.0   12.3.0  1.4.0  2.2.0  3.0.0  3.4.0  4.2.0  5.0.0  5.4.0  6.2.0  7.0.0  7.4.0  8.2.0  9.0.0  9.4.0
0.1.0  0.5.0   10.2.0  1.1.0   11.3.0  12.0.0  12.4.0  1.5.0  2.3.0  3.1.0  3.5.0  4.3.0  5.1.0  5.5.0  6.3.0  7.1.0  7.5.0  8.3.0  9.1.0  9.5.0
0.2.0  1.0.0   10.3.0  11.0.0  11.4.0  12.1.0  12.5.0  2.0.0  2.4.0  3.2.0  4.0.0  4.4.0  5.2.0  6.0.0  6.4.0  7.2.0  8.0.0  8.4.0  9.2.0
0.3.0  10.0.0  10.4.0  11.1.0  11.5.0  12.2.0  1.3.0   2.1.0  2.5.0  3.3.0  4.1.0  4.5.0  5.3.0  6.1.0  6.5.0  7.3.0  8.1.0  8.5.0  9.3.0

PFAI_inputs/Li3NDCLAMP.zarr:
0.0.0  0.1.0  0.2.0  0.3.0  0.4.0  1.0.0  1.1.0  1.2.0  1.3.0  1.4.0  2.0.0  2.1.0  2.2.0  2.3.0  2.4.0

PFAI_inputs/Li3T2DCLAMP.zarr:
0.0.0  0.1.0  1.0.0  1.1.0  2.0.0  2.1.0  3.0.0  3.1.0  4.0.0  4.1.0

PFAI_inputs/Li57TDCLAMP.zarr:
0.0.0  0.2.0  0.4.0  1.1.0  1.3.0  2.0.0  2.2.0  2.4.0  3.1.0  3.3.0  4.0.0  4.2.0  4.4.0  5.1.0  5.3.0  6.0.0  6.2.0  6.4.0  7.1.0  7.3.0  8.0.0  8.2.0  8.4.0
0.1.0  0.3.0  1.0.0  1.2.0  1.4.0  2.1.0  2.3.0  3.0.0  3.2.0  3.4.0  4.1.0  4.3.0  5.0.0  5.2.0  5.4.0  6.1.0  6.3.0  7.0.0  7.2.0  7.4.0  8.1.0  8.3.0

PFAI_inputs/Li59TDCLAMP.zarr:
0.0.0  0.1.0  0.2.0  1.0.0  1.1.0  1.2.0  2.0.0  2.1.0  2.2.0  3.0.0  3.1.0  3.2.0

PFAI_inputs/Li63N2DCLAMP.zarr:
0.0.0  0.3.0  1.1.0  1.4.0  2.2.0  3.0.0  3.3.0  4.1.0  4.4.0  5.2.0  6.0.0  6.3.0  7.1.0  7.4.0  8.2.0  9.0.0  9.3.0
0.1.0  0.4.0  1.2.0  2.0.0  2.3.0  3.1.0  3.4.0  4.2.0  5.0.0  5.3.0  6.1.0  6.4.0  7.2.0  8.0.0  8.3.0  9.1.0  9.4.0
0.2.0  1.0.0  1.3.0  2.1.0  2.4.0  3.2.0  4.0.0  4.3.0  5.1.0  5.4.0  6.2.0  7.0.0  7.3.0  8.1.0  8.4.0  9.2.0

PFAI_inputs/Li63NDCLAMP.zarr:
0.0.0  0.2.0  0.4.0  1.0.0  1.2.0  1.4.0  2.0.0  2.2.0  2.4.0  3.0.0  3.2.0  3.4.0  4.0.0  4.2.0  4.4.0  5.0.0  5.2.0  5.4.0
0.1.0  0.3.0  0.5.0  1.1.0  1.3.0  1.5.0  2.1.0  2.3.0  2.5.0  3.1.0  3.3.0  3.5.0  4.1.0  4.3.0  4.5.0  5.1.0  5.3.0  5.5.0

PFAI_inputs/Li74TDCLAMP.zarr:
0.0.0  0.1.0  0.2.0  1.0.0  1.1.0  1.2.0  2.0.0  2.1.0  2.2.0

PFAI_inputs/Li88TDCLAMP.zarr:
0.0.0  0.1.0  1.0.0  1.1.0

PFAI_inputs/Li97TDCLAMP.zarr:
0.0.0  0.1.0  0.2.0  1.0.0  1.1.0  1.2.0  2.0.0  2.1.0  2.2.0
(saturn) jovyan@jupyter-assafmagen-2dpathflowai:~/project$ ls PFAI_inputs/*.zarr/*.zarray
ls: cannot access 'PFAI_inputs/*.zarr/*.zarray': No such file or directory

jlevy44 commented 4 years ago

There should be a .zarray file output into each of those zarr directories. You didn't copy the zarrs, did you? If so, you need to make sure to grab the .zarray files as well.
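One caveat about the `ls` checks earlier in the thread: with default bash globbing (dotglob off), `*` does not match names that begin with a dot, and plain `ls` hides such files, so `ls PFAI_inputs/*.zarr/*.zarray` can report "No such file or directory" even when a `.zarray` file is present. The FileNotFoundError in the traceback still suggests the file is missing for at least one slide at the path that was passed. A minimal sketch that tests for the hidden file by its exact name is below; `zarr_dirs_missing_metadata` is a hypothetical helper, not a PathFlowAI function, and it assumes each slide is stored as a single zarr array rather than a group.

```python
import os


def zarr_dirs_missing_metadata(input_dir):
    """List *.zarr directories under input_dir that lack a .zarray metadata file."""
    missing = []
    for name in sorted(os.listdir(input_dir)):
        store = os.path.join(input_dir, name)
        if name.endswith(".zarr") and os.path.isdir(store):
            # Test the hidden file by its exact name instead of a wildcard.
            if not os.path.isfile(os.path.join(store, ".zarray")):
                missing.append(name)
    return missing


if __name__ == "__main__":
    # Directory name from the thread; an absolute path avoids ambiguity.
    folder = "PFAI_inputs"
    if os.path.isdir(folder):
        print(zarr_dirs_missing_metadata(folder))
```

An empty list would mean every store has its metadata and the problem lies elsewhere, for example in the --input_dir path given to the training command.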

Try:

sqlite3 patch_information.db

train_val_test.pkl does not look right at all, though that could be an artifact of whatever is affecting the .zarray and .db files

asmagen commented 4 years ago

I do get non-empty patch information in the SQL, though, so what is the specific issue with the pkl and the inner objects of the zarr files? The '|7.0|0.0|0.0|0.0|0.0|0.0|0.0' lines are obviously a problem, which I'll resolve by setting the background to white, but the other rows show there is other patch data that should have been encoded in the outputs we were looking at.

sqlite> select * from "512" limit 5;
index|ID|x|y|patch_size|annotation|0|1|2|3|4|5|6
0|Li63N2DCLAMP|0|0|512|0|7.0|0.0|0.0|0.0|0.0|0.0|0.0
1|Li63N2DCLAMP|0|512|512|0|7.0|0.0|0.0|0.0|0.0|0.0|0.0
2|Li63N2DCLAMP|0|1024|512|0|6.99407958984375|0.0|0.0|0.0|5.7220458984375e-05|0.0|0.0
3|Li63N2DCLAMP|0|1536|512|0|6.96673583984375|0.0|0.0|0.0|0.000141143798828125|0.0|0.0
4|Li63N2DCLAMP|0|2048|512|0|7.0|0.0|0.0|0.0|0.0|0.0|0.0
sqlite> select * from "512" limit 200;
index|ID|x|y|patch_size|annotation|0|1|2|3|4|5|6
0|Li63N2DCLAMP|0|0|512|0|7.0|0.0|0.0|0.0|0.0|0.0|0.0
1|Li63N2DCLAMP|0|512|512|0|7.0|0.0|0.0|0.0|0.0|0.0|0.0
2|Li63N2DCLAMP|0|1024|512|0|6.99407958984375|0.0|0.0|0.0|5.7220458984375e-05|0.0|0.0
3|Li63N2DCLAMP|0|1536|512|0|6.96673583984375|0.0|0.0|0.0|0.000141143798828125|0.0|0.0
4|Li63N2DCLAMP|0|2048|512|0|7.0|0.0|0.0|0.0|0.0|0.0|0.0
5|Li63N2DCLAMP|0|2560|512|0|7.0|0.0|0.0|0.0|0.0|0.0|0.0
6|Li63N2DCLAMP|0|3072|512|0|7.0|0.0|0.0|0.0|0.0|0.0|0.0
7|Li63N2DCLAMP|0|3584|512|0|7.0|0.0|0.0|0.0|0.0|0.0|0.0
8|Li63N2DCLAMP|0|4096|512|0|7.0|0.0|0.0|0.0|0.0|0.0|0.0
9|Li63N2DCLAMP|512|0|512|0|7.0|0.0|0.0|0.0|0.0|0.0|0.0
10|Li63N2DCLAMP|512|512|512|0|7.0|0.0|0.0|0.0|0.0|0.0|0.0
11|Li63N2DCLAMP|512|1024|512|0|7.0|0.0|0.0|0.0|0.0|0.0|0.0
12|Li63N2DCLAMP|512|1536|512|0|7.0|0.0|0.0|0.0|0.0|0.0|0.0
13|Li63N2DCLAMP|512|2048|512|0|6.9814453125|0.0|0.0|0.0|9.5367431640625e-05|0.0|0.0
14|Li63N2DCLAMP|512|2560|512|0|6.99267578125|0.0|0.0|0.0|6.4849853515625e-05|0.0|0.0
15|Li63N2DCLAMP|512|3072|512|0|7.0|0.0|0.0|0.0|0.0|0.0|0.0
16|Li63N2DCLAMP|512|3584|512|0|6.77801513671875|0.0|0.0|0.0|0.00041961669921875|0.0|0.0
17|Li63N2DCLAMP|512|4096|512|0|7.0|0.0|0.0|0.0|0.0|0.0|0.0
18|Li63N2DCLAMP|1024|0|512|0|7.0|0.0|0.0|0.0|0.0|0.0|0.0
19|Li63N2DCLAMP|1024|512|512|0|7.0|0.0|0.0|0.0|0.0|0.0|0.0
20|Li63N2DCLAMP|1024|1024|512|0|7.0|0.0|0.0|0.0|0.0|0.0|0.0
21|Li63N2DCLAMP|1024|1536|512|0|7.0|0.0|0.0|0.0|0.0|0.0|0.0
22|Li63N2DCLAMP|1024|2048|512|0|6.980224609375|0.0|0.0|0.0|6.866455078125e-05|0.0|0.0
23|Li63N2DCLAMP|1024|2560|512|0|6.98065185546875|0.0|0.0|0.0|9.5367431640625e-05|0.0|0.0

jlevy44 commented 4 years ago

The SQL looks fine to me, except for the 7s printed in the columns. I'll have to see why that happened. The max value those columns should take is 1 anyway, so it could be a bug.
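To see how widespread the out-of-range values are, the patch table can be queried for rows where any class column exceeds 1. The following is a minimal sketch using Python's standard sqlite3 module. `rows_exceeding_one` is a hypothetical helper, and the table name "512" and class columns "0" to "6" are assumptions taken from the output pasted above rather than from the PathFlowAI source.

```python
import sqlite3


def rows_exceeding_one(db_path, table="512", n_classes=7):
    """Return (index, ID, x, y) for patches where any class column is above 1."""
    columns = [str(i) for i in range(n_classes)]
    # Table and column names are numeric strings, so they must be double-quoted.
    condition = " OR ".join('"{}" > 1.0'.format(c) for c in columns)
    query = 'SELECT "index", ID, x, y FROM "{}" WHERE {}'.format(table, condition)
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    import os

    # Database name from the thread; skip quietly if it is not in this directory.
    if os.path.exists("patch_information.db"):
        bad = rows_exceeding_one("patch_information.db")
        print(len(bad), "patches have a class value above 1")
```

If every row is flagged, the values are probably being scaled consistently (for instance by the number of classes) rather than corrupted for a few patches, but that is only a hypothesis to test.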

jlevy44 commented 4 years ago

Can you load any of those zarr files using dask array or zarr?
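For reference, here is a minimal sketch of such a load test using `zarr.open` in read-only mode (the traceback shows PathFlowAI itself going through `dask.array.from_zarr`). `describe_zarr` is a hypothetical helper, not part of PathFlowAI. It returns the failure as text instead of raising, so it can be run over every slide in turn.

```python
def describe_zarr(path):
    """Try to open a zarr store read-only; return (ok, details) instead of raising."""
    try:
        import zarr  # third-party; imported lazily so a missing install is reported

        arr = zarr.open(path, mode="r")
    except Exception as err:  # diagnostic helper: report any failure as text
        return False, "{}: {}".format(type(err).__name__, err)
    # Groups have no shape or chunks, so fall back to None for those attributes.
    details = {
        "type": type(arr).__name__,
        "shape": getattr(arr, "shape", None),
        "chunks": getattr(arr, "chunks", None),
        "dtype": str(getattr(arr, "dtype", None)),
    }
    return True, details


if __name__ == "__main__":
    # Store path taken from the thread's directory listing.
    print(describe_zarr("PFAI_inputs/Li63NDCLAMP.zarr"))
```

A store that opens with the expected image shape indicates its metadata is intact; a failure message that mentions a missing array points back to the `.zarray` file.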

jlevy44 commented 4 years ago

You can also just convert the png files to npy (maybe doing the background conversion in the process; also just change the extension in the file name) and run preprocessing with the -nz option (no zarr). That is a new feature, though, and we are still making sure it is functional for segmentation.

asmagen commented 4 years ago

@jlevy44 Regarding the values adding up to 7: it must be because of the -tc 7 parameter, which represents 7 pixel segmentation classes. So it's either a bug or a wrong combination of parameters, as discussed here. How do we investigate that? How do I load the zarr files? Just zarr.open? Couldn't they be empty, as the output suggests, because of the grayscale filtering threshold, or because the image is grayscale rather than RGB, so that all the tiles are filtered out at some step? And last, regarding the png conversion to npy, what is that for? To eliminate the need to use zarr? If that feature is still early and experimental, I'd rather avoid it and resolve the zarr issue; we just need a couple of hypotheses and commands to test.

asmagen commented 4 years ago

I changed the background to white and chose the threshold to be 38 based on the following:

[Screenshot: Screen Shot 2020-06-01 at 8 06 24 PM]

I opened the zarr and it does seem to have the same dimensions as the original image. Not sure about the pkl file. The most significant issue is the lack of tiles with values in the various segmentation channels:

(saturn) jovyan@jupyter-assafmagen-2dpathflowai:~/project/PFAI_inputs/Li63NDCLAMP.zarr$ pathflowai-preprocess preprocess_pipeline -odb patch_information.db --preprocess --patches --basename Li63NDCLAMP --input_dir /home/jovyan/project/PFAI_inputs --patch_size 256 --intensity_threshold 38. -tc 7 -t 0.01
nonechucks may not work properly with this version of PyTorch (1.5.0). It has only been tested on PyTorch versions 1.0, 1.1, and 1.2
Data dump took 2.204174041748047
Adjust took 3.0279159545898438e-05
Valid Patches Complete
Area Info Complete
             ID  x     y  patch_size annotation    0    1    2    3    4    5    6
13  Li63NDCLAMP  0  3328         256          0  7.0  0.0  0.0  0.0  0.0  0.0  0.0
Patches took 0.26845741271972656

Any news regarding the 7 value above? @jlevy44