Closed: asmagen closed 4 years ago
Can you list your input directory? Are you required to run all of this through jupyter notebook?
Can you also check the contents of project/train_val_test.pkl ?
I'm running it through the terminal, not the notebook.
Here are the directory contents. Printing the contents of that pkl file shows nothing, so I think it's empty:
(saturn) jovyan@jupyter-assafmagen-2dpathflowai:~/project$ ls PFAI_inputs/
Li15TDCLAMP_mask.npy Li3NDCLAMP.png Li57TDCLAMP_mask.npy Li59TDCLAMP.png Li63NDCLAMP_mask.npy Li74TDCLAMP.png Li97TDCLAMP_mask.npy
Li15TDCLAMP_mask.pkl Li3NDCLAMP.zarr Li57TDCLAMP_mask.pkl Li59TDCLAMP.zarr Li63NDCLAMP_mask.pkl Li74TDCLAMP.zarr Li97TDCLAMP_mask.pkl
Li15TDCLAMP.png Li3T2DCLAMP_mask.npy Li57TDCLAMP.png Li63N2DCLAMP_mask.npy Li63NDCLAMP.png Li88TDCLAMP_mask.npy Li97TDCLAMP.png
Li15TDCLAMP.zarr Li3T2DCLAMP_mask.pkl Li57TDCLAMP.zarr Li63N2DCLAMP_mask.pkl Li63NDCLAMP.zarr Li88TDCLAMP_mask.pkl Li97TDCLAMP.zarr
Li3NDCLAMP_mask.npy Li3T2DCLAMP.png Li59TDCLAMP_mask.npy Li63N2DCLAMP.png Li74TDCLAMP_mask.npy Li88TDCLAMP.png
Li3NDCLAMP_mask.pkl Li3T2DCLAMP.zarr Li59TDCLAMP_mask.pkl Li63N2DCLAMP.zarr Li74TDCLAMP_mask.pkl Li88TDCLAMP.zarr
(saturn) jovyan@jupyter-assafmagen-2dpathflowai:~/project$ cat train_val_test.pkl
(nothing is printed out from the last command)
I guess the problem is mainly because of this
FileNotFoundError: [Errno 2] No such file or directory: '/project/PFAI_inputs/Li63N2DCLAMP.zarr/.zarray'
@jlevy44
Can you output the contents of the SQL database, as explained in the Github Wiki?
The error you're getting signifies that you did not generate zarr files to begin with in the --preprocess step. What command did you use for preprocessing?
project/train_val_test.pkl should not be empty. What makes you think it is?
Also:
ls /project/PFAI_inputs/*.zarr/*.zarray
How did you develop your image masks?
Please make sure you are following the Wiki guide if there is ambiguity. I am happy to update the guide if some of the steps are not clear.
Where is the sql explanation? I see only:
sqlite> .headers on
sqlite> select * from "256" limit 5;
train_val_test.pkl appears empty because there's no output at all in the 'cat' command
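Since `cat` on a binary pickle is hard to interpret, a more direct check is to look at the file size and try to load it in Python. This is a minimal sketch, not part of PathFlowAI: `describe_pickle` is a hypothetical helper, and I am assuming nothing about the object type stored in the file (if it is a pandas object, pandas must be installed for the load to succeed).

```python
import os
import pickle


def describe_pickle(path):
    """Return (size_in_bytes, loaded_object); the object is None for a 0-byte file."""
    size = os.path.getsize(path)
    if size == 0:
        # pickle.load would raise EOFError on an empty file, so stop early.
        return size, None
    with open(path, "rb") as handle:
        # Only unpickle files you produced yourself: pickle can run arbitrary code.
        return size, pickle.load(handle)


if __name__ == "__main__":
    # File name taken from the thread; adjust it to your own project directory.
    pkl_path = "train_val_test.pkl"
    if os.path.exists(pkl_path):
        size, obj = describe_pickle(pkl_path)
        print(f"{pkl_path}: {size} bytes, loaded object of type {type(obj).__name__}")
```

A size of 0 bytes confirms the file really is empty; a non-zero size with a load error would point to a truncated or incompatible file instead.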
I developed the masks using qupath
This is the output I get for the folder contents:
(saturn) jovyan@jupyter-assafmagen-2dpathflowai:~/project$ ls PFAI_inputs/*.zarr
PFAI_inputs/Li15TDCLAMP.zarr:
0.0.0 0.4.0 10.1.0 10.5.0 11.2.0 1.2.0 12.3.0 1.4.0 2.2.0 3.0.0 3.4.0 4.2.0 5.0.0 5.4.0 6.2.0 7.0.0 7.4.0 8.2.0 9.0.0 9.4.0
0.1.0 0.5.0 10.2.0 1.1.0 11.3.0 12.0.0 12.4.0 1.5.0 2.3.0 3.1.0 3.5.0 4.3.0 5.1.0 5.5.0 6.3.0 7.1.0 7.5.0 8.3.0 9.1.0 9.5.0
0.2.0 1.0.0 10.3.0 11.0.0 11.4.0 12.1.0 12.5.0 2.0.0 2.4.0 3.2.0 4.0.0 4.4.0 5.2.0 6.0.0 6.4.0 7.2.0 8.0.0 8.4.0 9.2.0
0.3.0 10.0.0 10.4.0 11.1.0 11.5.0 12.2.0 1.3.0 2.1.0 2.5.0 3.3.0 4.1.0 4.5.0 5.3.0 6.1.0 6.5.0 7.3.0 8.1.0 8.5.0 9.3.0
PFAI_inputs/Li3NDCLAMP.zarr:
0.0.0 0.1.0 0.2.0 0.3.0 0.4.0 1.0.0 1.1.0 1.2.0 1.3.0 1.4.0 2.0.0 2.1.0 2.2.0 2.3.0 2.4.0
PFAI_inputs/Li3T2DCLAMP.zarr:
0.0.0 0.1.0 1.0.0 1.1.0 2.0.0 2.1.0 3.0.0 3.1.0 4.0.0 4.1.0
PFAI_inputs/Li57TDCLAMP.zarr:
0.0.0 0.2.0 0.4.0 1.1.0 1.3.0 2.0.0 2.2.0 2.4.0 3.1.0 3.3.0 4.0.0 4.2.0 4.4.0 5.1.0 5.3.0 6.0.0 6.2.0 6.4.0 7.1.0 7.3.0 8.0.0 8.2.0 8.4.0
0.1.0 0.3.0 1.0.0 1.2.0 1.4.0 2.1.0 2.3.0 3.0.0 3.2.0 3.4.0 4.1.0 4.3.0 5.0.0 5.2.0 5.4.0 6.1.0 6.3.0 7.0.0 7.2.0 7.4.0 8.1.0 8.3.0
PFAI_inputs/Li59TDCLAMP.zarr:
0.0.0 0.1.0 0.2.0 1.0.0 1.1.0 1.2.0 2.0.0 2.1.0 2.2.0 3.0.0 3.1.0 3.2.0
PFAI_inputs/Li63N2DCLAMP.zarr:
0.0.0 0.3.0 1.1.0 1.4.0 2.2.0 3.0.0 3.3.0 4.1.0 4.4.0 5.2.0 6.0.0 6.3.0 7.1.0 7.4.0 8.2.0 9.0.0 9.3.0
0.1.0 0.4.0 1.2.0 2.0.0 2.3.0 3.1.0 3.4.0 4.2.0 5.0.0 5.3.0 6.1.0 6.4.0 7.2.0 8.0.0 8.3.0 9.1.0 9.4.0
0.2.0 1.0.0 1.3.0 2.1.0 2.4.0 3.2.0 4.0.0 4.3.0 5.1.0 5.4.0 6.2.0 7.0.0 7.3.0 8.1.0 8.4.0 9.2.0
PFAI_inputs/Li63NDCLAMP.zarr:
0.0.0 0.2.0 0.4.0 1.0.0 1.2.0 1.4.0 2.0.0 2.2.0 2.4.0 3.0.0 3.2.0 3.4.0 4.0.0 4.2.0 4.4.0 5.0.0 5.2.0 5.4.0
0.1.0 0.3.0 0.5.0 1.1.0 1.3.0 1.5.0 2.1.0 2.3.0 2.5.0 3.1.0 3.3.0 3.5.0 4.1.0 4.3.0 4.5.0 5.1.0 5.3.0 5.5.0
PFAI_inputs/Li74TDCLAMP.zarr:
0.0.0 0.1.0 0.2.0 1.0.0 1.1.0 1.2.0 2.0.0 2.1.0 2.2.0
PFAI_inputs/Li88TDCLAMP.zarr:
0.0.0 0.1.0 1.0.0 1.1.0
PFAI_inputs/Li97TDCLAMP.zarr:
0.0.0 0.1.0 0.2.0 1.0.0 1.1.0 1.2.0 2.0.0 2.1.0 2.2.0
(saturn) jovyan@jupyter-assafmagen-2dpathflowai:~/project$ ls PFAI_inputs/*.zarr/*.zarray
ls: cannot access 'PFAI_inputs/*.zarr/*.zarray': No such file or directory
There should be a .zarray file output into each of those zarr directories. You didn't copy the zarrs, did you? If you did, make sure to copy the .zarray files as well.
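One caveat when checking for these files from the shell: with default bash settings, `*` does not match names that begin with a dot, and plain `ls` hides them, so `ls PFAI_inputs/*.zarr/*.zarray` can report "No such file or directory" even when a `.zarray` is present. `ls -a` or `ls PFAI_inputs/*.zarr/.zarray` avoids that. The same check can be sketched in Python; `zarr_dirs_missing_metadata` is a hypothetical helper, and it assumes each `.zarr` directory holds a single array rather than a group.

```python
from pathlib import Path


def zarr_dirs_missing_metadata(input_dir):
    """Return sorted names of *.zarr directories that have no .zarray file."""
    missing = []
    for store in sorted(Path(input_dir).glob("*.zarr")):
        # Name the hidden file explicitly rather than relying on a wildcard.
        if store.is_dir() and not (store / ".zarray").is_file():
            missing.append(store.name)
    return missing


if __name__ == "__main__":
    # Directory name taken from the thread; adjust as needed.
    for name in zarr_dirs_missing_metadata("PFAI_inputs"):
        print(f"{name}: no .zarray found")
```

If this prints nothing, every store has its metadata file and the FileNotFoundError is more likely a path problem (for example `/project/...` versus `~/project/...`) than a missing file.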
Try:
sqlite3 patch_information.db
train_val_test.pkl does not look right at all, though that could be an artifact of whatever is going wrong with the .zarray and .db files
I do get non-empty patch information in the sql, though, so what's the specific issue with the pkl and the inner objects of the zarr files? The '|7.0|0.0|0.0|0.0|0.0|0.0|0.0' lines are obviously a problem, which I'll resolve by setting the background to white, but the other rows show there's other patch data that should have been encoded in the outputs we were looking at.
sqlite> select * from "512" limit 5;
index|ID|x|y|patch_size|annotation|0|1|2|3|4|5|6
0|Li63N2DCLAMP|0|0|512|0|7.0|0.0|0.0|0.0|0.0|0.0|0.0
1|Li63N2DCLAMP|0|512|512|0|7.0|0.0|0.0|0.0|0.0|0.0|0.0
2|Li63N2DCLAMP|0|1024|512|0|6.99407958984375|0.0|0.0|0.0|5.7220458984375e-05|0.0|0.0
3|Li63N2DCLAMP|0|1536|512|0|6.96673583984375|0.0|0.0|0.0|0.000141143798828125|0.0|0.0
4|Li63N2DCLAMP|0|2048|512|0|7.0|0.0|0.0|0.0|0.0|0.0|0.0
sqlite> select * from "512" limit 200;
index|ID|x|y|patch_size|annotation|0|1|2|3|4|5|6
0|Li63N2DCLAMP|0|0|512|0|7.0|0.0|0.0|0.0|0.0|0.0|0.0
1|Li63N2DCLAMP|0|512|512|0|7.0|0.0|0.0|0.0|0.0|0.0|0.0
2|Li63N2DCLAMP|0|1024|512|0|6.99407958984375|0.0|0.0|0.0|5.7220458984375e-05|0.0|0.0
3|Li63N2DCLAMP|0|1536|512|0|6.96673583984375|0.0|0.0|0.0|0.000141143798828125|0.0|0.0
4|Li63N2DCLAMP|0|2048|512|0|7.0|0.0|0.0|0.0|0.0|0.0|0.0
5|Li63N2DCLAMP|0|2560|512|0|7.0|0.0|0.0|0.0|0.0|0.0|0.0
6|Li63N2DCLAMP|0|3072|512|0|7.0|0.0|0.0|0.0|0.0|0.0|0.0
7|Li63N2DCLAMP|0|3584|512|0|7.0|0.0|0.0|0.0|0.0|0.0|0.0
8|Li63N2DCLAMP|0|4096|512|0|7.0|0.0|0.0|0.0|0.0|0.0|0.0
9|Li63N2DCLAMP|512|0|512|0|7.0|0.0|0.0|0.0|0.0|0.0|0.0
10|Li63N2DCLAMP|512|512|512|0|7.0|0.0|0.0|0.0|0.0|0.0|0.0
11|Li63N2DCLAMP|512|1024|512|0|7.0|0.0|0.0|0.0|0.0|0.0|0.0
12|Li63N2DCLAMP|512|1536|512|0|7.0|0.0|0.0|0.0|0.0|0.0|0.0
13|Li63N2DCLAMP|512|2048|512|0|6.9814453125|0.0|0.0|0.0|9.5367431640625e-05|0.0|0.0
14|Li63N2DCLAMP|512|2560|512|0|6.99267578125|0.0|0.0|0.0|6.4849853515625e-05|0.0|0.0
15|Li63N2DCLAMP|512|3072|512|0|7.0|0.0|0.0|0.0|0.0|0.0|0.0
16|Li63N2DCLAMP|512|3584|512|0|6.77801513671875|0.0|0.0|0.0|0.00041961669921875|0.0|0.0
17|Li63N2DCLAMP|512|4096|512|0|7.0|0.0|0.0|0.0|0.0|0.0|0.0
18|Li63N2DCLAMP|1024|0|512|0|7.0|0.0|0.0|0.0|0.0|0.0|0.0
19|Li63N2DCLAMP|1024|512|512|0|7.0|0.0|0.0|0.0|0.0|0.0|0.0
20|Li63N2DCLAMP|1024|1024|512|0|7.0|0.0|0.0|0.0|0.0|0.0|0.0
21|Li63N2DCLAMP|1024|1536|512|0|7.0|0.0|0.0|0.0|0.0|0.0|0.0
22|Li63N2DCLAMP|1024|2048|512|0|6.980224609375|0.0|0.0|0.0|6.866455078125e-05|0.0|0.0
23|Li63N2DCLAMP|1024|2560|512|0|6.98065185546875|0.0|0.0|0.0|9.5367431640625e-05|0.0|0.0
The sql looks fine to me, except for the 7s printed in the columns. I'll have to see why that happened. The max value those columns should take is 1 anyway, so it could be a bug.
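To see how widespread the out-of-range values are, the database can be queried directly from Python. This is a rough sketch using only the standard `sqlite3` module; `rows_with_area_above` is a hypothetical helper, and the table name `"512"` and class columns `"0"` to `"6"` are taken from the output pasted above.

```python
import sqlite3
from contextlib import closing


def rows_with_area_above(db_path, table, class_columns, limit=1.0):
    """Return (ID, x, y, column, value) for every class value greater than limit."""
    offenders = []
    with closing(sqlite3.connect(db_path)) as conn:
        for col in class_columns:
            # Table and column names are numeric strings, so they must be quoted.
            # They are interpolated directly: pass only trusted names.
            query = f'SELECT ID, x, y, "{col}" FROM "{table}" WHERE "{col}" > ?'
            for patch_id, x, y, value in conn.execute(query, (limit,)):
                offenders.append((patch_id, x, y, col, value))
    return offenders
```

For the database in this thread, a call such as `rows_with_area_above("patch_information.db", "512", [str(i) for i in range(7)])` would list every patch whose per-class value exceeds 1.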
Can you load any of those zarr files using dask array or zarr?
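The library routes for this are `zarr.open(path, mode="r")` or `dask.array.from_zarr(path)`. As a dependency-free sanity check before that, the store's metadata can be read directly, since in the Zarr v2 format `.zarray` is a JSON file with `shape`, `chunks`, and `dtype` keys. The sketch below assumes a v2 store with the default flat chunk naming (`0.0.0`, `0.1.0`, ...), which matches the listings above; `chunk_summary` is a hypothetical helper, not part of zarr or PathFlowAI.

```python
import json
import math
from pathlib import Path


def chunk_summary(store_path):
    """Read a Zarr v2 array's .zarray and compare expected vs. present chunks."""
    store = Path(store_path)
    # Raises FileNotFoundError if .zarray is missing, the same failure seen above.
    meta = json.loads((store / ".zarray").read_text())
    shape, chunks = meta["shape"], meta["chunks"]
    # Number of chunks along each axis, rounding up for partial edge chunks.
    grid = [math.ceil(s / c) for s, c in zip(shape, chunks)]
    expected = math.prod(grid)
    # Chunk files are named by their grid index, e.g. "0.0.0"; skip dotfiles.
    present = sum(
        1 for p in store.iterdir() if p.is_file() and not p.name.startswith(".")
    )
    return {
        "shape": shape,
        "chunks": chunks,
        "dtype": meta["dtype"],
        "expected_chunks": expected,
        "present_chunks": present,
    }
```

Fewer chunk files than expected is not necessarily an error, because Zarr may leave out chunks that contain only the fill value, but the reported shape should match the dimensions of the original image.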
You can also convert the png files to npy (perhaps doing the background conversion in the process, and changing the file extension accordingly), then run preprocessing with the -nz option (no zarr). Note that this is a new feature, and we are still making sure it works for segmentation.
@jlevy44 Regarding the values adding up to 7: it must be because of the -tc 7 parameter, which represents 7 pixel segmentation classes. So it's either a bug or a wrong set of the parameters discussed here. How do we investigate that? How do I load the zarr files? Just zarr.open? Could the output be empty, as it appears, because of the grayscale filtering threshold, or because the image is grayscale rather than RGB, so that all the tiles get filtered out at some step? Lastly, regarding the png-to-npy conversion, what is that for? To eliminate the need for zarr? If that feature is still early and experimental, I'd rather avoid it and resolve the zarr issue; we just need a couple of hypotheses and commands to test.
I changed the background to white and chose the threshold to be 38 based on the following:
I opened the zarr and it does seem to have the same dimensions as the original image. Not sure about the pkl file. The most significant issue is the lack of tiles with values in the various segmentation channels:
(saturn) jovyan@jupyter-assafmagen-2dpathflowai:~/project/PFAI_inputs/Li63NDCLAMP.zarr$ pathflowai-preprocess preprocess_pipeline -odb patch_information.db --preprocess --patches --basename Li63NDCLAMP --input_dir /home/jovyan/project/PFAI_inputs --patch_size 256 --intensity_threshold 38. -tc 7 -t 0.01
nonechucks may not work properly with this version of PyTorch (1.5.0). It has only been tested on PyTorch versions 1.0, 1.1, and 1.2
Data dump took 2.204174041748047
Adjust took 3.0279159545898438e-05
Valid Patches Complete
Area Info Complete
ID x y patch_size annotation 0 1 2 3 4 5 6
13 Li63NDCLAMP 0 3328 256 0 7.0 0.0 0.0 0.0 0.0 0.0 0.0
Patches took 0.26845741271972656
Any news regarding the 7 value above? @jlevy44
After successfully running the processing, assuming your response to the other thread is that it's alright, I got into the following problem with running the training:
So based on the following discussion I ran
export MKL_SERVICE_FORCE_INTEL=1
and the following, but got an error:
FYI, it did add the file project/train_val_test.pkl
Thanks