janelia-flyem / gala

Automatic segmentation of electron microscopy volumes
BSD 3-Clause "New" or "Revised" License
76 stars 29 forks source link

agglo.py runtime warning and and unmerged output #81

Closed thanujadax closed 8 years ago

thanujadax commented 8 years ago

Hi From my tif dataset I created hdf5 files with an identical structure to the ones used in example.py and ran the following code (Appendix 1) similar to example.py with the addition of converting the output into png files. I get the following warning:

/miniconda2/envs/gala/lib/python3.5/site-packages/gala/agglo.py:265: RuntimeWarning: invalid value encountered in double_scalars
  return p3*log2(p3) - p1*log2(p1) - p2*log2(p2) - \

Also, the final segmentation seems to be the same as the initial superpixel map: 000

The probability map for this image is: 00_probmap

with superpixel map obtained by Watershed transform (labeled with integers): 00_ws

What do you think is the problem?

Thanks in advance!


Appendix 1 (Code)

from gala import imio, classify, features, agglo, evaluate as ev
import numpy as np
import os
from PIL import Image
'''
Inputs
'''
h5File_train_gt = 'train_gt.h5'
h5File_train_ws = 'train_ws.h5'
h5File_train_probMap = 'train_probMaps.h5'

h5File_test_ws = 'test_ws.h5'
h5File_test_probMap = 'test_probMaps.h5'

'''
Outputs
'''
outputRoot = 'outputs'

# read in training data
# groundtruth volume, probability maps, superpixe/watershed map
gt_train, pr_train, ws_train = (map(imio.read_h5_stack,
                                [h5File_train_gt, h5File_train_probMap,
                                 h5File_train_ws]))

# create a feature manager
fm = features.moments.Manager()
fh = features.histogram.Manager()
fc = features.base.Composite(children=[fm, fh])

# create Region Adjacency Graph (RAG) and obtain a training dataset
g_train = agglo.Rag(ws_train, pr_train, feature_manager=fc)
(X, y, w, merges) = g_train.learn_agglomerate(gt_train, fc)[0]
y = y[:, 0] # gala has 3 truth labeling schemes, pick the first one ????
print((X.shape, y.shape)) # standard scikit-learn input format

# train a classifier, scikit-learn syntax
rf = classify.DefaultRandomForest().fit(X, y)
# a policy is the composition of a feature map and a classifier
# policy = merge priority function
learned_policy = agglo.classifier_probability(fc, rf)

# get the test data and make a RAG with the trained policy
pr_test, ws_test = (map(imio.read_h5_stack,
                        [h5File_test_probMap, h5File_test_ws]))
g_test = agglo.Rag(ws_test, pr_test, learned_policy, feature_manager=fc)
g_test.agglomerate(0.5) # best expected segmentation obtained with a threshold of 0.5
seg_test1 = g_test.get_segmentation()

# convert hdf into png and save 
np_data = np.array(seg_test1)
sizeZ,sizeY,sizeX = np_data.shape
for i in range(0,sizeZ):
    im1 = np_data[i,:,:]
    im = Image.fromarray(im1.astype('uint8'))
    imFileName = str(i).zfill(3) + ".png"
    imFileName = os.path.join(outputRoot,imFileName)
    im.save(imFileName)
jni commented 8 years ago

Hey @thanujadax!

We actually dropped support for having a 0 label separating supervoxels/segments in 0.3: https://gala.readthedocs.io/en/latest/release-notes.html

So you need to make supervoxels that don't have a boundary between them, ie their pixels actually touch. The scikit-image skimage.morphology.watershed function can do this.

Also, for future reference, check out gala.features.default for some default feature managers that perform better than the ones in the example.

I'm closing this for now but feel free to reopen if your problem isn't fixed!

thanujadax commented 8 years ago

Hi Juan, I produced watershed super-pixels without the 0 labeled separation. The hdf5 files used for training is:

train.zip

However, there seems to be something wrong as seen by the output for test data, for instance: 008

The hdf5 files used for testing are: test.zip

What do you think has gone wrong?

Thanks in advance! Thanuja

jni commented 8 years ago

A priori it seems to me that your fragments (my preferred term for watersheds/supervoxels these days) are completely random. =P Can you show me an example each of:

?

The fragments need to be kinda similar between training and testing volumes!

thanujadax commented 8 years ago

Training prob map: train_probmap_00

Training fragments: train_ws_00

Training ground truth: train_gt_00


Testing prob map: test_probmaps_00

Testing fragments: test_ws_00

What do you think is wrong? Thanks for looking at it!

Thanuja

thanujadax commented 8 years ago

I have to see if there was something wrong in the order of the image files included in the hdf5 files. Looks to me the ground truth doesn't correspond to the probability map! I'll check it and re-run it with the right order, and get back to you soon. Thanks

jni commented 8 years ago

Ah! Ok let me know! btw for visualising segmentations, try:

from gala import viz
viz.imshow_rand(fragments)

They are way more visible that way. =)

thanujadax commented 8 years ago

Hi Juan, It seems like the problems might be with the watershed fragments. Here is what the fragments look like for the Training prob map shown above: 00

I generated the fragments using the following code using skimage.morphology.watershed as suggested by you:

import numpy as np
from skimage.morphology import watershed
from skimage.feature import peak_local_max
from scipy import ndimage
from PIL import Image

image = np.array(Image.open(inputFileName).convert('L'), 'f')
image = (image - 1) * (-1)

distance = ndimage.distance_transform_edt(image)
local_maxi = peak_local_max(distance, indices=False, footprint=np.ones((3, 3)), labels=image)
markers = morphology.label(local_maxi)
labels_ws = watershed(-distance, markers, mask=image)

Would be great if you could tell me how you generate the fragments. Thanks! Thanuja

jni commented 8 years ago

Hi @thanujadax, sorry for the delay, I've been at a conference.

I think the way you compute the boundary image is a bit weird, and you might have ended up with some negative values there... What's the min and max of image?

Then, the distance transform expects a bool image with 0 on the boundaries and 1 everywhere else... I don't know what your call of distance transform on a float image does.

To help with this, you could plot the image, the distance transformed image, and the peaks. See this code for how to show the peaks:

peaks = feature.peak_local_max(distance_from_edge, min_distance=10)
plt.imshow(edges, cmap='gray')  # show your image here
plt.plot(peaks[:, 1], peaks[:, 0], 'ro');  # this plots markers at each point

I usually use the distance transform only for the markers, not for the boundary map for the watershed... I'm not sure what the problem is but your image definitely looks problematic! =)

thanujadax commented 8 years ago

Hi @jni Thanks for your inputs. I made a few corrections to the process. When you say boundary map/image, did you refer to the membrane probability map (=image)? I assumed yes for now. image has a max value of 255 at neuron membranes and a min value of 0 at cell interiors. image

This time I used a binary thresholded version of the above image as an input to ndi.distance_transform_edt. It has 1 for membrane pixels and 0 for non-membrane pixels as suggested by you - if I understood correctly. image_bt

Then the distance transformed image is: dist

.. and the peaks generated and plotted using the code you provided: peaks

Since skimage.morphology.watershed requires the peaks to be an array with same dimensions as the input image, I regenerated the peaks with indices=False as follows:

local_maxi = peak_local_max(distance, indices=False, min_distance=10)

Instead of the distance image, this time I generated the watershed fragments using the original probability image

labels_ws = watershed(image, markers, mask=image)

which looks like: ws

I trained gala on two such images with ground truth. Then I tried the region merging (using the same code as mentioned in my first post above) on the same image I trained on (shown above) to get the following output: 000

Can you please let me know what you think about this output? Am I still doing something wrong?

Thanks! Thanuja

jni commented 8 years ago

Hey @thanujadax!

It has 1 for membrane pixels and 0 for non-membrane pixels as suggested by you - if I understood correctly.

That's exactly the opposite of what I said! =P You want seed points to be away from the boundary!