4D and 5D functionality

JoshuaSkootsky commented 10 years ago

Hello all!

I've seen very encouraging internal documentation, confirmed by the great and very helpful Dan Allen (who suggested that I open this as an "issue"), that trackpy can in principle handle n-dimensional data.

In practice, how to best get this working is, for now, somewhat unclear to me.

In principle, getting the information in as a numpy array of the proper dimensionality will work.

However, I am also interested in the batch processing available through pims and trackpy, since experimental data sets can extend into the GB and TB range.

Here is sample data from the Megason lab (in the MB range), which is 4D x,y,z, and t, if you only look at Channel 00, which has blog-like nuclei in three time points. I would like to track 3D nuclei in xyz space through time:

http://www.gofigure2.org/index.php/users.html ("GoFigureSampleData.zip")

5D functionality - this is the real reason I am interested in trackpy. Imagine that I have nuclei, which are being tracked in 4D, in one channel, and in another, I have some periodic fluorescent marker. I would like to use the (constant) nuclei tracks to connect the on/off states of the periodic marker, so that I could track and do data analysis in those periodic cycles.

How could this be done?

tacaswell commented 10 years ago

The jargon on dimensionality in tracking typically does not include the independent variable (time typically) so tracking the location of the nuclei in the constant channel is 3D tracking.

If you can track the constant channel through time, then you know where the nuclei is, just use than information to extract the state of the second channel.

If there are localization issues I would just run a second round of tracking between the channels at a fixed time point which will associate a blob in the blinking channel with the blobs in the constant channel (which you have previously linked through time).

nkeim commented 9 years ago

Hi Joshua,

As Tom said, you need to start by doing tracking of 3D images in your constant channel. PIMS does not readily support 3D image data of this kind, so you will need to write a function to take all the z-slices for the frame frame_number, and assemble them into a 3D numpy array. Note that trackpy prefers relatively spherical blobs and relatively isotropic particle displacements. So if your z slice spacing is very different from the xy pixel size, you'll need to do some resampling.

Once you've written this get_3d_image(frame_number) function, you should be able to invoke trackpy.batch() as

trackpy.batch( (get_3d_image(fnum) for fnum in list_of_all_frame_numbers), ...)

where you supply the rest of the parameters, as described in the example IPython notebooks. Those notebooks also show you how to specify an output file to batch(), so that only one frame at a time will ever be in RAM.

The output from batch() should have x, y, z columns (plus some extra info). Just tell trackpy.link_df_iter() to use those 3 columns, and voilà!

That's the basic recipe. I suggest you try some of the 2D tracking examples, if you haven't already, just so you understand how the pieces of trackpy work together. Then let us know if you run into any specific problems when you try to take it to 3D.

As Tom said, once you can identify and track the nuclei in your constant-fluorescence channel, it sounds like incorporating information from the second channel will be relatively easy.

danielballan commented 9 years ago

I will add that PIMS should be readily extensible to 3D as well. I have no plans to write tests for this. We only officially support our built-in classes. But since PIMS documentation is scant, here is a scrap of code to start from. @tacaswell this belongs is PIMS docs whenever you or I get around to writing some.

from pims.base_frames import FramesSequence
from pims.frame import Frame

class Foo(FramesSequence):

    def get_frame(self, frame_no):
        images = ...  # load all images corresponding to one 3D frame
        a = np.dstack(images))  # combine images into one ndarray
        return Frame(a, frame_no)

    def __len__(self):
        # figure out how many frames (time steps) there are
        # We require this so we can do fancy indexing, e.g. frames[-1]
        return frame_count

The rest of the PIMS machinery is inherited through base frames. If you run into trouble, post a specific example as a PIMS issue showing us what you've tried. N.B. If you don't define those two functions with those specific names, PIMS will reject your subclass out of hand.

JoshuaSkootsky commented 9 years ago

Cool! This should be fun. I'll give it a spin.

JoshuaSkootsky commented 9 years ago

This appears to be working:

frames = tp.ImageSequence('Somitech00.png') """"210 frames in file = 70 * 3"""" newframes = np.reshape(frames, (3,70,1024,1024))

..

I have rather naively called trackpy.locate(newframes[0],33)

How will trackpy handle that the z-direction has a much lower resolution than the x-y values? Can I view z-x or x-y stacks, and tell trackpy how many pixels I expect my blobby nuclei to extend in the z-direction? (I find the .locate and .annotate methods particularly convenient)

tacaswell commented 9 years ago

Currently, the locate function knows nothing about the actual dimensions and assumes that the pixels are isotropic. One solution is to either down-sample you x-y or upsample you z data.

Another solution (which I learned from Peter Lu) is to do locate on each plane independently. You can then use linking to connect the x-y locations into tracks as a function of z. You can the re-processes those tracks to each represent a single nuclei. If your features are dense you might have to split tracks by looking at. The output from that stage then gets fed into linking again as a function of time.

nkeim commented 9 years ago

You might want to check out scipy.ndimage.interpolation.zoom.

JoshuaSkootsky commented 9 years ago

tacaswell: The Peter Lu solution sounds like the method described here: http://www.biomedcentral.com/1471-2105/11/580

Could you tell me more about how to implement it in trackpy?

danielballan commented 9 years ago

Whichever approach you try, all you'll need from trackpy is locate and link_df (orlink_df_iter if you need to get fancy with memory management). The rest numpy and and scipy.ndimage. Poke around their docs, try some stuff, and if you are still stuck, come back with a code snippet showing what you've tried.

JoshuaSkootsky commented 9 years ago

I tried just connecting frames together, on the assumption that between frames 69 and 70 there would not be so many non-spurious connections. Not sure how to refine that, or if it would be better to just ignore it.

frames = tp.ImageSequence('Somite*ch00*.png')

f = tp.batch(frames[:], 33)
tracks = tp.link_df(f,6,memory=0)
tracks3 = tp.filter_stubs(tracks,3)
d = tp.compute_drift(tracks3)
tm = tp.subtract_drift(tracks3,d)

tm['particle'].nunique()
Out[82]: 2504

(I think there are ~1,000 unique particles in the three time frames, based on previous counts of 3D segmentation so there is room to improve on this recognition. On the other hand, the trackpy filters are able to pick up on very faint cells, so this might be an improvement over previous cell counts, more than a large number of false positives.)

I am seeing the "tracks" as the assignment of the same 'particle' number in the Dataframes tm or tracks3, across different 'frames'

So, how to track the tracks?

I tried this, but the usage is wrong:

tp.link_df(tm,5,memory=0,neighbor_strategy='KDTree', link_strategy='auto',
            predictor=None, hash_size=None, box_size=None,
            pos_columns=['x','y'], t_column='particle', verify_integrity=True,
            retain_index=False)

tacaswell commented 9 years ago

@JoshuaSkootsky If you put triple back ticks around code (that is ' ``` ' ) it will auto-format it which makes it much more read able.

tacaswell commented 9 years ago

You need to reduce your 'tracks' (which are if I am following correctly) (x(z), y(z)) -> (x, y, z). The typical way is to do a weighted average based on the intensity in each plane:

Z = (\sum x_z I_z)  / (\sum I_z)

danielballan commented 9 years ago

Good start. Quick rundown:

Locate 2D centroids in each plane.
Link them together. (It seems like this is what you have done here. So far, so good. Incidentally, do you need drift subtraction within a frame?) At this point, what trackpy calls 'frame' is really the z position of each plane the particle appears in.
For each particle (i.e. tm.groupby('particle')...) find the average x, y and frame. Now rename the 'frame' column 'z'.
At this point tm should column columns x, y, z, and appearance columns. No frame. No particle number.
Create a 'frame' column. If this is 3D frame number 0, assign it to 0.
Do this for each 3D frame.
Now tp.link_df(pd.concat(list_of_tms_for_each_frame), pos_columns=list('xyz')). The t_column should just be 'frame', as it is by default.

JoshuaSkootsky commented 9 years ago

tm.columns
Out[119]: Index([u'ecc', u'ep', u'frame', u'mass', u'particle', u'signal', u'size', u'x', u'y'], dtype='object')

tm.columns = [u'ecc', u'ep', u'z', u'mass', u'particle', u'signal', u'size', u'x', u'y']

tm.head()
Out[121]: 
            ecc         ep  z   mass  particle    signal       size  frame                                                                 
0      0.084865 -27.495367  0  48077         0  -0.04602   8.720155   
0      0.206151 -24.730022  0  42301         1  -0.04602   9.205166   
0      0.175589   0.063768  0  39726         2  15.95398   9.755142   
0      0.111108   0.083121  0  37318         3  10.95398  10.343279   
0      0.244102   0.013247  0  31554         4  87.95398   9.095007   

                x           y  
frame                          
0      697.195603  489.748487  
0      852.196473  349.731638  
0      193.404395  459.973972  
0      551.574066  355.763921  
0      625.695158  390.232142

Why is there still a persistent "frame" label?

JoshuaSkootsky commented 9 years ago

This is helping me look at the sizes of the tracks coming out into the z-axis:

 grouped = tracks.reset_index(drop=True).groupby('particle')
 f = grouped.size() #f is a pandas series
 f.hist()

danielballan commented 9 years ago

To answer your question, which I just noticed, it looks like the index is named 'frame', probably because of a groupby operation. The name should not matter for computation, only readability. Maybe dropping the index, as you do in your second comment, resolves this.

danielballan commented 9 years ago

Currently, the locate function knows nothing about the actual dimensions and assumes that the pixels are isotropic.

Since this is no longer true as of #162, I am closing this, but happy to continue discussion. Also it may be useful to review this discussion in upcoming 3D/ND documentation. @caspervdw

soft-matter / trackpy

4D and 5D functionality #131