Large jumps in ensemble MSD

speedymcs commented 5 years ago

Hi, I have a problem getting my head around the emsd() function's output. I'm analyzing a timelapse of ~900 frames with roughly 500 particles. In order to assess the directedness of the particles' movement, I plot the square root of the emsd() output.

The way I understand it, emds() calculates for each track the squared distance travelled from the starting point to the end point (or to the point at max_lagtime) and divides the sum by the number of tracks.

At first I filtered out all tracks with a length below 200 frames: tf = tp.filtering.filter_stubs(t,200)

Removed any drift:

tf.index.names = ["index"] # Resolves name clash caused with trackpy 0.4.1
d = tp.compute_drift(tf)
td = tp.subtract_drift(tf.copy(), d)

Then I calculated the ensemble MSD: all_msd = tp.motion.emsd(td,px,1/timestep,max_lagtime=td.frame.max(),detail=True) I chose td.frame.max() in order to see the average displacement across the whole timelapse.

Plotting the square root gives this result: plt.plot(np.sqrt(all_msd.msd))

EMSD_question

Now this curve is very smooth up to frame number 200, which is the minimum track length in the data frame. After that it gets rough quickly. Why could this be? Is emsd() only considering tracks that start at frame 0, and after frame 200, more and more tracks end, so the mean value of all MSDs gets more shaky? Then, what I can't understand at all are those large steps in the second half of the curve. Possibly there are only a couple of tracks left that are being considered, and once one of them ends, the average value changes dramatically? How can I investigate this? If the above is true, should the max_lagtime setting basically always be chosen to be the minimum frame length of the filtered tracks?

rbnvrw commented 5 years ago

Thank you for your bug report @speedymcs.

In order to assess the directedness of the particles' movement, I plot the square root of the emsd() output.

I'm not really sure what you mean by this. The mean squared displacement will give you an idea of how 'spread out in space' the particles will be at a certain time. It is expected to grow linearly in time, and the proportionality constant is the number of dimensions x the diffusion coefficient. I don't think you'll be able to obtain any information about the direction of the particles in this way. See also this wikipedia article for more information.

To summarize, I would only look at the mean squared displacement of your particles (not the square root of it). People usually also plot it on a log-log scale, because in that way, you can easily see if the MSD is linear (slope=1) and differences in diffusion coefficient (manifested in a different y-intercept).

Now this curve is very smooth up to frame number 200, which is the minimum track length in the data frame. After that it gets rough quickly. Why could this be?

This is actually something that is very common and expected. It has to do with statistics: for a lag time of 1 frame, you'll have 899 pairs of frames in your movie from which to calculate the MSD. However, for the other extreme case, for a lag time of 899, you'll have only one pair which you can use to calculate the MSD. The uncertainty at this lag time will therefore be much higher and you'll see more experimental noise in that part of the curve. This is further complicated by the fact that by using this 'moving window' approach to calculate the MSD, not all measurements are independent: in a hand-waving argument, for a lagtime of 2 frames, you use the offset between frames 0 & 2 for example, but also between 1 & 3 and therefore the motion between frames 1 & 2 is 'counted twice' in this instance.

To summarize, a good rule of thumb is to only consider lagtimes up to 1/3 of the total duration of the measurement, in your case up to roughly 300 frames. Indeed, in that part your graph still looks fairly smooth.

I hope this will help you and don't hesitate to ask further questions.

speedymcs commented 5 years ago

Thank you very much for taking the time to answer @rbnvrw , it's very helpful! If I now understand correctly, the emsd() function averages displacement over both time and particle number. The reference point for computing the actual displacement can be at any point in time across the timelapse. I assumed it was only averaging over all particles and that each x-value (lag time) of the resulting plot correlates to a specific frame in the timelapse, with the reference point for all particles being their position in the first frame.

For some background, I am currently using Trackpy for tracking biological cells (works great!). They move in a self-propelled way. I aim to use the MSD as a way of classifying the "randomness" of their motion. For example, if they sense an attractive chemical, they will choose to roughly walk into that direction; if they do not sense it, they still move, but more like a 2D random walker where the displacement from the starting point should grow with the square root of the time.

I will close this issue since there seems to be no bug. I feel like I've misused the issue tracker for a personal problem of understanding--was something like a Trackpy message board ever considered? :)

rbnvrw commented 5 years ago

@speedymcs no problem, I'm glad I could help! For active / self-propelled particles (in the short time limit) you could also try to fit the mean squared displacement with a quadratic term to distinguish between Brownian and active motion (see for example here: "Smoluchowski Diffusion Equation for Active Brownian Swimmers", Francisco J. Sevilla and Mario Sandoval (https://pdfs.semanticscholar.org/065b/41f778864af9f1e932d30990de673da15a77.pdf)

I don't think there is the need for a message board right now, but in general you can use the issue tracker to ask questions, especially if you add the 'question' label. Good luck on your project!

soft-matter / trackpy

Large jumps in ensemble MSD #572