Errorbars on ensemble MSD plot

vivarose commented 8 years ago

I would like to see errorbars on the ensemble mean square displacement plot. There are a number of subtleties that could be taken into account. (For example, the coordinates that Trackpy assigns to each particle will have some error. For another example, the single particle MSD error must be calculated with the awareness that the data being averaged are not independent.)

The easiest way to do it for an ensemble MSD is to calculate the standard deviation when taking the average of the single-particle trajectories.

I think that calculating and plotting errorbars should be built in, if not the default setting.

caspervdw commented 8 years ago

Thanks for you request! I think that a simple error analysis can be based on the effective number of independent measurements, which is already calculated by msd. However, if you start to include tracking uncertainties, particle correlations etc., the error estimation becomes more difficult and very much user-specific.

I am not in favor of including those extra effects. A simple estimation would be fine, if we are clear about the simplified error model we use.

I do not have the time to look into this, but I can help you if you want to contribute to trackpy yourself!

vivarose commented 8 years ago

I will definitely think about contributing to trackpy because I do think this would be helpful for me and for the community.

nkeim commented 8 years ago

:smile: This would be a welcome addition! I agree that it should not be turned on by default, and it should get more than a few words in the documentation. Otherwise the possibility for a non-expert (i.e., me) to misinterpret the result could be too great.

vivarose commented 8 years ago

My plan is to edit the emsd function so that it returns a standard deviation. This standard deviation shows how the single particle MSDs vary. Very simple, no subtleties. Two initial questions (which can be changed later anyway):

Should emsd() always return the standard deviation? or only if detail=True?
What should the standard deviation be called? stdev, stddev, sigma, something else?

tacaswell commented 8 years ago

I am strongly :-1: on kwargs changing the number/type of returned values. If you need to break the API on msd, then either just break it and document it or (and I know @danielballan will disagree about this) make a new function (msd_err?, I am bad at naming things).

I would go with the numpy naming of std.

danielballan commented 8 years ago

Welcome, @vivarose!

@tacaswell A reminder here that we already have this detail=True flag that @vivarose mentioned in (1). If I was doing emd over again, I would implement emd_detailed as a separate function. But I think (1) is probably the right way to go here with respect to consistency with the existing code. If we want to rethink that API, we should do it for all the detailed columns, in a separate pull request.

I agree: I like std for consistency with numpy -- notwithstanding its unfortunate name collision with common English usage.

vivarose commented 8 years ago

Hi,

Thank you for the encouragement! I hope this will be useful for the community. I find that MSD plots look deceptively smooth because the datapoints are correlated with each other, so having the ability to plot errorbars will certainly be useful to me.

I added a few lines to the function tp.motion.emsd() so that it now calculates and returns the standard deviation. It's available here: https://github.com/vivarose/trackpy

Please let me know if I should change anything or edit any of the documentation.

Here are the lines I added to the code:

    # Calculation of biased weighted standard deviation
    numerator = ((msds.subtract(results))**2).mul(msds['N'], axis=0).sum(level=1)
    denominator = msds['N'].sum(level=1) # without Bessel's correction
    variance = numerator.div(denominator, axis=0)
    variance = variance[['<x>', '<y>', '<x^2>','<y^2>','msd']]
    std = np.sqrt(variance)
    std.columns = 'std_' + std.columns

    return results.join(std)

And I am changing the function's comment to indicate the new columns:

    Returns
    -------
    Series[msd, index=t] or, if detail=True,
    DataFrame([<x>, <y>, <x^2>, <y^2>, msd, N, lagt,
               std_<x>, std_<y>, std_<x^2>, std_<y^2>, 
               std_msd],
              index=frame)

Best, Viva

caspervdw commented 8 years ago

Closing through #352

caspervdw commented 8 years ago

@vivarose I am reopening this issue, because we did not merge your PR. Let's keep this issue open until there is a new PR created.

vivarose commented 8 years ago

My PI Vinny has a classroom student working on a Bayesian approach to the error analysis, which may supersede my pull-request.

stevenvanuytsel commented 4 years ago

Any update on whether this will be merged anytime soon?

caspervdw commented 4 years ago

I don’t believe there is anything to merge yet.

vivarose commented 4 years ago

Rather than plotting error bars, I decided to publish plots where I showed all of the individual particle MSDs (faintly) to show the variation, and the ensemble MSD (bolded) to show the weighted average. This visualizes the variation without requiring a calculation of the standard error.

The publication is available: https://link.springer.com/article/10.1140/epjst/e2019-800026-y Free: https://arxiv.org/abs/1806.05760 See figures 5A and 6A.

My code for analysis and plotting is available: https://github.com/vivarose/using_trackpy

In particular: https://github.com/vivarose/using_trackpy/blob/master/2018%20EPJST%20figures/EPJ%20paper%20(Janus%2C%20green)%20-%20Figure%205A.ipynb https://github.com/vivarose/using_trackpy/blob/master/2018%20EPJST%20figures/EPJ%20paper%20(tracer%2C%20red)%20-%20Figure%206A.ipynb

You are welcome to use my code and please cite the publication if you do!

vivarose commented 3 months ago

If you are looking to calculate the standard deviation of the MSD, you are welcome to use my code, https://github.com/horowitz-lab/using_trackpy/blob/master/stdev_emsd.py

soft-matter / trackpy

Errorbars on ensemble MSD plot #349