rc / dist_mixtures

BSD 3-Clause "New" or "Revised" License

gap in histogram for Estimated Distribution plot #5

Open josef-pkt opened 11 years ago

josef-pkt commented 11 years ago

The histogram is missing one slice.

I haven't looked at the code yet.

josef-pkt commented 11 years ago

I think the easiest fix will be to transform endog at the beginning of plot_dist, if xtransform is not None:

    endog = self.endog
    if xtransform is not None:
        endog = xtransform(endog)

Also, I think we should set the bins to xtransform(np.linspace(-np.pi, np.pi, n_bins + 1)) instead of letting matplotlib choose the bins based on the min and max of the data.
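A minimal NumPy sketch of the two suggestions above: transform endog once, and derive fixed bin edges from one full period instead of from the data's min and max. The function name `histogram_fixed_bins` is hypothetical, and the sketch assumes xtransform is increasing on [-pi, pi] (e.g. a plain radians-to-degrees conversion such as `np.degrees`); a transform that rotates with wrap-around would not satisfy this, so the sketch checks for it.

```python
import numpy as np

def histogram_fixed_bins(endog, n_bins, xtransform=None):
    # hypothetical helper: histogram with edges fixed to one full period
    endog = np.asarray(endog, dtype=float)
    # n_bins bins over [-pi, pi] need n_bins + 1 edges
    edges = np.linspace(-np.pi, np.pi, n_bins + 1)
    if xtransform is not None:
        # transform the data and the edges with the same function
        endog = xtransform(endog)
        edges = xtransform(edges)
    # np.histogram needs increasing edges; a transform that wraps around
    # the circle (e.g. a rotation followed by a modulo) breaks this
    if np.any(np.diff(edges) <= 0):
        raise ValueError("xtransform must be increasing on [-pi, pi]")
    counts, edges = np.histogram(endog, bins=edges)
    return counts, edges
```

matplotlib's `hist` also accepts an array of edges through its `bins` argument, so the same `edges` could be passed to the plot.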

rc commented 11 years ago

It is not that simple, because xtransform does not only "rotate" the domain, it also converts radians to degrees. I have fixed the gap by constructing fdata differently: instead of simply repeating each angle count times, which makes the last value 179 (180 is missing!), I add a linear sequence of count values between each pair of consecutive angles. This way the last interval (or bin), [179, 180], also gets the correct number of values. See https://github.com/rc/dist_mixtures/commits/updates
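A minimal sketch contrasting the two constructions of fdata described above. Both function names are hypothetical, and this is not the actual spread_by_counts() from the repository; it assumes equally spaced angles with spacing `step`, so each angle covers [angle, angle + step).

```python
import numpy as np

def repeat_by_counts(angles, counts):
    # old construction: each angle repeated counts[i] times, so every
    # observation sits on its bin's left edge and the largest value
    # is the last angle itself
    return np.repeat(angles, counts)

def spread_by_counts_sketch(angles, counts, step=1.0):
    # hypothetical version of the new construction: counts[i] evenly spaced
    # values in [angles[i], angles[i] + step); on an equally spaced grid the
    # upper end is the next angle, and the last angle gets an upper end too
    angles = np.asarray(angles, dtype=float)
    pieces = [np.linspace(a, a + step, n, endpoint=False)
              for a, n in zip(angles, counts)]
    return np.concatenate(pieces)
```

With `angles = np.arange(0, 180)` and three observations per angle, the repeated data has maximum 179, while the spread data places values inside every bin [k, k + 1), including the last one.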

josef-pkt commented 11 years ago

I need to look into this in more detail, and rebase my branch.

Isn't 180 supposed to be missing?

Now that I'm also using the cdf to construct the probabilities of each bin (pmf_bins or pdf_bins), I need the boundaries of the observed bins. Currently, I'm assuming -90 stands for [-90, -89), ..., and 89 for [89, 90), i.e. half-open intervals, closed on the left and open on the right. Since angles are continuous, open or closed doesn't matter for the probabilities, but it matters a bit for the modulo transform that wraps around the circle.

So we should have 180 bins for [-90, 90) or [-pi, pi) (or any shift/transform of it), and 181 bin edges.
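Under this half-open convention, each bin probability is a CDF difference, p_i = F(e_{i+1}) - F(e_i), over the 181 edges, and wrapping onto the circle sends the right end point 90 back to -90. A minimal sketch with hypothetical names (`bin_probs`, `wrap_half_open`); the uniform CDF on [-90, 90) is used only as a check, and pmf_bins/pdf_bins may be implemented differently.

```python
import numpy as np

def bin_probs(cdf, edges):
    # probability of each half-open bin [e_i, e_{i+1}) from CDF differences;
    # for a continuous distribution, open vs closed ends do not change this
    return np.diff(cdf(np.asarray(edges, dtype=float)))

def wrap_half_open(x, low=-90.0, period=180.0):
    # map angles onto [low, low + period), so the right end point
    # (90 with the defaults) wraps to low (-90)
    return np.mod(np.asarray(x, dtype=float) - low, period) + low

edges = np.arange(-90, 91)  # 181 edges -> 180 bins
p = bin_probs(lambda x: (x + 90.0) / 180.0, edges)  # uniform CDF check
```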

fdata is currently in 2-degree intervals:

    >>> len(np.unique(fdata))
    180

rc commented 11 years ago

Yes, 180 is supposed to be missing, and it still is. What I add in spread_by_counts() is a linear sequence of values in [179, 180). As for len(np.unique(fdata)) == 180, I have changed the np.unique(fdata) line in the example to

    >>> uni_fdata = data[:, 0]

to prevent an exception. Is that OK?

josef-pkt commented 11 years ago

Sounds OK to me. I don't think it makes much difference to the estimation results, except that we cannot take a shortcut with weighted maximum likelihood (which I haven't written yet), and speed is not much of an issue anyway.
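The weighted maximum likelihood shortcut rests on an identity: for repeated data, the log-likelihood of the expanded sample equals sum_i counts[i] * log f(angles[i]) over the distinct angles only. A minimal sketch, using a single von Mises density purely as an example (not the mixture model of this repository); all function names are hypothetical.

```python
import numpy as np

def vonmises_logpdf(x, mu, kappa):
    # log density of a single von Mises distribution (angles in radians),
    # used here only as an example density
    return kappa * np.cos(x - mu) - np.log(2 * np.pi * np.i0(kappa))

def loglike_expanded(angles, counts, mu, kappa):
    # ordinary log-likelihood on the expanded sample:
    # every angle repeated counts[i] times
    x = np.repeat(angles, counts)
    return vonmises_logpdf(x, mu, kappa).sum()

def loglike_weighted(angles, counts, mu, kappa):
    # the same value from the grouped data only:
    # sum_i counts[i] * log f(angles[i])
    return np.dot(counts, vonmises_logpdf(angles, mu, kappa))
```

Once fdata is spread within each bin, the grouped form still computes the log-likelihood of the repeated data, not of the spread data, which is why the shortcut no longer applies exactly.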

I've seen in one paper a reference to another paper that assumed a uniform distribution within each bin. I don't know whether they randomized or not.
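Assuming a uniform distribution within each bin can be done deterministically, as in the spread_by_counts() construction described earlier in the thread, or by randomizing. A minimal sketch of the randomized variant, with a hypothetical function name and a fixed seed for reproducibility; the thread does not say which variant the cited papers used.

```python
import numpy as np

def jitter_uniform_within_bins(left_edges, counts, width, seed=None):
    # hypothetical: draw counts[i] points uniformly from
    # [left_edges[i], left_edges[i] + width)
    rng = np.random.default_rng(seed)
    lefts = np.repeat(np.asarray(left_edges, dtype=float), counts)
    # rng.random draws from [0, 1), so each point stays in its own bin
    return lefts + width * rng.random(lefts.size)
```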