JonasFrey96 opened 1 year ago
This currently makes a lot of sense when we have a bimodal distribution, but makes little sense when we don't have one.
Maybe we can improve this - visualizing this plot would be nice.
I checked the confidence generation: In update_running_mean:
```python
# The confidence used to be computed as the distance to the center of the Gaussian given factor*sigma
# This is certainly wrong given that the gaussian is simply the wrong function for the job
confidence = torch.exp(-(((x - self.mean) / (self.std * self.std_factor)) ** 2) * 0.5)
confidence[x < self.mean] = 1.0

# My suggestion is the following - we define the 0.5 confidence value to be at self.mean + self.std*self.std_factor
# And then we use two other points, e.g. plus and minus two std, to define the 1 confidence and 0 confidence points
# shifted_mean = self.mean + self.std * self.std_factor
# interval_min = shifted_mean - 2 * self.std
# interval_max = shifted_mean + 2 * self.std
# x = torch.clip(x, interval_min, interval_max)
# confidence = 1 - (x - interval_min) / (interval_max - interval_min)
```
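For reference, the commented-out suggestion can be written as a small runnable function (a sketch assuming `torch` tensors; the function name is mine, not the repository's):

```python
import torch

def linear_confidence(x, mean, std, std_factor):
    """Confidence falling linearly from 1 to 0 across a 4*std window
    centered at mean + std * std_factor (the 0.5-confidence point)."""
    shifted_mean = mean + std * std_factor
    interval_min = shifted_mean - 2 * std
    interval_max = shifted_mean + 2 * std
    x = torch.clip(x, interval_min, interval_max)
    return 1.0 - (x - interval_min) / (interval_max - interval_min)

# Example: mean=5.0, std=1.0, std_factor=1.0 -> 0.5 confidence at loss=6.0
conf = linear_confidence(torch.tensor([4.0, 6.0, 8.0]), mean=5.0, std=1.0, std_factor=1.0)
```

Unlike the Gaussian version, this is monotone in the loss and does not need the special case `confidence[x < self.mean] = 1.0`, since everything below `interval_min` saturates at 1.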
In `inference_without_update`, which is used for visualization, we used to do something completely different.
Still, the problem would remain: when we start training, the confidence is very high everywhere and only gets smaller for regions over time, so initially the traversability is over-optimistic.
I'm not sure I understood the comment above. Was it mainly about the fact that we compute the confidence in different ways in the confidence generator used in the training loop, vs. the one that generates the published messages?
I believe we should rethink this to make it more principled. I think that many things we tried out for the paper were mostly driven by wanting to make the system work (subject to the deadline constraint).
Some general agreements we have discussed:
I'm thinking that maybe we should go back to basics before getting crazy with the formulation. I'll use `loss` for the loss, `c` for the confidence, and `t` for time. Let's use the reference image we used in the paper.
What we know:
I would propose that:

- `c=1.0` at `loss=0.0`
- `c=0.0` at `loss=(mean of the distribution at t=0.0)`. In the figure, we should set it at `loss=5.0`.

Pros: This definition should ensure that at `t=0.0` we will get low confidence, and it does not need to explicitly label the positive and unknown samples, because the confidence model gets fixed at the beginning. No need for running means or Kalman filters. We don't need to set an adaptive threshold either; we just rely on the initial condition.
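A minimal sketch of this fixed definition, assuming a linear ramp between the two anchor points (the function name and the `loss_t0=5.0` default are illustrative, matching the figure):

```python
import torch

def fixed_confidence(loss, loss_t0=5.0):
    """c=1.0 at loss=0.0, c=0.0 at loss_t0 (the mean of the initial
    loss distribution), linear in between, clipped outside."""
    return torch.clip(1.0 - loss / loss_t0, 0.0, 1.0)

conf = fixed_confidence(torch.tensor([0.0, 2.5, 5.0, 7.0]))
```

Since `loss_t0` is estimated once at `t=0` and then frozen, there is no state to maintain afterwards.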
Cons: The initial loss distribution (positive + unknown) could change. The plot from the paper shows it doesn't change that much (the mean of the grey histogram stayed centered at `loss=5.0`). But if we implement this plot as a debug visualization, we could confirm whether this is the case.
A next iteration would be to make the threshold adaptive, as we did. The main trend we should expect is that as we collect more data, we become more and more conservative about what we feel confident about, while the estimate for the unknown things stays unchanged.
There is the risk that the positive samples' loss drifts toward `loss=0` (like underestimating the distribution, and then the sigma value). One option is to track the means of the positive and unknown distributions, `p_positive` and `p_unknown`, and set `c=0.0` at `loss=0.5*(p_positive + p_unknown)`:

- At `t=0`, this threshold sits at `loss=5.0`, same as the simple case. Then, the confidence for all the samples should be low as intended.
- As `p_positive` drifts toward zero, the threshold would move to `loss=2.5` if the unknown samples' distribution stays the same.
- We could also weight the midpoint: `loss=alpha*p_positive + (1-alpha)*p_unknown`. But perhaps this adds unnecessary complexity.
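A hedged sketch of this adaptive variant, assuming exponential running means for `p_positive` and `p_unknown` (the class name, the update rule, and the momentum value are my assumptions for illustration, not the repository's implementation):

```python
import torch

class AdaptiveConfidence:
    """Track running means of the positive and unknown loss distributions;
    c=0.0 at their weighted midpoint, c=1.0 at loss=0, linear in between."""

    def __init__(self, p_positive=5.0, p_unknown=5.0, alpha=0.5, momentum=0.1):
        self.p_positive = p_positive  # running mean of positive-sample losses
        self.p_unknown = p_unknown    # running mean of unknown-sample losses
        self.alpha = alpha            # weighting between the two means
        self.momentum = momentum      # running-mean update rate (assumed value)

    def update(self, positive_losses, unknown_losses):
        self.p_positive += self.momentum * (positive_losses.mean().item() - self.p_positive)
        self.p_unknown += self.momentum * (unknown_losses.mean().item() - self.p_unknown)

    def __call__(self, loss):
        threshold = self.alpha * self.p_positive + (1 - self.alpha) * self.p_unknown
        return torch.clip(1.0 - loss / threshold, 0.0, 1.0)

# At t=0 both means start at 5.0, so the threshold is loss=5.0;
# if p_positive drifts to 0 while p_unknown stays at 5.0, it moves to 2.5.
conf = AdaptiveConfidence()(torch.tensor([0.0, 2.5, 5.0]))
```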
. But perhaps this adds unnecesary complexity.We finally could get crazy about the confidence estimate using anomaly detection stuff. Now we are learning the distribution of samples through the autoencoder but we are not enforcing any structure in the distribution---what we could do. Some brainstorming: