knebiolo / mast

Movement Analysis Software for Telemetry (MAST) for use in removing false positive and overlap detections from radio telemetry projects and assessing 1D movement patterns

Lagdiff dispersion issues #13

Closed tcastrosantos closed 1 week ago

tcastrosantos commented 3 years ago

When known false positives are sparse (possibly, and in part, a consequence of codeset crowding), there may be empty lagdiff bins that would hold a substantial proportion of the data had there been more data from which to build the distribution. Because lagdiff is unbounded, rare events fall outside the range of the histograms, which may make this problem hard to detect. It is unclear how plus-1 smoothing affects this with small n, but I suspect the inflation of the density function is much greater than for well-populated datasets. This can lead to erroneous filtering. Fixing the scales for the density functions on the plots will help with this, as (maybe) will binning outliers together (i.e. everything >100 goes into one bin, likewise for <-100). However, those empty cells look problematic (see attached). Users should be alerted to this hazard.
[Attached figure: lotek_F33a_lattice_train]
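
To make the concern about plus-1 smoothing and outlier binning concrete, here is a minimal sketch and not MAST's actual code. The function name `smoothed_lagdiff_bins`, the bin edges, and the synthetic data are all assumptions chosen for illustration. It pools everything beyond the outer edges into two open-ended tail bins, applies add-one smoothing, and reports how much probability mass the smoothing hands to bins that had no observations.

```python
import numpy as np

def smoothed_lagdiff_bins(lagdiff, edges, alpha=1.0):
    """Bin values with open-ended tail bins and add-alpha smoothing.

    Hypothetical helper for illustration; not part of MAST.
    Values below edges[0] share one bin and values at or above
    edges[-1] share another, so no observation falls off the histogram.
    """
    lagdiff = np.asarray(lagdiff, dtype=float)
    edges = np.asarray(edges, dtype=float)
    n_bins = len(edges) + 1                     # interior bins + 2 tail bins
    idx = np.digitize(lagdiff, edges)           # 0 .. len(edges)
    counts = np.bincount(idx, minlength=n_bins)
    probs = (counts + alpha) / (counts.sum() + alpha * n_bins)
    # Probability mass given to bins that contained no data at all.
    empty_mass = probs[counts == 0].sum()
    return counts, probs, empty_mass

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    edges = np.arange(-100, 101, 20)            # assumed bin edges
    for n in (100, 19000):                      # sparse vs. well-populated
        sample = rng.normal(0, 60, size=n)      # synthetic stand-in data
        counts, probs, empty_mass = smoothed_lagdiff_bins(sample, edges)
        print(n, "empty bins:", int((counts == 0).sum()),
              "mass on empty bins:", round(float(empty_mass), 4))
```

With add-one smoothing each empty bin receives probability 1 / (n + k), where k is the number of bins, so the share of mass assigned to unobserved bins shrinks as n grows. A check along these lines could also drive a warning to the user when the fraction of empty bins is high.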

tcastrosantos commented 3 years ago

This topic bears on codeset crowding and data sufficiency. One possible solution I'd like to discuss is allocating data classified as noise during Classify 1 (or maybe later) into the 'known noise' group. The figure below is what I got after the second iteration on Classify 3. Note that the false positives look like a pretty good match to the known noise, but lagdiff is now much better populated and the missing cells are gone (this is 1100 records). One cautionary note is that the first iteration cleaned >19000 records. So if we accept those, and allow for the 100 records used in the initial cleaning, a couple of issues emerge. Notably, the 19k records become the de facto training set. Alternatively, I did turn off the prior for this (because we had 100 false positives and >300k valid data, or >600k if you include both antennas; this is an antenna-switching site). I'm not sure I would have cleaned anything had I left the prior in place; however, it would be a very conservative way of adding to the set.

[Attached figure: lotek_lattice_class]
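
As a rough illustration of why leaving the prior in place would be so conservative, here is a minimal sketch. It assumes a generic naive Bayes classifier whose priors are the class frequencies in the training data, which may not match MAST's implementation. The function name `required_likelihood_ratio` is hypothetical, and the counts are the approximate figures quoted above.

```python
def required_likelihood_ratio(n_false, n_true, use_prior=True):
    """Likelihood ratio L(false)/L(true) a detection must exceed
    to be classified as a false positive.

    Assumes posterior is proportional to prior * likelihood, with
    priors equal to class frequencies (an assumption for illustration,
    not a statement about MAST's code).
    """
    if not use_prior:
        return 1.0                  # equal priors: likelihoods decide alone
    return n_true / n_false         # prior odds in favour of "valid"

if __name__ == "__main__":
    # About 100 known false positives vs. about 300k valid training records
    print(required_likelihood_ratio(100, 300_000))
    # Prior switched off, as in the run described above
    print(required_likelihood_ratio(100, 300_000, use_prior=False))
    # If the ~19k cleaned records were added to the known-noise group
    print(required_likelihood_ratio(19_100, 300_000))
```

Under these assumptions the evidence must favour "false positive" by a factor of about 3000 before a record is removed when the prior is on. Folding the 19k cleaned records into the known-noise group would lower that bar substantially, which is one way to see how they would come to dominate the training set.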