tylerni7 / missile-tid

Identifying missile launch signatures from traveling ionic disturbances using GPS data
BSD 3-Clause "New" or "Revised" License

thin plate spline bias solving #9

Open tylerni7 opened 3 years ago

tylerni7 commented 3 years ago

Bias solving right now is trash. The thin-plate-spline model from MGNute was great, and we should add it back in.

MGNute commented 1 year ago

In truth, the biases should really be stored and retrieved at the time of calculating the VTECs, and then we should have a separate process that periodically analyzes the biases for consistency and updates them based on some amount of assumed drift. In reality they don't drift much. For the GPS satellites at least, the NOAA ionosphere maps provide bias values that were pretty close to ours (last I looked, anyway). One option is storing these values in something like a MySQL database (maybe in the lookup_tables subfolder) and then creating a separate routine to estimate them and QC the estimates periodically. Bias estimation tends to be far more computationally intensive than any other step, so this would reduce the compute time substantially. Thoughts?
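
A minimal sketch of the periodic consistency check described above, assuming the stored history for one satellite or station is just a list of past bias estimates. The function name `flag_inconsistent`, the `max_drift` threshold, and the sample values are hypothetical and not taken from the repository.

```python
from statistics import median

def flag_inconsistent(history, new_value, max_drift):
    """Return True if new_value is farther from the historical median than max_drift."""
    if not history:
        return False  # nothing to compare against yet
    return abs(new_value - median(history)) > max_drift

# Made-up bias estimates in arbitrary units, for illustration only.
history = [4.1, 4.0, 4.2, 4.1]
print(flag_inconsistent(history, 4.15, max_drift=0.5))  # False
print(flag_inconsistent(history, 6.0, max_drift=0.5))   # True
```

Using the median rather than the mean keeps a single bad estimate in the history from shifting the reference value much.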

tylerni7 commented 1 year ago

I'm a fan of something along these lines in some cases. I think it would probably be a separate process which runs and produces a bias file/database/whatever. If it exists, it gets used. If not, the expensive operation is done to calculate them. I don't think this makes sense to maintain for every (station, day) and (satellite, day). But maybe something that keeps no more than a few days' worth of data is reasonable.

The NOAA biases are nice, but I don't think they have the stations, and we need the biases of both. I'm definitely not opposed to using published values when we can. I also don't know how often they are published (we want to be able to get values that are current, not "last week's" biases or whatever).

MySQL doesn't really make sense here to me. Even something lighter like SQLite is probably overkill. I don't think it is something that we would want to commit to git, because it's ever-growing and a lot of data. It might make more sense to stick with a native format in the already shared cache-type folder. Maybe h5 or something. It's a little less flexible, but we're already using things like that, so no extra dependencies.
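
A minimal sketch of the "if it exists, it gets used; if not, compute it" flow, assuming one cache file per day and a retention window of a few days. The function name `load_or_compute_biases`, the `biases_<date>.json` naming, and the `compute` callback are hypothetical. JSON is used only to keep the sketch dependency-free; an h5 file in the cache folder would slot into the same read and write steps.

```python
import json
from datetime import date, timedelta
from pathlib import Path

def load_or_compute_biases(cache_dir, day, compute, max_age_days=3):
    """Return biases for `day`, reusing a recent cached file when one exists."""
    cache_dir = Path(cache_dir)
    # Look for the newest cached estimate within the allowed age window.
    for age in range(max_age_days + 1):
        cached_day = day - timedelta(days=age)
        path = cache_dir / f"biases_{cached_day.isoformat()}.json"
        if path.exists():
            return json.loads(path.read_text())
    # Nothing recent enough: run the expensive solve and store the result.
    biases = compute(day)
    cache_dir.mkdir(parents=True, exist_ok=True)
    (cache_dir / f"biases_{day.isoformat()}.json").write_text(json.dumps(biases))
    return biases
```

Here `compute` stands in for whatever expensive solver is used (for example the thin-plate-spline model), so the caching logic stays independent of how the biases are estimated.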

MGNute commented 1 year ago

> I'm a fan of something along these lines in some cases. I think it would probably be a separate process which runs and produces a bias file/database/whatever. If it exists, it gets used. If not, the expensive operation is done to calculate them. I don't think this makes sense to maintain for every (station, day) and (satellite, day). But maybe something that keeps no more than a few days' worth of data is reasonable.

Ya, I was thinking something along these lines. I do actually think there is some value in storing more historical data on the bias estimates that could be used for optimizing and for QC, though that's a separate point. It could be pretty easy to create a command to clear, say, all but the most recent estimates. But yes, this structure makes sense.

> The NOAA biases are nice, but I don't think they have the stations, and we need the biases of both. I'm definitely not opposed to using published values when we can. I also don't know how often they are published (we want to be able to get values that are current, not "last week's" biases or whatever).

That's true, they don't have stations. And yes, we wouldn't want to use those; I only mentioned them offhand as evidence that the bias values are relatively stable over time...

> MySQL doesn't really make sense here to me. Even something lighter like SQLite is probably overkill. I don't think it is something that we would want to commit to git, because it's ever-growing and a lot of data. It might make more sense to stick with a native format in the already shared cache-type folder. Maybe h5 or something. It's a little less flexible, but we're already using things like that, so no extra dependencies.

My bad, I meant SQLite, although I'm fine with using h5, since that's what we're already using, and keeping it in the cache folder. If we're good on those details, I can create a branch and start thinking about how that would work.
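
A minimal sketch of the "clear all but the most recent estimates" command mentioned above, assuming one file per day named `biases_YYYY-MM-DD.json` in the cache folder. Both the naming scheme and the function name `prune_bias_cache` are hypothetical; the same idea applies to h5 files.

```python
from pathlib import Path

def prune_bias_cache(cache_dir, keep=1):
    """Delete all but the `keep` newest bias files; return the removed names."""
    # ISO dates in the file name sort chronologically as plain strings.
    files = sorted(Path(cache_dir).glob("biases_*.json"))
    stale = files[:-keep] if keep > 0 else files
    for path in stale:
        path.unlink()
    return [p.name for p in stale]
```

Raising `keep` retains a longer history, which would suit the optimizing and QC uses of historical estimates.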

tylerni7 commented 1 year ago

The NOAA thing might still be useful. I kind of imagine some sort of Kalman filter-esque thing where we get potentially slow but highly reliable data from NOAA or from a more expensive operation like a full thin-plate-spline model, and then use simpler things to model short-term drift. That might be overkill depending on how the biases actually change over time, though!

But yeah, overall something like this seems useful to me.
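
A minimal sketch of the Kalman filter-esque idea, assuming each bias can be treated as a scalar random walk: infrequent but reliable inputs (published values or a full solve) enter with a small measurement variance, and cheap short-term estimates enter with a large one. The class name `BiasFilter` and all numeric values are hypothetical.

```python
class BiasFilter:
    """Scalar Kalman filter treating one bias as a slow random walk."""

    def __init__(self, initial, variance, drift_var_per_day):
        self.x = initial            # current bias estimate
        self.p = variance           # variance (uncertainty) of that estimate
        self.q = drift_var_per_day  # assumed drift (process noise) per day

    def predict(self, days):
        # The bias is assumed unchanged on average; only uncertainty grows.
        self.p += self.q * days

    def update(self, measurement, meas_var):
        # Standard scalar Kalman update: low-variance inputs pull harder.
        gain = self.p / (self.p + meas_var)
        self.x += gain * (measurement - self.x)
        self.p *= 1.0 - gain
        return self.x


f = BiasFilter(initial=0.0, variance=1.0, drift_var_per_day=0.01)
f.update(2.0, meas_var=0.01)  # reliable input moves the estimate strongly
f.predict(days=1)             # a day passes, uncertainty grows slightly
f.update(3.0, meas_var=1.0)   # noisy input nudges the estimate only a little
```

Whether this is worth the complexity depends on how much the biases actually drift; if they barely move, reusing the last reliable value may be enough.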