vc1492a / tidd

An approach for detecting tsunamis using anomaly detection on sTec d/dt data from orbiting GPS satellites.

Set aside N tracks for "real-world" experiment #70

Closed vc1492a closed 3 years ago

vc1492a commented 3 years ago

While metrics from model training are insightful and helpful, they don't accurately portray how the model will perform in practice. We need to set aside N tracks (ground station and satellite combinations) for an experiment that simulates real-world use of the model.

vc1492a commented 3 years ago

First #69 must be completed.

vc1492a commented 3 years ago

Also plot these tracks and save the figures as part of this issue.

vc1492a commented 3 years ago

34 ground station and satellite combinations should be set aside to ensure the validation set is at least 20% of the data. However, with 3 satellites in the data (G07, G08, and G20), it may make sense to ensure that the validation set contains an equal number of tracks from each satellite. We could have 11 ground stations for each satellite in the validation set, resulting in 33 tracks (so the validation set would be 19.64% of the original data). I think that's close enough to 20%, given that it lets us balance the validation set.
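The arithmetic above can be checked with a short sketch. The total of 168 tracks is an assumption: it is not stated in this thread and is inferred here from 33 tracks being 19.64% of the data.

```python
import math

# Assumption: total track count inferred from 33 / 0.1964, not stated in the thread.
TOTAL_TRACKS = 168
SATELLITES = ["G07", "G08", "G20"]
PER_SATELLITE = 11

# Smallest number of tracks that reaches a 20% validation share.
min_tracks = math.ceil(0.20 * TOTAL_TRACKS)

# Balanced alternative: the same number of tracks from each satellite.
balanced = PER_SATELLITE * len(SATELLITES)
balanced_share = balanced / TOTAL_TRACKS

print(f"minimum for 20%: {min_tracks} tracks ({min_tracks / TOTAL_TRACKS:.2%})")
print(f"balanced split:  {balanced} tracks ({balanced_share:.2%})")
```

Under that assumption, dropping one track to keep the satellites balanced costs well under one percentage point of validation share.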

Will write some code to randomly sample which ground station and satellite combinations we will hold out, and then will manually set those aside in the data through some further reorganization of the directory structure. Once that's complete, I can return to #65 and update the README accordingly.
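A minimal sketch of what such sampling could look like, using only the standard library. This is not the code from the notebook: the function name `sample_validation_tracks`, the station IDs, and the assumption that every satellite is seen by the same 56 stations are all hypothetical and only serve the illustration.

```python
import random
from collections import defaultdict


def sample_validation_tracks(tracks, per_satellite=11, seed=42):
    """Randomly pick `per_satellite` (station, satellite) tracks for each satellite."""
    rng = random.Random(seed)  # fixed seed so the hold-out set is reproducible

    # Group ground stations by the satellite they observed.
    by_satellite = defaultdict(list)
    for station, satellite in tracks:
        by_satellite[satellite].append(station)

    selected = []
    for satellite in sorted(by_satellite):
        stations = sorted(by_satellite[satellite])
        if len(stations) < per_satellite:
            raise ValueError(f"{satellite} has only {len(stations)} tracks")
        # Sample without replacement within this satellite.
        for station in rng.sample(stations, per_satellite):
            selected.append((station, satellite))
    return selected


# Hypothetical data: 56 made-up station IDs for each of the 3 satellites.
stations = [f"st{i:02d}" for i in range(56)]
tracks = [(s, sat) for sat in ("G07", "G08", "G20") for s in stations]

validation = sample_validation_tracks(tracks)
print(len(validation), "tracks held out")
```

Sampling within each satellite separately is what keeps the validation set balanced; a single draw over all tracks would not guarantee 11 per satellite.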

vc1492a commented 3 years ago

Oh and I'll include the code in notebooks/data_validation.ipynb on the feature/validate_data branch.

vc1492a commented 3 years ago

I set aside some data to use for validation, 11 tracks from each satellite. I also started the model training process, and everything seems to be in order and working as expected.

I'll close this issue and proceed to #65.