yim-fan opened this issue 1 year ago (status: Open)
Almost done; I will push another commit by the end of Tuesday.
@yim-fan please post a link to the pushed commit so I can check the results.
Done: dealing with outliers
Done: separate normal and adversarial situations
Outlier detection and analysis is done.
TODO: add rows for the missing lines in each trip, and impute missing values using a Kalman filter
@yim-fan is the imputation also done? Also, have you considered clustering the data?
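For reference, if the Kalman-filter imputation in this TODO is revisited, here is a minimal sketch of one common way to do it: a local-level (random-walk) model with a forward Kalman filter that skips the update step at missing samples, followed by a Rauch-Tung-Striebel smoother. This is not the approach used in the notebook; the function name `kalman_impute` and the noise variances `q` and `r` are hypothetical and would need tuning per field.

```python
import math


def kalman_impute(values, q=1.0, r=1.0):
    """Fill NaNs in a 1-D series with a local-level (random-walk) Kalman smoother.

    q: process-noise variance (how fast the true level drifts), hypothetical default.
    r: measurement-noise variance, hypothetical default.
    Assumes the series contains at least one observed value.
    """
    n = len(values)
    # Start from the first observed value with a vague (large-variance) prior.
    x = next(v for v in values if not math.isnan(v))
    p = 1e6
    xf, pf, xp, pp = [], [], [], []  # filtered / predicted means and variances
    for v in values:
        # Predict: the level follows a random walk, so only the variance grows.
        x_pred, p_pred = x, p + q
        xp.append(x_pred)
        pp.append(p_pred)
        if math.isnan(v):
            # Missing sample: no measurement, so skip the update step.
            x, p = x_pred, p_pred
        else:
            k = p_pred / (p_pred + r)  # Kalman gain
            x = x_pred + k * (v - x_pred)
            p = (1 - k) * p_pred
        xf.append(x)
        pf.append(p)
    # Rauch-Tung-Striebel backward pass uses future samples to refine each estimate.
    xs = xf[:]
    for t in range(n - 2, -1, -1):
        c = pf[t] / pp[t + 1]
        xs[t] = xf[t] + c * (xs[t + 1] - xp[t + 1])
    # Keep observed values untouched and replace only the gaps.
    return [xs[t] if math.isnan(v) else v for t, v in enumerate(values)]
```

With `q = r = 1`, `kalman_impute([800.0, float("nan"), 900.0])` fills the gap with a value very close to 850, the same as the neighbour average. The two approaches differ on longer gaps, where the smoother spreads the change across the gap instead of using one constant value.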
The imputation is already done. As I presented in the week 6 meeting, I did not get the Kalman filter working for imputation; instead, I did a naive imputation that fills in each missing value with the average of the previous and next non-missing values. It should be good enough for now. If time permits, I will come back to the Kalman filter to see whether it improves model performance.
The ways I impute each field are as follows:
- DEPTH: impute with the population mode.
- HEADING, WIND_SPEED, WIND_SPEED_TRUE, WIND_ANGLE, WIND_ANGLE_TRUE: impute with the mean value within the corresponding trip.
- Everything else: impute from the previous and next non-missing values. For example, if a speed value is missing, the previous sample's speed is 800 and the next sample's speed is 900, then 850 is used to fill it in.
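The per-field scheme above can be sketched in pandas roughly as follows. This is an illustrative reconstruction, not the notebook's code: `TRIP_ID` is a hypothetical trip-identifier column, rows are assumed to be sorted by time within each trip, the remaining columns are assumed to be numeric, and falling back to the single available neighbour at the start or end of a trip is an assumption the description does not cover.

```python
import pandas as pd

# TRIP_ID is a hypothetical trip-identifier column; the other names follow the fields above.
TRIP_COL = "TRIP_ID"
TRIP_MEAN_COLS = ["HEADING", "WIND_SPEED", "WIND_SPEED_TRUE", "WIND_ANGLE", "WIND_ANGLE_TRUE"]


def neighbour_mean(s: pd.Series) -> pd.Series:
    """Fill each gap with the mean of the previous and next non-missing values."""
    prev, nxt = s.ffill(), s.bfill()
    # At the start/end of a trip only one neighbour exists; fall back to it (assumption).
    return s.fillna((prev + nxt) / 2).fillna(prev).fillna(nxt)


def impute(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # DEPTH: population mode over the whole dataset (mode() ignores NaN).
    df["DEPTH"] = df["DEPTH"].fillna(df["DEPTH"].mode().iloc[0])
    # Heading and wind fields: mean of the same trip.
    for col in TRIP_MEAN_COLS:
        df[col] = df[col].fillna(df.groupby(TRIP_COL)[col].transform("mean"))
    # Everything else: neighbour mean within each trip (rows assumed sorted by time).
    other = [c for c in df.columns if c not in TRIP_MEAN_COLS + ["DEPTH", TRIP_COL]]
    for col in other:
        df[col] = df.groupby(TRIP_COL)[col].transform(neighbour_mean)
    return df
```

On a trip with speeds `[800, NaN, 900]`, `impute` fills the gap with 850, matching the example above. One caveat worth keeping in mind: an arithmetic mean of angles such as HEADING or WIND_ANGLE misbehaves near the 0/360 wrap (359 and 1 average to 180), so a circular mean may be worth considering for those fields.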
The code can be found here: https://github.com/pagand/model_optimze_vessel/blob/371adc1af0885be6e3d5e09267ad77f0b41993b8/Prepration/imputing_and_outlier.ipynb
For clustering, I tried K-means and DBSCAN, but adding the cluster labels did not seem to improve model performance, so I excluded the clustered feature from the current model.
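One way to try clustering as an extra feature, as described above, is to append cluster labels to the feature matrix. The sketch below uses scikit-learn's `KMeans` and `DBSCAN`; the function name `add_cluster_features`, the standardisation step and the parameter values are assumptions for illustration, not the settings that were actually tried.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.preprocessing import StandardScaler


def add_cluster_features(X: np.ndarray, n_clusters: int = 3, eps: float = 0.5,
                         min_samples: int = 5) -> np.ndarray:
    """Append K-means and DBSCAN cluster labels to X as two extra columns.

    n_clusters, eps and min_samples are illustrative defaults, not tuned values.
    """
    # Both algorithms are distance-based, so put all features on a common scale first.
    Xs = StandardScaler().fit_transform(X)
    km_labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(Xs)
    # DBSCAN marks points that belong to no dense region with the label -1.
    db_labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(Xs)
    return np.column_stack([X, km_labels, db_labels])
```

Training the same model with and without the two label columns and comparing them on the same validation split is a simple way to check whether the clusters add anything. K-means labels are arbitrary IDs, so one-hot encoding them is usually safer for models that treat inputs as ordered numbers.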
No need for clustering since the model performs OK so far.
Preprocessing, clean up data, remove outliers, etc.