vc1492a / tidd

An approach for detecting tsunamis using anomaly detection on sTec d/dt data from orbiting GPS satellites.

Data Summary Statistics #1

Closed vc1492a closed 4 years ago

vc1492a commented 4 years ago

We know that the data covers a total of 5 days - 2 days prior to the day of the earthquake, the day of the earthquake, and 2 days following the earthquake. High-level summary statistics of the data should be generated. The list below is not comprehensive, but these summary statistics should be provided to give the appropriate level of context for later stages of the work.

vc1492a commented 4 years ago

Hey @hamlinliu17 - saw that you committed a scratch notebook! Good stuff. Just be sure to drag this to the in progress section once you get started, thanks!

vc1492a commented 4 years ago

@hamlinliu17 also wanted to remind you to take a look at Val Scratch.ipynb in case any of that code is helpful in generating pandas dataframes for subsequent analysis, such as generating summary statistics. I'm here to help 👍

vc1492a commented 4 years ago

@hamlinliu17 how's your progress on this issue? If you have any updated work, would you be able to commit it to the feature/data_summary_statistics branch? Thanks! Let me know if there's anything you'd like for me to take a look at or comment on.

hamlinliu17 commented 4 years ago

@vc1492a I have mostly been able to get the daily averages of dStec/dt for each file. For the hourly and by-minute averages, how should I visualize them? I will be sure to commit soon.

vc1492a commented 4 years ago

@hamlinliu17 thanks for the prompt reply! A simple line chart can suffice; maybe break it down by spacecraft, ground station, or a combination of both?
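A minimal sketch of the daily, hourly, and by-minute averages discussed above, broken down by ground station and satellite. It assumes the data is loaded into a pandas DataFrame with a time index; the column names (`station`, `satellite`, `dstec_dt`), the 30-second cadence, the start date, and the random values are all placeholders, not the project's actual schema or data.

```python
import numpy as np
import pandas as pd

# Assumption: 5 days of samples at a 30-second cadence (placeholder values).
SAMPLES = 5 * 24 * 60 * 2
times = pd.Timestamp("2020-01-01") + pd.to_timedelta(np.arange(SAMPLES) * 30, unit="s")
rng = np.random.default_rng(0)

# Build a long-format frame with one block of rows per station/satellite pair.
frames = []
for station, satellite in [("ahup", "G04"), ("ahup", "G07")]:
    frames.append(
        pd.DataFrame(
            {
                "station": station,
                "satellite": satellite,
                "dstec_dt": rng.normal(0.0, 0.05, SAMPLES),  # synthetic stand-in
            },
            index=pd.DatetimeIndex(times, name="time"),
        )
    )
data = pd.concat(frames)


def mean_by_pair(frame: pd.DataFrame, rule: str) -> pd.Series:
    """Average dstec_dt per (station, satellite) pair within each time bin."""
    return frame.groupby(["station", "satellite"])["dstec_dt"].resample(rule).mean()


daily = mean_by_pair(data, "1D")
hourly = mean_by_pair(data, "60min")
minute = mean_by_pair(data, "1min")
```

Selecting one pair, for example `hourly.loc[("ahup", "G07")]`, gives a time-indexed series that can be drawn as a line chart with whichever plotting library the notebook already uses.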

hamlinliu17 commented 4 years ago

@vc1492a I have committed a sample plot of just one satellite, G03, to the feature branch. If you could point me in the right direction on formatting the plot, that would be helpful.

vc1492a commented 4 years ago

Thanks, will take a look prior to the call tomorrow

vc1492a commented 4 years ago

@hamlinliu17 thanks for committing your code! I took a look at notebooks/Hamlin Scratch.ipynb and had the following feedback:

Lastly, thank you for being organized with your code and commenting, in addition to adding docstring - it helps me review!

vc1492a commented 4 years ago

@hamlinliu17 I am working on bullet 2 actually as part of this issue, but if you make any progress on that front let me know and we can fold in your code 👍

vc1492a commented 4 years ago

@hamlinliu17 thought I'd check in - how are things going?

hamlinliu17 commented 4 years ago

@vc1492a These past few days have been a bit busy, so I have not gotten much done. Hopefully by the weekend I will be able to make progress on this issue.

vc1492a commented 4 years ago

Thanks for the update! No rush, I know it's a busy period in school. Just wanted to see if there are any questions I can answer or things I can help out with. You can reach out to me when needed!

hamlinliu17 commented 4 years ago

@vc1492a I was able to finish the first task here. Just for a quick clarification though, how do I determine the satellite name and the ground station number? For example, in the pairing pg2r__G01, is pg2r the satellite or the ground station? After this, I will try testing out some visualization methods.

MichelaRavanelli commented 4 years ago

@hamlinliu17 in the case you cited the ground station is pg2r and the satellite is G01.
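Following the convention described above (ground station first, satellite second, joined by a double underscore), a pairing string could be split with a small helper. This is a sketch only; `Pair` and `parse_pair` are hypothetical names and are not taken from the repository.

```python
from typing import NamedTuple


class Pair(NamedTuple):
    ground_station: str
    satellite: str


def parse_pair(pair_id: str) -> Pair:
    """Split an identifier such as 'pg2r__G01' into ground station and satellite."""
    station, separator, satellite = pair_id.partition("__")
    if not separator or not station or not satellite:
        raise ValueError(f"expected '<station>__<satellite>', got {pair_id!r}")
    return Pair(ground_station=station, satellite=satellite)


pair = parse_pair("pg2r__G01")
print(pair.ground_station, pair.satellite)
```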

vc1492a commented 4 years ago

Thanks for the prompt reply @MichelaRavanelli!

hamlinliu17 commented 4 years ago

@vc1492a I made some progress today and was able to plot some examples in my scratch notebook here. It turns out the plotly objects do not show up on GitHub since notebooks are statically rendered there, so I will try to post some of the plots, but you can also render the notebook locally.

vc1492a commented 4 years ago

Thanks @hamlinliu17 will pull your latest commit on the feature/data_summary_statistics branch soon and check this out!

vc1492a commented 4 years ago

@hamlinliu17 checked out your work and it looks good, hits all the points described above. Feel free to open a PR into dev. I want to get a better understanding as to the data we have through time and pinpoint when we see changes in dStec/dt prior to stepping into the modeling step, but that work can be captured in other issues.

hamlinliu17 commented 4 years ago

@vc1492a @MichelaRavanelli I have pushed a commit to the feature/data_summary_statistics branch with a file testing.png (attached below). This picture has line plots of the minute averages for the pairs ahup__G04, ahup__G07, ahup__G08, ahup__G10, ahup__G13, ahup__G20, ahup__G23, the same pairs examined in the paper. ahup__G07 seemed to show a sinusoidal pattern from minute 3400 to 3420, and ahup__G08 seemed to show a similar but more subtle pattern from 3420 to 3440, which somewhat matches Figure 2 from the paper. I will try to smooth out the lines to give us a better idea for the other satellites.

[Image: testing.png]

[Image: Screen Shot 2020-06-14 at 9 15 58 PM]
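One possible way to smooth the by-minute lines mentioned above is a centered rolling mean. This is a sketch under assumptions: the 5-minute window is arbitrary, the `smooth` helper is hypothetical, and the series below is synthetic rather than the ahup data.

```python
import numpy as np
import pandas as pd


def smooth(series: pd.Series, window: int = 5) -> pd.Series:
    """Centered rolling mean; min_periods=1 keeps values at both edges."""
    return series.rolling(window=window, center=True, min_periods=1).mean()


# Synthetic stand-in: a small oscillation plus noise, indexed by minute number.
minutes = np.arange(3380, 3460)
rng = np.random.default_rng(1)
noisy = pd.Series(
    0.1 * np.sin(2 * np.pi * (minutes - 3400) / 10) + rng.normal(0.0, 0.03, len(minutes)),
    index=minutes,
)
smoothed = smooth(noisy)
```

A wider window suppresses more noise but also flattens short oscillations, so the window should stay well below the period of any pattern of interest.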

vc1492a commented 4 years ago

Thanks @hamlinliu17, this is exactly what we are looking for! We will need to, at some point, annotate the time series with the start and finish times of anomalies so we can formally measure precision, recall, etc. in our experiments. I'll make a note separately to do that!

Can you go ahead and create a merge / pull request from your branch into dev and tag it as [WIP]? Just want to start keeping track of this and other related issues more formally, thanks!
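To illustrate how annotated start and finish times could feed into precision and recall, here is a minimal point-wise sketch. It assumes anomalies are given as inclusive (start, finish) index pairs on a regularly sampled series; the helper names and the example intervals are hypothetical, and other scoring schemes (for example, event-level matching) are equally plausible.

```python
import numpy as np


def labels_from_intervals(n_samples, intervals):
    """Boolean mask that is True inside each inclusive (start, finish) interval."""
    labels = np.zeros(n_samples, dtype=bool)
    for start, finish in intervals:
        labels[start:finish + 1] = True
    return labels


def precision_recall(truth, predicted):
    """Point-wise precision and recall for two boolean masks of equal length."""
    true_pos = int(np.sum(truth & predicted))
    false_pos = int(np.sum(~truth & predicted))
    false_neg = int(np.sum(truth & ~predicted))
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


# Hypothetical example: annotated anomaly at samples 20-29, detector flags 25-34.
truth = labels_from_intervals(100, [(20, 29)])
predicted = labels_from_intervals(100, [(25, 34)])
precision, recall = precision_recall(truth, predicted)
```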