Closed vc1492a closed 4 years ago
Hey @hamlinliu17 - saw that you committed a scratch notebook! Good stuff. Just be sure to drag this to the in progress section once you get started, thanks!
@hamlinliu17 also wanted to remind you to take a look at Val Scratch.ipynb in case any of that code is helpful in generating pandas dataframes for subsequent analysis, such as generating summary statistics. I'm here to help 👍
@hamlinliu17 how's your progress on this issue? If you have any updated work, would you be able to commit it to the feature/data_summary_statistics branch? Thanks! Let me know if there's anything you'd like me to take a look at or comment on.
@vc1492a I have mostly been able to get the daily averages of dStec/dt for each file. For the hourly and by-minute averages, how should I visualize them? I will be sure to commit soon.
@hamlinliu17 thanks for the prompt reply! A simple line chart can suffice; maybe break it down by spacecraft, ground station, or a combination of both?
@vc1492a I have committed a sample plot of just one satellite, G03, to the feature branch. If you could point me in the right direction on formatting the plot, that would be helpful.
Thanks, will take a look prior to the call tomorrow
@hamlinliu17 thanks for committing your code! I took a look at notebooks/Hamlin Scratch.ipynb and had the following feedback:
Use the pandas library directly for this sort of work. The sod index is the second of the day when you examine a particular dataframe. Convert this second-of-day index to a pandas datetime index, and then do a pandas groupby operation (e.g. df.groupby()) to isolate the mean values over a given time range, such as days. This will allow you to be flexible in your summary statistics, and will make it easy to
I don't have an example directly ready for this dataset but will update my comments here if I end up creating it tomorrow before we chat. Lastly, thank you for being organized with your code and commenting, in addition to adding docstrings - it helps me review!
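For reference, the sod-to-datetime conversion and groupby described in the feedback above could look roughly like the sketch below. It is not taken from the project's notebooks: the `dStec/dt` column name, the 30-second cadence, and the `day_start` date are placeholder assumptions, and the data is synthetic.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: the real column names and observation dates may differ.
day_start = pd.Timestamp("2020-01-01")  # placeholder date, not from the dataset
rng = np.random.default_rng(0)

# Two days of synthetic 30-second samples, each frame indexed by second of day (sod).
frames = []
for day_offset in range(2):
    sod = np.arange(0, 86400, 30)
    frame = pd.DataFrame({"dStec/dt": rng.normal(size=sod.size)}, index=sod)
    # Convert second-of-day to a datetime index anchored at that file's date.
    file_date = day_start + pd.Timedelta(days=day_offset)
    frame.index = file_date + pd.to_timedelta(frame.index, unit="s")
    frames.append(frame)

df = pd.concat(frames)

# Group by calendar day and take the mean.
daily_mean = df.groupby(df.index.date)["dStec/dt"].mean()

# The same pattern works at other granularities, e.g. hourly.
hourly_mean = df.groupby(df.index.floor("h"))["dStec/dt"].mean()
```

Once the index is a datetime index, changing the grouping key is all that is needed to move between daily, hourly, and by-minute summaries.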
@hamlinliu17 I am working on bullet 2 actually as part of this issue, but if you make any progress on that front let me know and we can fold in your code 👍
@hamlinliu17 thought I'd check in - how are things going?
@vc1492a These past few days have been a bit busy, so I have not gotten much done. Hopefully by the weekend I will be able to make progress on this issue.
Thanks for the update! No rush, I know it's a busy period in school. Just wanted to see if there are any questions I can answer or things I can help out with. You can reach out to me when needed!
@vc1492a I was able to finish the first task here. Just a quick clarification though: how do I determine the satellite name and the ground station number? For example, in the pairing pg2r__G01, is pg2r the satellite or the ground station? After this, I will try testing out some visualization methods.
@hamlinliu17 in the case you cited the ground station is pg2r and the satellite is G01.
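As a small illustration of that naming convention, a pairing string can be split on the double underscore. The helper name `split_pair` is hypothetical and not part of the project code; it only assumes the `station__satellite` layout described in the reply above.

```python
from typing import Tuple


def split_pair(pair: str) -> Tuple[str, str]:
    """Split a 'station__satellite' pairing into (ground_station, satellite)."""
    station, satellite = pair.split("__", maxsplit=1)
    return station, satellite


station, satellite = split_pair("pg2r__G01")
print(station, satellite)  # pg2r G01
```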
Thanks for the prompt reply @MichelaRavanelli!
@vc1492a I made some progress today and was able to plot some examples in my scratch notebook here. It turns out the plotly objects do not show up on GitHub since notebooks are statically rendered there, so I will try to post some of the plots, but you can also render the notebook locally.
Thanks @hamlinliu17, will pull your latest commit on the feature/data_summary_statistics branch soon and check this out!
@hamlinliu17 checked out your work and it looks good, hits all the points described above. Feel free to open a PR into dev. I want to get a better understanding of the data we have through time and pinpoint when we see changes in dStec/dt prior to stepping into the modeling step, but that work can be captured in other issues.
@vc1492a @MichelaRavanelli I have pushed a commit on the feature/data_summary_statistics branch with a file testing.png (attached below). This picture has some line plots of the minute averages for the pairs ahup__G04, ahup__G07, ahup__G08, ahup__G10, ahup__G13, ahup__G20, ahup__G23, just like what was examined in the paper. ahup__G07 seemed to have a sinusoidal pattern from minute 3400 to 3420, and ahup__G08 seemed to have a similar, albeit more subtle, pattern from 3420 to 3440, which somewhat matches Figure 2 from the paper. I will try to smooth out the lines to give us a better idea for the other satellites.
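One possible way to smooth the minute averages is a centered rolling mean in pandas. This is only a sketch on synthetic data: the 5-minute window, the series values, and the `dStec/dt` name are assumptions for illustration, not the method used in the notebook or the paper.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for one pair's minute averages; values are illustrative only.
minutes = np.arange(3380, 3460)
rng = np.random.default_rng(1)
signal = np.sin((minutes - 3400) / 3.0) + rng.normal(scale=0.3, size=minutes.size)
minute_avg = pd.Series(signal, index=minutes, name="dStec/dt")

# Centered 5-minute rolling mean; min_periods=1 keeps the edges from becoming NaN.
smoothed = minute_avg.rolling(window=5, center=True, min_periods=1).mean()
```

A wider window gives smoother lines but also flattens short oscillations, so the window should stay well below the period of the pattern being looked for.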
Thanks @hamlinliu17, this is exactly what we are looking for! We will need to, at some point, annotate the time series with the start and finish times of anomalies so we can formally measure precision, recall, etc. in our experiments. I'll make a note separately to do that!
Can you go ahead and create a merge / pull request from your branch into dev and tag it as [WIP]? Just want to start keeping track of this and other related issues more formally, thanks!
We know that the data covers a total of 5 days: the 2 days prior to the earthquake, the day of the earthquake, and the 2 days following it. High-level summary statistics of the data should be generated. The list below is not comprehensive, but these summary statistics should be provided to give the appropriate level of context for later stages of the work.
Daily, hourly, and by-minute averages of dStec/dt. Also calculate maximums, minimums, and variances.
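As a rough sketch of how these statistics could be produced with pandas, assuming a series that already has a datetime index: the date, the 30-second cadence, and the `dStec/dt` name below are placeholders, and the data is synthetic.

```python
import numpy as np
import pandas as pd

# Synthetic one-day series at 30-second cadence; the date and values are placeholders.
index = pd.date_range("2020-01-01", periods=2880, freq="30s")
rng = np.random.default_rng(2)
series = pd.Series(rng.normal(size=index.size), index=index, name="dStec/dt")

# Mean, maximum, minimum, and variance at each time granularity.
stats = ["mean", "max", "min", "var"]
by_minute = series.resample("1min").agg(stats)
hourly = series.resample("1h").agg(stats)
daily = series.resample("1D").agg(stats)
```

Each result is a dataframe with one row per time bin and one column per statistic, which can be plotted or exported directly.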