shawntz / eeb-c177-w20

lab section materials for eeb c177/c234 @ucla (winter 2020) 🐻
https://shawnschwartz.com/eeb-c177-w20/

Shreya-Lightning Talk Presentation #66

Closed shreyatrivedi26 closed 4 years ago

shreyatrivedi26 commented 4 years ago

Please find the YouTube link for my presentation below:

https://youtu.be/yFFdnkpk0VU

omerlavian20 commented 4 years ago

Awesome job! I didn't realize you could plot maps in Python (though given the versatility of the language I can't say I'm too surprised). I was wondering if you could describe your data cleaning process. What techniques did you use to do this (Shell commands? Regular expressions?)?

shreyatrivedi26 commented 4 years ago

Thank you Omer! Yes, Python is in fact very good for working with spatial datasets (that is also one of the reasons I took this class). The dataset I used is in a binary format, and it can be tricky for software to handle unless all of its components are well understood. For NetCDF and similar raster datasets, Python has several libraries (as pointed out in my talk) that read the data correctly, i.e. without altering its structure or data types. Without those specialized libraries, Python might end up reading the data as a dictionary, which is not very useful for making meaningful plots. So for the spatial plots I relied on those libraries, mainly to import the data in its original form. You can access all my code at the link provided.

Lemme know if you need more clarity on that. I will be happy to help :)

jpocon commented 4 years ago

Hi Shreya, this was an excellent lightning talk that not only covered a complex topic well but also explained a complex data type! Well done. My question is about your final plot (the density distributions per month): why are Sept and Oct plotted out of order relative to May through Aug? Is this a quirk in the order of matrices within the netCDF? Thanks for such a great talk.

shreyatrivedi26 commented 4 years ago

Hey Jon, thanks for watching it. That is a very good observation; I noticed it too. You are right, it is essentially the way the time-steps are arranged in the dataset. For this particular dataset, as I mentioned in the talk, the only months included are from winter and spring over 2002-11. Within that, the year 2002 has just two time-steps, September and October, which happen to be the first two time-steps in the data. I assume this is why R ordered the curves the way you see them. But yes, I will definitely try to fix it so it looks better. Thanks for pointing it out :)
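If the month curves follow the order in which months first appear in the data, the 2002 Sept and Oct time-steps end up ahead of May through Aug. A minimal Python sketch of the difference between "order of first appearance" and "calendar order", assuming hypothetical (year, month) time-steps rather than the real dataset's time axis. In R, the analogous fix would be to pass an explicit `levels` order to `factor()` before plotting.

```python
import calendar

# Hypothetical (year, month) time-steps in the order they appear in the file.
# 2002 contributes only September and October, so those come first.
time_steps = [
    (2002, 9), (2002, 10),
    (2003, 5), (2003, 6), (2003, 7), (2003, 8), (2003, 9), (2003, 10),
]


def months_by_first_appearance(steps):
    """Unique months in the order they first occur in the data."""
    order = []
    for _, month in steps:
        if month not in order:
            order.append(month)
    return order


def months_by_calendar(steps):
    """Unique months sorted by calendar position (Jan=1 ... Dec=12)."""
    return sorted({month for _, month in steps})


print([calendar.month_abbr[m] for m in months_by_first_appearance(time_steps)])
print([calendar.month_abbr[m] for m in months_by_calendar(time_steps)])
```

Fixing the order explicitly, instead of relying on whatever order the data happens to be stored in, keeps the curves chronological even if more years are added later.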

jessicadeanda commented 4 years ago

This analysis is so advanced! Like Omer, I had no idea that you could plot maps in Python (I thought only GIS did that). I think mapping trends would be useful for my dataset too, but my question is: for Python to plot a map, does the dataset have to be multidimensional? I see that your dataset contains geographic information (longitude/latitude) in addition to sea ice thickness data, so I'm assuming this is necessary to generate a map plot, right?

alexphu1230 commented 4 years ago

Hi Shreya! Your presentation was great! I thought that your analysis was so in depth and the plots that you used to visualize your data provided so much information! I was wondering how you decided that that plotting method would be the best representation of your data? I know that your data was multidimensional, but were there other visuals that you wanted to use to showcase certain aspects of your data?

KaranSingh-14 commented 4 years ago

Hey Shreya, this was a great presentation! Based on the data you displayed, it is quite noticeable that the outputs differ between the two months because of the seasons. My question is: how would you go about measuring the statistical significance of your finding, and what does this data mean for us as humans? What if, in the future, the summer and winter figures start to look the same because our climate keeps getting warmer, to the point where seasonal difference may no longer be the answer?

robertreny commented 4 years ago

Great job Shreya! All of your plots are really well done; I especially liked the density estimates plot as a nice visualization without using an actual map. Drawing arrows on the multidimensional slide was a really nice touch to guide the viewer. For the maps showing sea ice thickness, I wasn't totally clear which end of the purple-to-yellow scale was "good" or "bad", but maybe I just missed that.

shreyatrivedi26 commented 4 years ago

@jessicadeanda That's a very good question! Not all maps need latitude and longitude. For example, in one of our labs Shawn demonstrated a "heat map", which is a raster-style plot but needs no latitude or longitude.

To create maps like the ones Frida, Jon and I made for the project, we need geographic coordinates, because we are looking at changes taking place at particular locations on Earth. The datasets we used have a defined set of latitudes and longitudes. As I mentioned, NetCDF files are multidimensional because they have several time-steps attached to them. Like me, Jon also uses spatial data, but I believe his files cover a single time-step and contain just one lat-lon matrix, which makes them 2-dimensional. Still, the plots he created are similar to mine.
Additionally, to help with plotting such lat-lon maps, both Python and R have mapping tools (e.g. Basemap in Python) that align our data to the correct coordinates. Hope that answers your question :)
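The layout described above (one lat-lon matrix per time-step, stacked along a time axis) can be sketched with plain Python lists. All coordinates and values below are hypothetical; in practice a NetCDF library would hand back these arrays. A file with a single time-step, like the one described for Jon's data, corresponds to just one of the 2-D slices.

```python
# Hypothetical coordinates for a tiny 3-step, 3 x 4 grid (not the real dataset).
times = ["2002-09", "2002-10", "2003-05"]
lats = [-70.0, -65.0, -60.0]
lons = [0.0, 90.0, 180.0, 270.0]

# sit[t][i][j] holds a made-up SIT value for times[t], lats[i], lons[j].
sit = [
    [[round(0.1 * (t + i + j), 2) for j in range(len(lons))] for i in range(len(lats))]
    for t in range(len(times))
]


def time_slice(data, when):
    """Return the 2-D (lat x lon) matrix for a single time-step."""
    return data[times.index(when)]


def value_at(data, when, lat, lon):
    """Look up one grid cell by its time, latitude and longitude labels."""
    return time_slice(data, when)[lats.index(lat)][lons.index(lon)]


shape = (len(sit), len(sit[0]), len(sit[0][0]))
print("3-D shape (time, lat, lon):", shape)
print("one time-step has", len(time_slice(sit, "2002-10")), "latitude rows")
print("value at 2002-10, -65.0, 180.0:", value_at(sit, "2002-10", -65.0, 180.0))
```

Dropping the time axis (taking one slice) is exactly what turns the 3-D NetCDF-style structure into a single 2-D map.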

shreyatrivedi26 commented 4 years ago

@alexphu1230 Thank you for asking that!

Actually, I primarily wanted to look at the variability of one particular variable in the dataset, sea ice thickness (SIT). With this objective in mind, I only looked for statistical techniques that would help me meet it. I chose the two temporal plots because they gave a good representation of the variations in SIT over the time period. The spatial plots show how ice thickness changes across space, and for this it was best to look at its seasonal distribution, so I chose the simplest method of analyzing the observed changes.

I also created an animation in Python to view how SIT changes across the time-steps. You can check it out here: https://www.youtube.com/watch?v=yMVEMkiCwxk

Hope it helped! :)

shreyatrivedi26 commented 4 years ago

@KaranSingh-14 Great question!

The dataset I used is mainly used for climate-modeling studies, which in most cases try to project changes in various climatic parameters under different user-defined scenarios. Hence, it is quite relevant for studying the impacts of climate change on SIT in Antarctica (maybe at some point in my research I will actually do something like that). The main problem with climate models is the uncertainty and bias associated with the data. What I mean is that when we try to predict temperature, SIT or precipitation for the future, the prediction is based on statistical and forecasting methods, so there will always be an inherent "error" attached to it. So in such studies, whatever predictions we make, we state them with an uncertainty of +/- 5 or something similar. This is why models keep getting updated with better resolution and precision, so that the error in the predictions is reduced. In my case, since the dataset is very small (a short time period, which I listed as a limitation on my slides), I am just looking at observed changes and making no concrete conclusions whatsoever.

It was a great question! Thanks for asking it :)

shreyatrivedi26 commented 4 years ago

@robertreny Thank you very much.

Density plots were my favorite too :) For my analyses in R, I used anomalies, which are deviations from the mean. That is, an SIT anomaly is the difference between the observed SIT in a particular month and the average over all months. So a negative value means that month's SIT was below the average, i.e. the sea ice was thinner. In the density plots, negative values are purple and positive ones are yellow. Purple (negative) and yellow (positive) anomalies are "bad" and "good" respectively, because we would not want the thickness to decrease, at least during winter, as that is an indicator of global warming. The spatial maps use the same color convention, except that they show the original SIT values instead of anomalies. Hence lower (purple) values in winter are undesirable.
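The anomaly definition above (observed monthly SIT minus the average over all months) and the purple/yellow sign convention can be sketched as follows. The SIT values are made up for illustration, and the colour names simply stand in for the two ends of the colour scale; the original analysis was done in R, so this is not the author's code.

```python
from statistics import mean

# Hypothetical monthly SIT values in metres (illustrative only).
monthly_sit = {"May": 1.2, "Jun": 1.4, "Jul": 1.8, "Aug": 1.6}


def anomalies(values):
    """Deviation of each month's value from the mean over all months."""
    baseline = mean(values.values())
    return {month: v - baseline for month, v in values.items()}


def colour_for(anomaly):
    """Purple for below-average (thinner) ice, yellow otherwise."""
    return "purple" if anomaly < 0 else "yellow"


for month, a in anomalies(monthly_sit).items():
    print(f"{month}: {a:+.2f} m -> {colour_for(a)}")
```

Because anomalies are measured against the mean, they always sum to (approximately) zero, so roughly half the distribution sits on each side of the colour scale.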

Deap-Bhandal commented 4 years ago

This was a very well done presentation. You clearly showed a strong command of data manipulation. I really liked your plots; they were informative and well designed graphically. I was most intrigued by plotting multidimensional data, but I'm not sure I completely understand how your dataset organized these values. I believe you said that each matrix had latitude and longitude and that each matrix corresponded to a time. How is the annual temperature output stored in the matrix? Also, is it possible to have a program read data stored in sets (or something similar), where each multidimensional set contains all the values (i.e. {latitude, longitude, temperature, time})? I might not have worded that correctly, but let me know if you understand what I mean.

shreyatrivedi26 commented 4 years ago

@Deap-Bhandal I understand. Making sense of NetCDF data can sometimes be tricky. For annual temperature output, the matrix would look the same as the one I showed in the presentation, i.e. longitudes on the x-axis and latitudes on the y-axis. Each grid cell formed by an intersecting latitude and longitude would contain the annual temperature of that particular area. Hence, by plotting that matrix, we get a spatial plot of annual temperature for a particular year.
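To make the shape point concrete: if each cell's annual value were taken as the mean of that cell's monthly values (an assumption for illustration; a dataset may instead ship annual values directly), the result is again a single lat-lon matrix. A minimal sketch with made-up numbers and only three hypothetical months to keep it short:

```python
# Hypothetical monthly temperature matrices on a 2 x 3 grid:
# rows are latitudes (y-axis), columns are longitudes (x-axis).
monthly = [
    [[10.0, 12.0, 14.0], [4.0, 6.0, 8.0]],    # month 1
    [[12.0, 14.0, 16.0], [6.0, 8.0, 10.0]],   # month 2
    [[14.0, 16.0, 18.0], [8.0, 10.0, 12.0]],  # month 3
]


def annual_mean(matrices):
    """Cell-wise mean across time-steps; the result keeps the lat x lon shape."""
    n = len(matrices)
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [
        [sum(m[i][j] for m in matrices) / n for j in range(cols)]
        for i in range(rows)
    ]


print(annual_mean(monthly))  # [[12.0, 14.0, 16.0], [6.0, 8.0, 10.0]]
```

Averaging collapses only the time axis, which is why an annual field is plotted exactly like a single monthly map.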

I didn't quite understand what you meant by the last question, but I think the following video should answer it: https://www.youtube.com/watch?v=XqoetylQAIY
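One reading of the last question is whether the data can be held as one {latitude, longitude, time, value} entry per grid cell instead of as stacked matrices. That "long" layout is easy to derive from the grid. A sketch with hypothetical coordinates and values (xarray users would typically get a similar table from `Dataset.to_dataframe()`, which is not shown here):

```python
from typing import NamedTuple


class Record(NamedTuple):
    time: str
    lat: float
    lon: float
    value: float


# Hypothetical coordinates and a (time, lat, lon) grid of made-up values.
times = ["2002-09", "2002-10"]
lats = [-70.0, -65.0]
lons = [0.0, 90.0, 180.0]
grid = [
    [[1.0, 1.1, 1.2], [0.9, 1.0, 1.1]],
    [[1.3, 1.4, 1.5], [1.2, 1.3, 1.4]],
]


def to_records(grid, times, lats, lons):
    """Flatten a (time, lat, lon) grid into one record per cell."""
    return [
        Record(t, lat, lon, grid[ti][li][oi])
        for ti, t in enumerate(times)
        for li, lat in enumerate(lats)
        for oi, lon in enumerate(lons)
    ]


records = to_records(grid, times, lats, lons)
print(len(records))  # 12 (2 times x 2 lats x 3 lons)
print(records[0])
```

The grid form is compact and fast to slice into maps; the record form repeats the coordinates on every row but is convenient for filtering and for tools that expect tables.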

Deap-Bhandal commented 4 years ago

It did! Thanks so much