shawntschwartz / eeb-c177-w20

lab section materials for eeb c177/c234 @ucla (winter 2020) 🐻
https://shawnschwartz.com/eeb-c177-w20/

Frida Perez - Lightning Talk Presentation #68

Closed fridalejandra closed 4 years ago

fridalejandra commented 4 years ago

https://youtu.be/E2JazC7oUEQ

jpocon commented 4 years ago

Hi Frida, great talk! Loved how you framed the problem in the beginning; it helped set up your analysis in an interesting way. Regarding plotting your data, it appears all the analysis and mapping happens in Python (I'm assuming this is because NetCDFs are easier to manage in that language?) but the statistical charting happens in R. Does netcdf4 (or your mapping module) include charting capabilities as well? That ggplot code looked like a beast, so I was wondering if it's worthwhile to script everything in Python instead, or do you prefer ggplot in R?

omerlavian20 commented 4 years ago

I really liked this video! All the plots you showed were really cool. I particularly liked the last plot showing sea ice thickness over a map of Antarctica. Looking at projects like yours makes me really want to explore how Python and/or R can be used to plot data onto maps!

As someone who didn't know what netCDF data were until watching your and Shreya's videos, I am curious as to how similar or different it was working with them as opposed to the data we used for class assignments (i.e. a CSV file with lists of animal species or the datasets we manipulated within R). For example, if you want to do a linear regression with two variables from the netCDF data like you did comparing sea ice thickness to surface temperature, how would you compare the difficulty of doing the regression with netCDF data to, for example, a CSV file with the same data where for each thickness measurement there are measurements for several other variables including surface temperature?

shreyatrivedi26 commented 4 years ago

This was a great talk! Working with the same dataset, I understand that it is one of the most complex datasets to deal with, owing to its complicated multi-dimensional structure. So these plots are definitely a treat to watch. My first question is regarding the coefficients in your first linear model plot. What do they represent and what is their significance? My second question is regarding your multiple linear plots. Do you think using something like a facet_grid or a facet_wrap would reduce the ginormous code you wrote for the plot? :D

KaranSingh-14 commented 4 years ago

Great presentation! I have a question similar to Shreya's: what exactly do the alpha and beta variables mean? Is it the effect that each variable has on itself minus the effect that the variables have on each other? I liked the linear regression and how you showed the correlation value (R^2). Based upon that value, it is suggesting that the correlation is not that strong, correct?

fridalejandra commented 4 years ago

Hey @jpocon, thanks! To answer your question: yes, there are charting capabilities for us NetCDF users in Python. I have plotted a couple of averaged values from NumPy over time, and am working on some boxplots with pandas and Seaborn. I would like to script everything in Python, since R cannot handle the latitude and longitude the way we hoped, but as of now I am more comfortable plotting and running stats in R (it is also much nicer).

fridalejandra commented 4 years ago

@omerlavian20 Hi Omer, thank you! To answer your question, I would say a lot of what we learned in class was harder to apply to our dataset than it would be to a CSV file, particularly the regex/CSV clean-ups. However, it was not impossible, and we were able to turn our data into CSV files (with the help of Shawn and Professor Alfaro). It took a lot of CSV outputs, but for the linear regression between the two variables, I wrote the averages to a CSV file to read into R. So I am just adding an extra step: extracting the variable I want and exporting it as a CSV file. The short answer is that the regression is not that difficult even without going the CSV route; I can also do it in Python by turning the variables into data frames and using NumPy.
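
For anyone following along, here is a minimal sketch of the no-CSV route described above: average each gridded variable over space, then fit a line with NumPy. The arrays are randomly generated stand-ins, not the actual sea ice data, and the names `thickness`, `temperature`, and the (time, lat, lon) layout are assumptions made for illustration. Real values would be read from the netCDF file first.

```python
import numpy as np

# Hypothetical stand-ins for two netCDF variables on a (time, lat, lon) grid.
# These values are made up; real data would be read from the file instead.
rng = np.random.default_rng(0)
n_time, n_lat, n_lon = 24, 4, 5
thickness = rng.uniform(0.5, 2.0, size=(n_time, n_lat, n_lon))
temperature = -5.0 * thickness + rng.normal(0.0, 0.1, size=thickness.shape)

# Collapse the two spatial axes so each time step has one averaged value.
mean_thickness = thickness.mean(axis=(1, 2))
mean_temperature = temperature.mean(axis=(1, 2))

# Fit temperature = alpha + beta * thickness.
# np.polyfit returns coefficients from highest degree to lowest.
beta, alpha = np.polyfit(mean_thickness, mean_temperature, deg=1)

print(f"slope (beta): {beta:.2f}, intercept (alpha): {alpha:.2f}")
```

The averaged arrays could just as well be placed in a pandas data frame or written to a CSV for R; the fit itself only needs the two one-dimensional series.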

fridalejandra commented 4 years ago

Hey there @shreyatrivedi26! As for the first plot, beta represents the regression coefficient (-9.56); to put it in context, it is the change in variable Y when variable X changes by one unit. Because it is negative, it also indicates a negative relationship: for a one-unit increase in SIT, we expect the temperature to decrease by 9.56, and we can use that to model a change in the response. These coefficients had a very small p-value (it displayed as less than 0.0, presumably due to rounding), suggesting the relationship between the two variables is statistically significant. But because it is very much expected that they are related, it is important to run more tests, introduce more variables, and run an MLR.

As for your second question, yes, those are great suggestions. The code is chunky, and I am working on making it smaller as well as more dynamic. As always, thank you! :)

fridalejandra commented 4 years ago

Hey @KaranSingh-14, thank you for your questions. I hope I answered part of it in my response to Shreya's. As for the R^2, the way I interpret it is as how much of the variation in temperature is predicted/explained by the change in Sea Ice Thickness. So it does not necessarily mean the correlation is not that strong, but at the same time a higher R^2 in this case would mean our independent variable (SIT) explains more of the variation in surface temperatures.
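
To make alpha, beta, and R^2 concrete, here is a small sketch that computes all three by hand for a simple linear regression. The numbers are invented for illustration and are not the values from the talk; `fit_line`, `sit`, and `temp` are hypothetical names.

```python
def fit_line(x, y):
    """Ordinary least squares for y = alpha + beta * x, plus R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # beta: change in y per one-unit change in x
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx

    # alpha: predicted y when x is zero (the intercept)
    alpha = mean_y - beta * mean_x

    # R^2: share of the variance in y that the fitted line accounts for
    ss_res = sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot

    return alpha, beta, r_squared


# Made-up example values (not the real dataset)
sit = [0.5, 1.0, 1.5, 2.0, 2.5]
temp = [-2.0, -6.5, -12.0, -15.5, -22.0]

alpha, beta, r_squared = fit_line(sit, temp)
print(f"alpha={alpha:.2f}, beta={beta:.2f}, R^2={r_squared:.3f}")
```

A negative beta means the fitted line slopes downward, while R^2 only says how closely the points follow that line. A steep slope can come with a low R^2 when the data are noisy, and the reverse is also possible.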

WhitneyTsaiNakashima commented 4 years ago

Great presentation! I like that you laid out the gap in the literature and how you are studying something different than other people. I also like that you used multiple visualizations of your data to see both temporal and spatial changes in sea ice thickness. It is interesting that you found 2005, 2008, and 2009 to be years with thicker sea ice. Do you know if those years had cooler surface temperatures?

soniavsd commented 4 years ago

Great presentation! It's awesome that your data comes from a relatively under-researched area like southern sea ice thickness! I was curious as to what the y axis on your last graph (the spatial visualization of sea ice thickness) means and how it pertains to your dataset. It was not exactly clear in the video, but everything else was very well said and coded :)