the-magnificents / 04-02-2021-Carpentry-for-HGIS

A Carpentry workshop for a Digital Humanities audience that works with geospatial data.

04-02-2021-Carpentry-for-HGIS/02_Day_2_Python_GIS/exercise/B2_Exercise #72

Open utterances-bot opened 3 years ago

utterances-bot commented 3 years ago

Pandas dataframes — Python essentials for GIS learners

https://the-magnificents.github.io/04-02-2021-Carpentry-for-HGIS/02_Day_2_Python_GIS/exercise/B2_Exercise.html

ThoTUM86 commented 3 years ago

For: "data.loc['Denmark',:]" --> why do you need the ",:"? I get the same results without it...

For: "data['gdpPercap_2007']/data['gdpPercap_1952']" --> How does the program know that it should calculate this for all countries and not just one row?

For: "data['gdpPercap_2007']/data['gdpPercap_1952']" --> Why not use the ".loc" ?

There are two typos in this code:

my_string = 'Hello world!'   # creation of a string object
dir(myString)

--> it should be dir(my_string)

ThoTUM86 commented 3 years ago

"Python includes a dir() function that can be used to display all of the available methods (functions) that are built into a data object." --> As it says "data object": what does this list depend on, or is it universal?

aecryan commented 3 years ago

@thoTUM86 good questions

  1. You can run data.loc['Denmark'] without the colon, but I think the exercise includes it to get you used to how indexing works. The colon makes it explicit that you are selecting only the Denmark row, with all columns (years).
  2. This works because of pandas data structures - essentially you are dividing one whole column by another. More accurately, you are dividing each value in gdpPercap_2007 by the corresponding value in gdpPercap_1952. On their own, these columns are both "series" objects - together with the rest of the data, they are part of a dataframe. Therefore, you can write equations like this that basically operate like a for loop (without having to write one).
  3. You are right! In order to run that correctly you would need to run dir(my_string).
  4. dir() works on any data object, and depending on what that object is and how it was created, it will have different functions or methods available. So, it is not a universal list, and if you write modules on your own you can even specify which methods and functions should be available to work with objects of a certain class. Running dir() will show you what is available to you for that particular object of interest.
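
To make points 1, 2 and 4 concrete, here is a small sketch. It assumes a toy table with made-up numbers standing in for the lesson's `data` (the country values are not real), so treat it as an illustration rather than the exercise's actual output.

```python
import pandas as pd

# Toy stand-in for the lesson's table; the numbers are made up for illustration.
data = pd.DataFrame(
    {"gdpPercap_1952": [100.0, 200.0, 400.0],
     "gdpPercap_2007": [300.0, 800.0, 400.0]},
    index=["Denmark", "Serbia", "Norway"],
)

# 1. With or without ", :" you get the same row (all columns).
row_with_colon = data.loc["Denmark", :]
row_without_colon = data.loc["Denmark"]
same_row = row_with_colon.equals(row_without_colon)

# 2. Dividing two columns works row by row, matched on the index (country).
ratio = data["gdpPercap_2007"] / data["gdpPercap_1952"]

# The same calculation written as an explicit loop over the countries.
ratio_loop = {
    country: data.loc[country, "gdpPercap_2007"] / data.loc[country, "gdpPercap_1952"]
    for country in data.index
}

# 4. dir() lists different names for different kinds of objects.
my_string = "Hello world!"
string_has_upper = "upper" in dir(my_string)  # upper() is a string method
series_has_upper = "upper" in dir(ratio)      # a Series has no upper()
series_has_mean = "mean" in dir(ratio)        # but it does have mean()

print(same_row)
print(ratio)
```

The loop and the one-line division give the same numbers; the column version is just shorter and is the usual way to write it in pandas.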

cforgaci commented 3 years ago

I just wrote a comment here and it disappeared.

aecryan commented 3 years ago

@cforgaci hmmm that is really strange! I don't see it anywhere, sorry that happened!

cforgaci commented 3 years ago

@aecryan, no problem, but it's too long to reproduce. This is a good reminder to write answers in a separate markdown editor :)

aecryan commented 3 years ago

Hahaha I fully respect that. And, yes better safe than sorry!

jurra commented 3 years ago

Reposting @cforgaci's comment, which appears in the GitHub issue but not on the website:

Exercise: Selection of Individual Values

The GDP per capita of Serbia in 2007 was:
gdpPercap_2007    9786.534714
Name: Serbia, dtype: float64

Exercise: Extent of Slicing

  1. not the same output
  2. named slicing seems to be inclusive of the end item, whereas slicing with indices excludes the last item
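
The difference between the two kinds of slicing can be checked on a small example. This sketch assumes a toy table (made-up values, four countries, four years) rather than the lesson's data file.

```python
import pandas as pd

# Toy table with made-up values; only the shape of the result matters here.
data = pd.DataFrame(
    {"gdpPercap_1952": [1.0, 2.0, 3.0, 4.0],
     "gdpPercap_1957": [5.0, 6.0, 7.0, 8.0],
     "gdpPercap_1962": [9.0, 10.0, 11.0, 12.0],
     "gdpPercap_1967": [13.0, 14.0, 15.0, 16.0]},
    index=["Albania", "Austria", "Belgium", "Denmark"],
)

# Position-based slicing: the end position (2) is excluded.
by_position = data.iloc[0:2, 0:2]

# Label-based slicing: the end labels ('Belgium', 'gdpPercap_1962') are included.
by_label = data.loc["Albania":"Belgium", "gdpPercap_1952":"gdpPercap_1962"]

print(by_position.shape)  # rows and columns selected by position
print(by_label.shape)     # rows and columns selected by label
```

Here the position slice returns 2 rows and 2 columns, while the label slice returns 3 rows and 3 columns because both end labels are part of the result.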

Exercise: Reconstructing Data

Exercise: Selecting Indices

Note: myString should be my_string in the following code:

my_string = 'Hello world!'   # creation of a string object 
dir(myString)

cforgaci commented 3 years ago

@jurra, thanks, now I can see it!

jurra commented 3 years ago

Here is my answer: a version of the function that prints only the GDP number:

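def get_gdp(country, year):
    # `data` and `years` are assumed to be defined earlier in the exercise
    gdp = data.loc[country, years == year]
    # .iloc[0] takes the first (and only) value by position
    print("The GDP per capita of ", country, " in ", year, " was:\n", gdp.iloc[0], sep='')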

This returns only the value rather than the whole series: the selection contains a single item, at position 0.
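
Another way to get just the number, sketched here on a toy table with made-up values, is to ask .loc for one row label and one column label, which returns a single value instead of a one-item series. The helper name `get_gdp_value` and the way the column name is built from the year are assumptions for this example, based on the `gdpPercap_<year>` column names used above.

```python
import pandas as pd

# Toy stand-in for the lesson's table; the numbers are made up.
data = pd.DataFrame(
    {"gdpPercap_1952": [100.0, 200.0], "gdpPercap_2007": [300.0, 800.0]},
    index=["Denmark", "Serbia"],
)


def get_gdp_value(country, year):
    """Return GDP per capita as a single number (hypothetical helper)."""
    column = f"gdpPercap_{year}"      # assumes columns are named gdpPercap_<year>
    return data.loc[country, column]  # one row label + one column label -> one value


value = get_gdp_value("Serbia", 2007)
print(f"The GDP per capita of Serbia in 2007 was:\n{value}")
```

Because a single value comes back, there is no "Name:" or "dtype:" line to strip from the printed output.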