utterances-bot commented 3 years ago

Pandas dataframes — Python essentials for GIS learners

https://the-magnificents.github.io/04-02-2021-Carpentry-for-HGIS/02_Day_2_Python_GIS/exercise/B2_Exercise.html

ThoTUM86 commented 3 years ago

For: "data.loc['Denmark',:]" --> why do you need the ",:"? I get the same results without it.

For: "data['gdpPercap_2007']/data['gdpPercap_1952']" --> How does the program know that it should calculate this for all countries and not just one row?

For: "data['gdpPercap_2007']/data['gdpPercap_1952']" --> Why not use ".loc"?

There are two typos in this code: "my_string = 'Hello world!'  # creation of a string object" followed by "dir(myString)" --> this should be "dir(my_string)".

ThoTUM86 commented 3 years ago

"Python includes a dir() function that can be used to display all of the available methods (functions) that are built into a data object." --> Since it says "data object", what does this list depend on, or is it universal?

aecryan commented 3 years ago

@thoTUM86 good questions

You can run data.loc['Denmark'] without the colon; I think the exercise includes it to get you used to how indexing works. With the colon, it is explicit that you are selecting the Denmark row and all columns (years).
This works because of pandas data structures: you are dividing one whole column by another. More precisely, each value in gdpPercap_2007 is divided by the corresponding value in gdpPercap_1952 for the same country. On their own, these columns are Series objects; together with the rest of the data, they form a DataFrame. That is why you can write expressions like this that behave like a for loop without having to write one.
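As a small illustration of both points, here is a sketch using a made-up two-country table (the numbers are invented for the example and are not taken from the gapminder file):

```python
import pandas as pd

# Hypothetical toy data; the values are invented for illustration only.
data = pd.DataFrame(
    {"gdpPercap_1952": [1000.0, 2000.0], "gdpPercap_2007": [3000.0, 8000.0]},
    index=["Denmark", "Serbia"],
)

# Both forms select the Denmark row with all columns.
with_colon = data.loc["Denmark", :]
without_colon = data.loc["Denmark"]
print(with_colon.equals(without_colon))

# Dividing two columns (Series) works element-wise, aligned on the index,
# so every country gets its own ratio without an explicit loop.
ratio = data["gdpPercap_2007"] / data["gdpPercap_1952"]
print(ratio)
```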
You are right! In order to run that correctly you would need to run dir(my_string).
dir() works on any object, and the methods and attributes it lists depend on what that object is and how it was created. So it is not a universal list, and if you write your own classes you define which methods are available for objects of that class. Running dir() shows you what is available for that particular object.
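A quick sketch of how the result of dir() changes with the object's type. The Country class below is hypothetical and exists only for this illustration:

```python
# dir() lists the attributes and methods available on a given object,
# so the result depends on the object's type.
my_string = "Hello world!"
my_list = [1, 2, 3]

string_methods = set(dir(my_string))
list_methods = set(dir(my_list))

print("upper" in string_methods)   # checks whether str offers .upper()
print("upper" in list_methods)     # checks whether list offers .upper()
print("append" in list_methods)    # checks whether list offers .append()


class Country:
    """Hypothetical class, defined only for this illustration."""

    def __init__(self, name):
        self.name = name

    def greet(self):
        return "Hello from " + self.name


# Methods defined on your own class show up in dir() as well.
print("greet" in dir(Country("Denmark")))
```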

cforgaci commented 3 years ago

I just wrote a comment here and it disappeared.

aecryan commented 3 years ago

@cforgaci hmmm that is really strange! I don't see it anywhere, sorry that happened!

cforgaci commented 3 years ago

@aecryan, no problem, but it's too long to reproduce. This is a good reminder to write answers in a separate markdown editor :)

aecryan commented 3 years ago

Hahaha I fully respect that. And, yes better safe than sorry!

jurra commented 3 years ago

Reposting @cforgaci's comment, which appears in the GitHub issue but not on the website:

Exercise: Selection of Individual Values

v1:
```
data.loc['Serbia','gdpPercap_2007']
```

v2 (assuming that we would want to write a maybe-not-so-useful function get_gdp()):

years = data.columns.str.strip('gdpPercap_').astype(int)

def get_gdp(country, year):
    print("The GDP per capita of ", country, " in ", year, " was:\n", data.loc[country, years == year], sep='')

get_gdp("Serbia", 2007)
get_gdp("Hungary", 2007)

I don't know how to simplify the output of v2 so that it shows only the value. Right now it shows this:

The GDP per capita of Serbia in 2007 was:
gdpPercap_2007    9786.534714
Name: Serbia, dtype: float64

Exercise: Extent of Slicing

The two outputs are not the same.
Slicing by name (labels, with .loc) seems to include the end item, whereas slicing by numerical index (positions, with .iloc) excludes the last item.
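A minimal sketch of that difference, assuming a small made-up table (the values are invented; only the shapes matter here):

```python
import pandas as pd

# Hypothetical toy data; values invented for illustration.
data = pd.DataFrame(
    {
        "gdpPercap_1952": [1.0, 2.0, 3.0],
        "gdpPercap_1962": [4.0, 5.0, 6.0],
        "gdpPercap_1972": [7.0, 8.0, 9.0],
    },
    index=["Albania", "Austria", "Belgium"],
)

# Position-based slicing: the end position is excluded,
# so 0:2 selects positions 0 and 1 only.
by_position = data.iloc[0:2, 0:2]

# Label-based slicing: the end label is included,
# so 'Albania':'Belgium' selects all three rows.
by_label = data.loc["Albania":"Belgium", "gdpPercap_1952":"gdpPercap_1962"]

print(by_position.shape)
print(by_label.shape)
```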

Exercise: Reconstructing Data

first: read the csv file
second: subset the data frame to keep only 'Americas'
third: remove the row with Puerto Rico
fourth: remove the 'continent' column
fifth: write the result to result.csv
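The steps above could look roughly like this. This is a sketch only: the CSV text is an invented stand-in for the lesson's file, and the variable names are my own choice.

```python
import io

import pandas as pd

# Hypothetical stand-in for the lesson's CSV file; rows and values are invented.
csv_text = """country,continent,gdpPercap_1952,gdpPercap_2007
Canada,Americas,11000.0,36000.0
Puerto Rico,Americas,3000.0,19000.0
Denmark,Europe,9000.0,35000.0
"""

# Read the CSV, using the country names as the row index.
first = pd.read_csv(io.StringIO(csv_text), index_col="country")

# Keep only the rows whose continent is 'Americas'.
second = first[first["continent"] == "Americas"]

# Remove the row for Puerto Rico.
third = second.drop("Puerto Rico")

# Remove the 'continent' column.
fourth = third.drop("continent", axis=1)

# Finally, write the result. Here it is written to a string; passing a
# path such as 'result.csv' to to_csv() would write a file instead.
result_csv = fourth.to_csv()
print(result_csv)
```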

Exercise: Selecting Indices

idxmin()/idxmax() return the row name (country) of the minimum/maximum value in each column, respectively. I noticed that the difference from min()/max() is that those return the minimum/maximum GDP value itself, without specifying the country, so idxmin()/idxmax() are more helpful in this sense.
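To see the two side by side, here is a small sketch on invented numbers (not the lesson's data):

```python
import pandas as pd

# Hypothetical toy data; values invented for illustration.
data = pd.DataFrame(
    {
        "gdpPercap_1952": [1000.0, 2000.0, 1500.0],
        "gdpPercap_2007": [9000.0, 8000.0, 7000.0],
    },
    index=["Albania", "Austria", "Belgium"],
)

# min()/max() give the smallest/largest value in each column.
print(data.min())
print(data.max())

# idxmin()/idxmax() give the row label (country) where that value occurs.
print(data.idxmin())
print(data.idxmax())
```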

Note: myString should be my_string in the following code:

my_string = 'Hello world!'   # creation of a string object 
dir(myString)

cforgaci commented 3 years ago

@jurra, thanks, now I can see it!

jurra commented 3 years ago

Here is my answer: a version of the function that prints only the GDP number:

def get_gdp(country, year):
    # years is the integer index of column years defined in v2 above
    gdp = data.loc[country, years == year]
    # .iloc[0] takes the first (and only) value by position
    print("The GDP per capita of ", country, " in ", year, " was:\n", gdp.iloc[0], sep='')

This extracts the single value from the Series: the selection contains only one item, which sits at position 0.
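Another option, as a sketch: build the column label from the year and pass both labels to .loc, which returns a single value rather than a Series. The toy data is invented, and returning the value from the function is my own addition:

```python
import pandas as pd

# Hypothetical toy data; values invented for illustration.
data = pd.DataFrame(
    {"gdpPercap_1952": [1000.0, 2000.0], "gdpPercap_2007": [3000.0, 8000.0]},
    index=["Serbia", "Hungary"],
)


def get_gdp(country, year):
    # One row label plus one column label selects a single value.
    gdp = data.loc[country, f"gdpPercap_{year}"]
    print("The GDP per capita of ", country, " in ", year, " was:\n", gdp, sep="")
    return gdp


get_gdp("Serbia", 2007)
```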

the-magnificents / 04-02-2021-Carpentry-for-HGIS

04-02-2021-Carpentry-for-HGIS/02_Day_2_Python_GIS/exercise/B2_Exercise #72
