swcarpentry / python-novice-gapminder

Plotting and Programming in Python
http://swcarpentry.github.io/python-novice-gapminder/

Ways of accessing DataFrame elements, rows, columns, and subsets #532

Closed maxim-belkin closed 3 years ago

maxim-belkin commented 3 years ago

This isn't really an issue; rather, it's my attempt to summarize the ways one can access various subsets of a DataFrame in Pandas. Perhaps it will be of benefit to someone or inspire someone to improve the lesson... :)))

  1. Access a single column

    # by name
    data["col_name"]   # as a Series
    data[["col_name"]] # as a DataFrame
    
    # by name using .loc
    data.T.loc["col_name"]  # as a Series
    data.T.loc[["col_name"]].T  # as a DataFrame
    
    # Dot notation (Series)
    data.col_name
    
    # by index (iloc)
    data.iloc[:, col_index]   # as a Series
    data.iloc[:, [col_index]] # as a DataFrame
    
    # using a mask
    data.T[data.T.index == "col_name"].T
  2. Access a single row

    # by name using .loc
    data.loc["row_name"] # as a Series
    data.loc[["row_name"]] # as a DataFrame
    
    # by name
    data.T["row_name"] # as a Series
    data.T[["row_name"]].T # as a DataFrame
    
    # by index
    data.iloc[row_index]   # as a Series
    data.iloc[[row_index]]   # as a DataFrame
    
    # using mask
    data[data.index == "row_name"]
  3. Access an individual DataFrame element

    # by column/row names
    data["col_name"]["row_name"]         # as a value
    
    data[["col_name"]].loc["row_name"]  # as a Series
    data[["col_name"]].loc[["row_name"]]  # as a DataFrame
    
    data.loc["row_name"]["col_name"]  # as a value
    data.loc[["row_name"]]["col_name"]  # as a Series
    data.loc[["row_name"]][["col_name"]]  # as a DataFrame
    
    data.loc["row_name", "col_name"]  # as a value
    data.loc[["row_name"], "col_name"]  # as a Series. Preserves index. Column name is moved to `.name`.
    data.loc["row_name", ["col_name"]]  # as a Series. Row name is moved to `.name`. Index is set to the column name.
    data.loc[["row_name"], ["col_name"]]  # as a DataFrame (preserves original index and column name)
    
    # by column/row names: Dot notation
    data.col_name.row_name  # as a value; works only if both names are valid Python identifiers
    
    # by column/row indices
    data.iloc[row_index, col_index] # as a value
    data.iloc[[row_index], col_index] # as a Series. Preserves index. Column name is moved to `.name`.
    data.iloc[row_index, [col_index]] # as a Series. Row name is moved to `.name`. Index is set to the column name.
    data.iloc[[row_index], [col_index]] # as a DataFrame (preserves original index and column name)
    
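    # column name + row index
    # (plain [] on a Series treats an integer as a position only when the index
    #  is not integer-typed, and that fallback is deprecated in pandas 2.x;
    #  .iloc is unambiguous)
    data["col_name"][row_index]
    data.col_name[row_index]
    data["col_name"].iloc[row_index]  # as a value (preferred)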
    
    # column index + row name
    data.iloc[:, [col_index]].loc["row_name"]  # as a Series
    data.iloc[:, [col_index]].loc[["row_name"]]  # as a DataFrame
    
    # using masks
    data[data.index == "row_name"].T[data.T.index == "col_name"].T
  4. Access several columns

    # by name
    data[["col1", "col2", "col3"]]
    data.loc[:, ["col1", "col2", "col3"]]
    
    # by index
    data.iloc[:, [col1_index, col2_index, col3_index]]
  5. Access several rows

    # by name
    data.loc[["row1", "row2", "row3"]]
    
    # by index
    data.iloc[[row1_index, row2_index, row3_index]]
  6. Access a subset of specific rows and columns

    # by names
    data.loc[["row1", "row2", "row3"], ["col1", "col2", "col3"]]
    
    # by indices
    data.iloc[[row1_index, row2_index, row3_index], [col1_index, col2_index, col3_index]]
    
    # column names + row indices
    data[["col1", "col2", "col3"]].iloc[[row1_index, row2_index, row3_index]]
    
    # column indices + row names
    data.iloc[:, [col1_index, col2_index, col3_index]].loc[["row1", "row2", "row3"]]
  7. Access a subset of row and column ranges

    # by name (label slices include both endpoints)
    data.loc["row1":"row2", "col1":"col2"]
    
    # by index (positional slices exclude the stop index)
    data.iloc[row1_index:row2_index, col1_index:col2_index]
    
    # column names + row indices
    data.loc[:, "col1":"col2"].iloc[row1_index:row2_index]
    
    # column indices + row names
    data.iloc[:, col1_index:col2_index].loc["row1":"row2"]
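
To check the "as a value / as a Series / as a DataFrame" annotations in the list above, here is a minimal sketch on a made-up 3x3 frame. The frame, its `row1`..`row3` and `col1`..`col3` labels, and the `kind` helper are all hypothetical and not taken from the lesson. The sketch only reports which type each expression returns, and then contrasts a label slice with a positional slice.

```python
import pandas as pd

# Made-up sample frame; labels and values are placeholders, not lesson data.
data = pd.DataFrame(
    {"col1": [1, 2, 3], "col2": [4, 5, 6], "col3": [7, 8, 9]},
    index=["row1", "row2", "row3"],
)


def kind(obj):
    """Classify a selection result as 'DataFrame', 'Series', or 'value'."""
    if isinstance(obj, pd.DataFrame):
        return "DataFrame"
    if isinstance(obj, pd.Series):
        return "Series"
    return "value"


# Each entry pairs an expression from the list above with its result.
selections = {
    'data["col1"]': data["col1"],
    'data[["col1"]]': data[["col1"]],
    'data.loc["row1"]': data.loc["row1"],
    'data.loc[["row1"]]': data.loc[["row1"]],
    'data["col1"]["row1"]': data["col1"]["row1"],
    'data.loc["row1", "col1"]': data.loc["row1", "col1"],
    'data.loc[["row1"], "col1"]': data.loc[["row1"], "col1"],
    'data.loc["row1", ["col1"]]': data.loc["row1", ["col1"]],
    'data.loc[["row1"], ["col1"]]': data.loc[["row1"], ["col1"]],
    "data.iloc[0, 0]": data.iloc[0, 0],
}

for expr, result in selections.items():
    print(f"{expr:<30} -> {kind(result)}")

# Label slices include both endpoints; positional slices exclude the stop.
by_label = data.loc["row1":"row2", "col1":"col2"]
by_position = data.iloc[0:1, 0:1]
print("label slice shape:     ", by_label.shape)
print("positional slice shape:", by_position.shape)
```

On this frame the label slice covers two rows and two columns, while the positional slice `0:1` covers only one of each. That makes it easy to miscount when switching between `.loc` and `.iloc`.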

vahtras commented 3 years ago

Thanks, these are useful notes!