stefmolin / Hands-On-Data-Analysis-with-Pandas-2nd-edition

Materials for following along with Hands-On Data Analysis with Pandas – Second Edition
https://www.amazon.com/Hands-Data-Analysis-Pandas-visualization/dp/1800563450
MIT License

Solution to Exercise 10.d of Chapter 4 #23

Closed apaksoy closed 2 years ago

apaksoy commented 2 years ago

Could the solution to Exercise 10.d in Chapter 4 of the book (page 259 of the PDF version), given under "Exercise 10, Part 4" in the corresponding JupyterLab notebook, be incorrect?

The exercise says, "Find the first date that each country other than China had cases." Yet the given solution seems to return, for each country, the date on which the (nonzero) number of cases is at its minimum, not the first date on which the number of cases is greater than zero.

I think the following code, for example, provides a correct solution for this exercise rather than the one in the respective notebook:

# Code
pd.to_datetime(
    covid[(covid.cases > 0) &
          (covid.countriesAndTerritories != "China")]
    .groupby("countriesAndTerritories").first().dateRep,
    dayfirst=True
).sort_values()
# Output
countriesAndTerritories
Thailand         2020-01-13
Japan            2020-01-15
South_Korea      2020-01-20
Taiwan           2020-01-21
USA              2020-01-21
                    ...    
Yemen            2020-04-10
Western_Sahara   2020-04-26
Tajikistan       2020-05-01
Comoros          2020-05-02
Lesotho          2020-05-15
Name: dateRep, Length: 209, dtype: datetime64[ns]
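
For anyone who wants to try this approach without the full dataset, here is a minimal sketch on a small invented frame. It assumes the column names used above (`countriesAndTerritories`, `dateRep`, `cases`); the rows are made up. Because `groupby(...).first()` returns the first row in the existing row order, the sketch parses the dates and sorts by them first, which is an added step and not part of the snippet above.

```python
import pandas as pd

# Toy rows invented for illustration; dateRep uses the day-first format from the issue.
toy = pd.DataFrame({
    "countriesAndTerritories": ["Japan", "Japan", "Japan", "Thailand", "Thailand", "China"],
    "dateRep": ["16/01/2020", "15/01/2020", "14/01/2020", "13/01/2020", "12/01/2020", "31/12/2019"],
    "cases": [3, 1, 0, 2, 0, 27],
})

# Parse the day-first date strings into real timestamps.
toy["date"] = pd.to_datetime(toy["dateRep"], format="%d/%m/%Y")

first_case_dates = (
    toy[(toy["cases"] > 0) & (toy["countriesAndTerritories"] != "China")]
    .sort_values("date")                      # make "first" mean "earliest"
    .groupby("countriesAndTerritories")["date"]
    .first()
    .sort_values()
)
print(first_case_dates)
```

On this toy data, Thailand's first date with cases is 2020-01-13 and Japan's is 2020-01-15, even though Japan's rows are listed newest-first.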

Required attestation


Background information

1. Which OS are you using?

macOS Monterey (Intel)

2. Which Python version are you using?

Python 3.9.13

3. Are you using conda or venv?

venv

4. Package versions

versions installed

```
anyio==3.6.1
appnope==0.1.3
argon2-cffi==21.3.0
argon2-cffi-bindings==21.2.0
asttokens==2.0.5
attrs==21.4.0
Babel==2.10.3
backcall==0.2.0
beautifulsoup4==4.11.1
bleach==5.0.1
certifi==2022.6.15
cffi==1.15.1
chardet==3.0.4
cycler==0.11.0
debugpy==1.6.2
decorator==5.1.1
defusedxml==0.7.1
entrypoints==0.4
executing==0.9.1
fastjsonschema==2.16.1
graphviz==0.14.1
idna==2.10
imbalanced-learn==0.7.0
importlib-metadata==4.12.0
ipykernel==6.15.1
ipympl==0.6.2
ipython==8.4.0
ipython-genutils==0.2.0
ipywidgets==7.7.1
jedi==0.18.1
Jinja2==3.1.2
joblib==1.1.0
json5==0.9.8
jsonschema==4.7.2
jupyter-client==7.3.4
jupyter-core==4.11.1
jupyter-server==1.18.1
jupyterlab==3.0.4
jupyterlab-pygments==0.2.2
jupyterlab-server==2.15.0
jupyterlab-widgets==1.1.1
kiwisolver==1.4.4
login-attempt-simulator==0.2
lxml==4.9.1
MarkupSafe==2.1.1
matplotlib==3.3.2
matplotlib-inline==0.1.3
mistune==0.8.4
ml-utils==0.2.0
mplfinance==0.12.9b1
nbclassic==0.4.3
nbclient==0.6.6
nbconvert==6.5.0
nbformat==5.4.0
nest-asyncio==1.5.5
notebook==6.4.12
notebook-shim==0.1.0
numpy==1.19.4
packaging==21.3
pandas==1.2.0
pandas-datareader==0.10.0
pandocfilters==1.5.0
parso==0.8.3
patsy==0.5.2
pexpect==4.8.0
pickleshare==0.7.5
Pillow==9.2.0
prometheus-client==0.14.1
prompt-toolkit==3.0.30
psutil==5.9.1
ptyprocess==0.7.0
pure-eval==0.2.2
pycparser==2.21
Pygments==2.12.0
pyparsing==3.0.9
pyrsistent==0.18.1
python-dateutil==2.8.2
pytz==2022.1
pyzmq==23.2.0
requests==2.24.0
scikit-learn==0.23.2
scipy==1.8.1
seaborn==0.11.0
Send2Trash==1.8.0
six==1.16.0
sniffio==1.2.0
soupsieve==2.3.2.post1
SQLAlchemy==1.3.20
stack-data==0.3.0
statsmodels==0.12.1
stock-analysis==0.2
terminado==0.15.0
threadpoolctl==3.1.0
tinycss2==1.1.1
tornado==6.2
traitlets==5.3.0
urllib3==1.25.11
visual-aids==2.0
wcwidth==0.2.5
webencodings==0.5.1
websocket-client==1.3.3
widgetsnbextension==3.6.1
zipp==3.8.1
```

5. Run the ch_01/checking_your_setup.ipynb notebook

Screenshot after running the ch_01/checking_your_setup.ipynb notebook:

image

Commands run and their outputs

Please provide all of the commands you ran as well as the traceback:

```
# Code
covid.reset_index()\
    .pivot(index='date', columns='countriesAndTerritories', values='cases')\
    .drop(columns='China')\
    .fillna(0)\
    .apply(lambda x: x[(x > 0)].idxmin())\
    .sort_values()\
    .rename(lambda x: x.replace('_', ' '))
```

```
# Output
countriesAndTerritories
Thailand         2020-01-13
Japan            2020-01-15
South Korea      2020-01-20
USA              2020-01-21
Taiwan           2020-01-21
                    ...
Lesotho          2020-05-15
Uruguay          2020-05-17
Western Sahara   2020-06-20
Mali             2020-07-07
Puerto Rico      2020-09-10
Length: 209, dtype: datetime64[ns]
```

Screenshots

Optionally, include any screenshots that will help diagnose the issue.

stefmolin commented 2 years ago

Hi there. Good catch! idxmin() should be index.min() – your result is correct. I'll update the notebook later today.
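
To illustrate the difference between the two calls, here is a minimal sketch on an invented Series. The values are made up and not taken from the COVID-19 data; the variable names are hypothetical.

```python
import pandas as pd

# Invented daily case counts for a single country, indexed by date.
cases = pd.Series(
    [0, 5, 2, 1],
    index=pd.to_datetime(["2020-01-12", "2020-01-13", "2020-01-14", "2020-01-15"]),
)

# Keep only the days that actually had cases.
positive = cases[cases > 0]

# idxmin() returns the index label of the smallest VALUE:
# the date with the fewest (nonzero) cases.
date_of_fewest_cases = positive.idxmin()

# index.min() returns the smallest INDEX label:
# the earliest date that had any cases.
first_date_with_cases = positive.index.min()

print(date_of_fewest_cases)
print(first_date_with_cases)
```

Here `idxmin()` picks 2020-01-15 (the day with a single case), while `index.min()` picks 2020-01-13, the first day with any cases, which is what the exercise asks for.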