sfu-db / dataprep

Open-source low-code data preparation library in Python. Collect, clean, and visualize your data in Python with a few lines of code.
http://dataprep.ai
MIT License
1.99k stars, 203 forks

Fix the bug and make None as NaN value #814

Closed yixuy closed 2 years ago

yixuy commented 2 years ago

Description

This branch fixes a bug in clean_country from dataprep.clean: when the input country value is None, the function returns 'Niue'. With this change, None is treated as a NaN value.
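One plausible explanation for the 'Niue' result, offered as an assumption because the PR does not show the library's internals: if a missing value is converted to the string "None" before fuzzy matching, it sits at edit distance 2 from "Niue", which is within the fuzzy_dist=2 used in the test below. The sketch illustrates that arithmetic and a null guard applied before matching. The helpers levenshtein, is_null, and match_country and the short names list are hypothetical and are not part of the dataprep API.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def is_null(value) -> bool:
    """Treat both None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def match_country(value, names, fuzzy_dist=2):
    """Hypothetical matcher: closest name within fuzzy_dist, else NaN."""
    if is_null(value):
        # Guard first, so None is never stringified to "None".
        return float("nan")
    text = str(value).strip().lower()
    best = min(names, key=lambda n: levenshtein(text, n.lower()))
    return best if levenshtein(text, best.lower()) <= fuzzy_dist else float("nan")


# Illustrative subset of country names (not the library's data).
names = ["United States", "Canada", "Finland", "Niue"]

# "none" vs "niue": two substitutions (o -> i, n -> u), so the distance is 2.
dist_none_niue = levenshtein(str(None).lower(), "niue")
```

With the guard in place, match_country(None, names) yields NaN, while misspellings such as "Kanada" still resolve to "Canada".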

How Has This Been Tested?

The test code is as follows:

import pandas as pd
import numpy as np
from dataprep.clean import clean_country

# The last two rows hold missing values (np.nan and None); both should be handled as NaN.
df = pd.DataFrame({"id": [1, 2, 3, 4, 5], "country": ["United States", "Kanada", "Fimland", np.nan, None]})
df = clean_country(
    df=df,
    column="country",
    input_format="auto",
    output_format="name",
    fuzzy_dist=2,
    strict=False,
    inplace=False,
    errors="coerce",
    report=True,
    progress=True
)
df

Snapshots:

image

Checklist:

codecov[bot] commented 2 years ago

Codecov Report

Merging #814 (d7ab93e) into develop (1be5fce) will increase coverage by 0.02%. The diff coverage is 83.33%.

@@             Coverage Diff             @@
##           develop     #814      +/-   ##
===========================================
+ Coverage    55.11%   55.13%   +0.02%     
===========================================
  Files          293      293              
  Lines        18855    18870      +15     
===========================================
+ Hits         10391    10404      +13     
- Misses        8464     8466       +2     
Impacted Files                                  Coverage Δ
dataprep/clean/clean_lat_long.py                86.22% <ø> (ø)
dataprep/clean/clean_date_utils.py              76.84% <81.25%> (+0.25%)
dataprep/clean/clean_country.py                 93.75% <100.00%> (ø)
dataprep/eda/correlation/compute/overview.py    99.24% <0.00%> (ø)
dataprep/eda/diff/render.py                     91.33% <0.00%> (+0.36%)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data.
Last update 691257d...d7ab93e.