sfu-db / dataprep

Open-source low-code data preparation library in Python. Collect, clean, and visualize your data in Python with a few lines of code.
http://dataprep.ai
MIT License
1.97k stars 201 forks

Country name cleaning failed example #939

Open yibenhuang opened 1 year ago

yibenhuang commented 1 year ago

Describe the bug

Hi, I just found that clean_country fails to clean the country name "Virgin Islands (British)": it returns NaN instead of the correct name.

To Reproduce

import pandas as pd
from dataprep.clean import clean_country

df = pd.DataFrame({"country": ["Virgin Islands (British)", "Virgin Islands (U.S.)"]})
clean_country(df, column="country", output_format="name")

Output:

                    country                 country_clean
0  Virgin Islands (British)                           NaN
1     Virgin Islands (U.S.)  United States Virgin Islands

Expected behavior

The country_converter project, which clean_country is based on, handles both names correctly:

import country_converter as coco

names = ["Virgin Islands (British)", "Virgin Islands (U.S.)"]
cc = coco.CountryConverter()

cc.convert(names=names, to="name_short")
# Output: ['British Virgin Islands', 'United States Virgin Islands']
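
Possible workaround until this is fixed: a rough sketch, not part of dataprep, that keeps whatever clean_country returned and only falls back to country_converter for the values that came back missing. The helper names (is_missing, fill_missing, coco_fallback) are made up for illustration, and it assumes a failed clean shows up as NaN or None, as in the output above.

```python
import math
from typing import Callable, List, Optional


def is_missing(value: object) -> bool:
    """True for None or a float NaN, the assumed markers of a failed clean."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def fill_missing(
    originals: List[str],
    cleaned: List[object],
    fallback: Callable[[str], Optional[str]],
) -> List[object]:
    """Keep each cleaned value; where it is missing, ask `fallback` about the original name."""
    return [
        fallback(orig) if is_missing(clean) else clean
        for orig, clean in zip(originals, cleaned)
    ]


def coco_fallback(name: str) -> Optional[str]:
    """Hypothetical fallback using country_converter (imported lazily, assumed installed)."""
    import country_converter as coco

    # not_found=None asks country_converter to pass the input through on a miss
    return coco.convert(names=name, to="name_short", not_found=None)
```

With the DataFrame from the reproduction, this could be applied as `fill_missing(df["country"].tolist(), result["country_clean"].tolist(), coco_fallback)`, where `result` is whatever clean_country returned. Passing the fallback as a callable keeps the merging logic independent of either library.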

Desktop (please complete the following information):