os-climate / financial-entity-cleaner

cleaning for entity matching
Apache License 2.0

Legal Term only at the end: Maybe make an optional parameter to change that #5

Open DaBeIDS opened 1 year ago

DaBeIDS commented 1 year ago

Dear all,

It would be great to have an option to replace legal terms anywhere in the company name, not only at the end. It need not be the default, just an option. For example:

    import pandas as pd

    # CompanyNameCleaner is provided by the financial-entity-cleaner package
    company_cleaner_obj = CompanyNameCleaner()
    company_cleaner_obj.normalize_legal_terms = True

    # Test name with the legal term "llc" followed by a trailing "the"
    df_test = pd.DataFrame([[999, 'baupost group llc the']], columns=['ID', 'COMPANY_NAME'])
    df_clean = company_cleaner_obj.get_clean_df(df_test.copy(), 'COMPANY_NAME', 'COMPANY_NAME_CLEAN')

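With the current behaviour, "llc" is not replaced here because it is not the last word of the name. Of course, one could first remove the trailing "the" and then replace the legal term, but an option to handle this directly would be helpful in general.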
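To make the request concrete, here is a minimal, standalone sketch of the two behaviours. It is not the library's implementation: the function `normalize_legal_terms`, the parameter `only_at_end`, and the small `LEGAL_TERMS` dictionary are all hypothetical names chosen for illustration, and the expansions in the dictionary are assumptions rather than the library's actual mappings.

```python
# Hypothetical, tiny legal-term dictionary for illustration only;
# the real library uses its own dictionaries.
LEGAL_TERMS = {
    "llc": "limited liability company",
    "ltd": "limited",
}


def normalize_legal_terms(name: str, only_at_end: bool = True) -> str:
    """Replace legal-term abbreviations in a whitespace-separated name.

    only_at_end=True  mimics the behaviour described in this issue
                      (only the last word is checked).
    only_at_end=False is the proposed option (every word is checked).
    """
    tokens = name.split()
    if not tokens:
        return name
    if only_at_end:
        # Only the final token may be replaced.
        tokens[-1] = LEGAL_TERMS.get(tokens[-1], tokens[-1])
    else:
        # Any token that matches a legal term is replaced.
        tokens = [LEGAL_TERMS.get(tok, tok) for tok in tokens]
    return " ".join(tokens)


print(normalize_legal_terms("baupost group llc the"))
print(normalize_legal_terms("baupost group llc the", only_at_end=False))
```

With the default, the example name comes back unchanged because "the" is the last word. With `only_at_end=False`, "llc" is expanded even though it sits in the middle of the name. Keeping the end-only behaviour as the default would leave existing results untouched.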

If nobody else picks this up, I can also propose a change myself.

Best regards,

David