invoice-x / invoice2data

Extract structured data from PDF invoices
MIT License
1.84k stars, 482 forks

tests: custom: add basic cases for "options" #478

Closed rmilecki closed 1 year ago

rmilecki commented 1 year ago

This adds basic tests for "remove_accents" and "replace".

rmilecki commented 1 year ago

This resolves #467

It doesn't cover all accent cases. In particular, a test for handling the € character will need to be added while working on #477.

bosd commented 1 year ago

Thanks for your contribution. This is only my first look at it, but the solution in this PR looks complex.

On second thought, I think adding tests for remove_accents is unnecessary and overcomplicates things. I think we should treat a breakage of the function as an anomaly.

With the new proposal (#479), this function removes only the combining diacritical marks in a certain range. It is therefore not expected to change strings in other ways or to leave out characters (as the previous ascii and unidecode methods did).
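To make the idea concrete, here is a minimal sketch of what "remove only the combining diacritical marks in a certain range" could look like. It is not the code from #479: the choice of the U+0300 to U+036F block, the use of NFD/NFC normalization, and the constant names are all assumptions made for illustration.

```python
import unicodedata

# Assumption: "a certain range" means the Unicode block
# "Combining Diacritical Marks" (U+0300 to U+036F).
COMBINING_START = 0x0300
COMBINING_END = 0x036F


def remove_accents(text: str) -> str:
    """Strip combining diacritical marks and leave every other character intact."""
    # NFD splits precomposed characters such as "é" into "e" + U+0301.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(
        ch for ch in decomposed
        if not (COMBINING_START <= ord(ch) <= COMBINING_END)
    )
    # NFC recomposes anything that was decomposed but not stripped.
    return unicodedata.normalize("NFC", stripped)
```

With this sketch, remove_accents("Málaga") returns "Malaga", while characters such as €, ß and 中文 pass through unchanged because their decompositions contain no marks from that block.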

Just to share my train of thought: earlier I was thinking more about writing a unit test for the functions, e.g. in test_functions.py:

def test_remove_accents():
    string_to_test = "é€$%^&*@!.a Málaga François Phút Hơn 中文ß"
    expected_result = "e€$%^&*@!.a Malaga Francois Phut Hon 中文ß"
    # remove_accents would have to exist as a standalone function
    result = remove_accents(string_to_test)
    assert result == expected_result

Note: for that to work, remove_accents needs to become its own function, which increases the number of function calls again. I think this is all unnecessary and unwanted. Just keep it simple :smiley:

rmilecki commented 1 year ago

Obsoleted by a better implementation in #494.