Closed MiniMarvin closed 5 years ago
Thanks for the review, @cuducos I think that build some unit tests may be really a good idea, yet, the algorithm was made thinking in the database you posted, the algorithm works with some tokens([allowed_alteration] [lock_iteration]) running in every word, it is, in the case you posted it will make like so:
AGENTE DE VIAGEM E GUIA DE TURISMO AGENTE -> [allowed_alteration = true] [lock_iteration = false] won't update DE -> [allowed_alteration = false] [lock_iteration = true, because found the word "de"] won't change ... (run without any change until finding a new noum, in the dataset you posted only occurs after the word "e") GUIA -> [allowed_alteration = true] [lock_iteration = false] won't update (now it goes the allowed_alteration to false and no longer will change any word)
Yet I think the unit tests are a pretty good idea, but with algorithms, because it will allow any expansion that may occur and reusability in any part of the system without manual work and the texts that may come outside of the cases won't be changed because this algorithm is designed to work with regular gender flection in Portuguese language and ignore any irregular change.
Ok, it makes sense: reading your explanation together with the algorithm made things clearer! Many thanks once more.
@anarute, as you opened #6, would you like to take a look in this approach before I merge? It partially addresses the gender issue you highlighted.
I would take a different approach too and have a dict for each gender it would be simpler, but I agree that this works too.
I like that it doesn't rely on data changes, but for expansion I'm not sure if this would be so reusable as you mentioned - think about new genders and languages, it would become an ifs
and elses
mess and opened to a lot of corner cases.
I agree that this solves for now and we can iterate it with time ;)
Thank you both for addressing this so quickly.
@MiniMarvin, sorry to bother you in this closed/merged PR, but just out of curiosity: I've found a bug in this algorithm, take a look on Andrea Campos Sales, Deputada Estadual (SP):
Does it worth it to isolate the function, run it against all occupation listed in #6 to find all edge cases like that? Or do you feel like this an isolates issue.
BTW this is the list filtering only by occupations occupied by people identified as females:
@cuducos I see that edge case passed by I will build a unit test for this function to avoid any edge case present in the dataset and build a dict of edge cases, thanks for the review!
Many thanks, @MiniMarvin : )
I was wondering (but haven't had time to test it yet): wouldn't your algorithm replace
AGENTE DE VIAGEM E GUIA DE TURISMO
byAGENTE DE VIAGEM E GUIA DE TURISMA
?I shared the list of profession because in my mind it would be safer (even if more repetitive) to work with a dictionary of female version than to play around with string replacements…
Now I'm afraid that an algorithm that plays with these replacements needs some unit tests with edge cases. But I might be overwhelmed so I really would like to listen your opinion on that.