cuducos commented 5 years ago

Many thanks, @MiniMarvin : )

I was wondering (but haven't had time to test it yet): wouldn't your algorithm replace AGENTE DE VIAGEM E GUIA DE TURISMO with AGENTE DE VIAGEM E GUIA DE TURISMA?

I shared the list of professions because, in my mind, it would be safer (even if more repetitive) to work with a dictionary of female versions than to play around with string replacements…

Now I'm afraid that an algorithm that plays with these replacements needs some unit tests with edge cases. But I might be overwhelmed, so I would really like to hear your opinion on that.

MiniMarvin commented 5 years ago

Thanks for the review, @cuducos. I think building some unit tests is a really good idea. That said, the algorithm was designed with the database you posted in mind. It tracks two flags ([allowed_alteration] and [lock_iteration]) as it runs over every word; that is, in the case you posted it goes like this:

AGENTE DE VIAGEM E GUIA DE TURISMO

- AGENTE -> [allowed_alteration = true] [lock_iteration = false]: won't update
- DE -> [allowed_alteration = false] [lock_iteration = true, because it found the word "de"]: won't change
- ... (runs without any change until it finds a new noun, which in the dataset you posted only occurs after the word "e")
- GUIA -> [allowed_alteration = true] [lock_iteration = false]: won't update (after this, allowed_alteration goes back to false and no further word is changed)

I still think unit tests are a pretty good idea, but with the algorithm, because it allows for any expansion that may occur and can be reused in any part of the system without manual work. Texts that fall outside the covered cases won't be changed, because this algorithm is designed to handle regular gender inflection in Portuguese and to ignore any irregular change.
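For readers following along, the walk-through above can be sketched roughly as below. This is only an illustration in Python, not the code from this PR: the names `LOCK_WORDS`, `UNLOCK_WORDS`, `inflect_word` and `feminize`, the exact word sets, and the two suffix rules are all assumptions made for the example.

```python
# Hypothetical sketch of the flag-based walk described above.
# Word sets and suffix rules are assumptions, not the PR's actual code.
LOCK_WORDS = {"DE", "DO", "DA", "DOS", "DAS", "EM"}  # stop altering after these
UNLOCK_WORDS = {"E", "OU"}  # a new noun may follow these


def inflect_word(word):
    """Apply regular inflection only: -OR -> -ORA, final -O -> -A."""
    core = word.rstrip(",")
    tail = word[len(core):]  # keep a trailing comma, if any
    if core.endswith("OR"):
        return core + "A" + tail
    if core.endswith("O"):
        return core[:-1] + "A" + tail
    return word


def feminize(occupation):
    allowed_alteration = True
    result = []
    for word in occupation.split():
        if word in LOCK_WORDS:
            allowed_alteration = False  # lock until the next "E" / "OU"
            result.append(word)
        elif word in UNLOCK_WORDS:
            allowed_alteration = True
            result.append(word)
        elif allowed_alteration:
            result.append(inflect_word(word))
        else:
            result.append(word)
    return " ".join(result)


print(feminize("AGENTE DE VIAGEM E GUIA DE TURISMO"))
print(feminize("CANTOR E COMPOSITOR"))
```

With these assumed rules, TURISMO is left alone because the preceding DE locks alteration, while both nouns in CANTOR E COMPOSITOR are inflected. Irregular forms are not handled by a sketch like this.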

cuducos commented 5 years ago

Ok, it makes sense: reading your explanation together with the algorithm made things clearer! Many thanks once more.

@anarute, as you opened #6, would you like to take a look at this approach before I merge? It partially addresses the gender issue you highlighted.

anarute commented 5 years ago

I would take a different approach too and have a dict for each gender; it would be simpler. But I agree that this works too. I like that it doesn't rely on data changes, but for expansion I'm not sure this would be as reusable as you mentioned: think about new genders and languages, it would become a mess of ifs and elses, open to a lot of corner cases. I agree that this solves it for now and we can iterate on it over time ;) Thank you both for addressing this so quickly.
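As a point of comparison, the dictionary approach mentioned here could look something like the sketch below. It is a minimal illustration in Python under stated assumptions: the `FEMININE_FORMS` mapping, the `occupation_for` helper and the `"FEMININO"` gender label are hypothetical names, and only a handful of entries are shown.

```python
# Hypothetical sketch of a per-gender lookup table.
# Names and the gender label are assumptions for illustration.
FEMININE_FORMS = {
    "ADVOGADO": "ADVOGADA",
    "ATOR E DIRETOR DE ESPETACULOS PUBLICOS": "ATRIZ E DIRETORA DE ESPETACULOS PUBLICOS",
    "DEPUTADO": "DEPUTADA",
    "VEREADOR": "VEREADORA",
}


def occupation_for(occupation, gender):
    """Return the listed feminine form, or the original text if none is listed."""
    if gender == "FEMININO":
        return FEMININE_FORMS.get(occupation, occupation)
    return occupation


print(occupation_for("VEREADOR", "FEMININO"))
print(occupation_for("GUIA DE TURISMO", "FEMININO"))  # not listed: left unchanged
```

The trade-off is the one discussed in this thread: every term has to be listed by hand, but an unlisted or irregular term is simply left as it is instead of being altered incorrectly.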

cuducos commented 5 years ago

@MiniMarvin, sorry to bother you on this closed/merged PR, but just out of curiosity: I've found a bug in this algorithm, take a look at Andrea Campos Sales, Deputada Estadual (SP):

Is it worth isolating the function and running it against all the occupations listed in #6 to find every edge case like that? Or do you feel this is an isolated issue?

cuducos commented 5 years ago

BTW this is the list filtered to only the occupations held by people identified as female:

List of occupations

```
ADMINISTRADOR
ADVOGADO
AGENTE ADMINISTRATIVO
AGENTE DE SAUDE E SANITARISTA
AGENTE DE SERVICOS FUNERARIOS E EMBALSAMADOR
AGENTE DE VIAGEM
AGENTE POSTAL
AGRICULTOR
AGRONOMO
ALFAIATE E COSTUREIRO
ALMOXARIFE
ANALISTA DE SISTEMAS
ANTROPOLOGO
APOSENTADO (EXCETO SERVIDOR PUBLICO)
ARQUITETO
ARQUIVISTA E MUSEOLOGO
ARTESAO
ARTISTA PLASTICO E ASSEMELHADOS
ASSISTENTE SOCIAL
ASTROLOGO
ASTRONOMO
ATENDENTE DE LANCHONETE E RESTAURANTE
ATLETA PROFISSIONAL E TECNICO EM DESPORTOS
ATOR E DIRETOR DE ESPETACULOS PUBLICOS
AUXILIAR DE ESCRITORIO E ASSEMELHADOS
AUXILIAR DE LABORATORIO
BANCARIO E ECONOMIARIO
BIBLIOTECARIO
BIOLOGO
BIOMEDICO
BOMBEIRO CIVIL
BOMBEIRO MILITAR
CABELEIREIRO E BARBEIRO
CANTOR E COMPOSITOR
CATADOR DE RECICLAVEIS
CERAMISTA E OLEIRO
CHAVEIRO
CIENTISTA POLITICO
COBRADOR DE TRANSPORTE COLETIVO
COMERCIANTE
COMERCIARIO
COMISSARIO DE BORDO
COMUNICOLOGO
CONTADOR
COREOGRAFO E BAILARINO
CORRETOR DE IMOVEIS, SEGUROS, TITULOS E VALORES
COZINHEIRO
DECORADOR
DEPUTADO
DESENHISTA
DESPACHANTE
DETETIVE PARTICULAR
DIRETOR DE EMPRESAS
DIRETOR DE ESTABELECIMENTO DE ENSINO
DONA DE CASA
ECONOMISTA
ECONOMISTA DOMESTICO
ELETRICISTA E ASSEMELHADOS
EMPREGADO DOMESTICO
EMPRESARIO
ENFERMEIRO
ENGENHEIRO
ENGRAXATE
ESCRITOR E CRITICO
ESCULTOR E PINTOR
ESTATISTICO
ESTETICISTA
ESTUDANTE, BOLSISTA, ESTAGIARIO E ASSEMELHADOS
FARMACEUTICO
FAXINEIRO
FEIRANTE, AMBULANTE E MASCATE
FERROVIARIO
FIANDEIRO, TECELAO, TINGIDOR E ASSEMELHADOS
FISCAL
FISIOTERAPEUTA E TERAPEUTA OCUPACIONAL
FONOAUDIOLOGO
FOTOGRAFO E ASSEMELHADOS
FRENTISTA
GARCOM
GARI OU LIXEIRO
GEOGRAFO
GERENTE
GOVERNADOR
GOVERNANTA
GUIA DE TURISMO
HISTORIADOR
INDUSTRIAL
JARDINEIRO
JORNALEIRO
JORNALISTA E REDATOR
LOCUTOR E COMENTARISTA DE RADIO E TELEVISAO E RADIALISTA
MAGISTRADO
MANICURE E MAQUILADOR
MASSAGISTA
MATEMATICO E ATUARIO
MEDICO
MEMBRO DAS FORCAS ARMADAS
MEMBRO DO MINISTERIO PUBLICO
MILITAR REFORMADO
MODELO
MOTOBOY
MOTORISTA DE VEICULOS DE TRANSPORTE COLETIVO DE PASSAGEIROS
MOTORISTA DE VEICULOS DE TRANSPORTE DE CARGA
MOTORISTA PARTICULAR
MUSICO
NUTRICIONISTA E ASSEMELHADOS
OCEANOGRAFO
OCUPANTE DE CARGO EM COMISSAO
ODONTOLOGO
OPERADOR DE COMPUTADOR
OPERADOR DE EQUIPAMENTO MEDICO E ODONTOLOGICO
OPERADOR DE INSTALACAO DE PRODUCAO DE ENERGIA ELETRICA E NUCLEAR
OUTROS
PADEIRO, CONFEITEIRO E ASSEMELHADOS
PARAMEDICO
PECUARISTA
PEDAGOGO
PESCADOR
PETROLEIRO
POLICIAL CIVIL
POLICIAL MILITAR
PREFEITO
PRODUTOR AGROPECUARIO
PRODUTOR DE ESPETACULOS PUBLICOS
PROFESSOR DE ENSINO FUNDAMENTAL
PROFESSOR DE ENSINO MEDIO
PROFESSOR DE ENSINO SUPERIOR
PROFESSOR E INSTRUTOR DE FORMACAO PROFISSIONAL
PSICOLOGO
PUBLICITARIO
QUIMICO
RECEPCIONISTA
RELACOES-PUBLICAS
RELOJOEIRO E MONTADOR DE INSTRUMENTO DE PRECISAO
REPRESENTANTE COMERCIAL
SACERDOTE OU MEMBRO DE ORDEM OU SEITA RELIGIOSA
SECRETARIO E DATILOGRAFO
SECURITARIO
SENADOR
SERVENTUARIO DE JUSTICA
SERVIDOR PUBLICO CIVIL APOSENTADO
SERVIDOR PUBLICO ESTADUAL
SERVIDOR PUBLICO FEDERAL
SERVIDOR PUBLICO MUNICIPAL
SOCIOLOGO
SUPERVISOR, INSPETOR E AGENTE DE COMPRAS E VENDAS
TAXISTA
TECNICO CONTABILIDADE, ESTATISTICA, ECONOMIA DOMESTICA E ADMINISTRACAO
TECNICO DE BIOLOGIA
TECNICO DE ELETRICIDADE, ELETRONICA E TELECOMUNICACOES
TECNICO DE ENFERMAGEM E ASSEMELHADOS (EXCETO ENFERMEIRO)
TECNICO DE LABORATORIO E RAIOS X
TECNICO DE MECANICA
TECNICO DE OBRAS CIVIS, ESTRADAS, SANEAMENTO E ASSEMELHADOS
TECNICO EM AGRONOMIA E AGRIMENSURA
TECNICO EM EDIFICACOES
TECNICO EM INFORMATICA
TELEFONISTA
TERAPEUTA
TRABALHADOR DE FABRICACAO DE ROUPAS
TRABALHADOR DE HOTELARIA
TRABALHADOR DOS SERVICOS DE CONTABILIDADE, DE CAIXA E ASSEMELHADOS
TRABALHADOR FLORESTAL
TRABALHADOR METALURGICO E SIDERURGICO
TRABALHADOR RURAL
TRADUTOR, INTERPRETE E FILOLOGO
VENDEDOR DE COMERCIO VAREJISTA E ATACADISTA
VENDEDOR PRACISTA, REPRESENTANTE, CAIXEIRO-VIAJANTE E ASSEMELHADOS
VEREADOR
VETERINARIO
VIGILANTE
ZOOTECNISTA
```

MiniMarvin commented 5 years ago

@cuducos I see, that edge case slipped through. I will build a unit test for this function to catch any edge case present in the dataset, and build a dict of edge cases. Thanks for the review!
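One possible shape for such a check, sketched in Python under loud assumptions: `naive_inflect` is a deliberately simple stand-in for the function under test (not the PR's code), and the `EXPECTED` feminine forms were written by hand for this example rather than taken from the dataset. The idea is to run the function over known occupations and collect every mismatch as a candidate for the edge-case dict.

```python
# Hypothetical harness: run an inflection function over hand-checked
# occupations and report every mismatch as an edge-case candidate.
EXPECTED = {  # hand-written for illustration, not from the dataset
    "ADVOGADO": "ADVOGADA",
    "VEREADOR": "VEREADORA",
    "MODELO": "MODELO",  # same form for both genders
    "GARCOM": "GARCONETE",  # irregular
    "ARTESAO": "ARTESA",  # irregular
}


def naive_inflect(occupation):
    """Stand-in for the function under test: regular rules on the first word."""
    first, _, rest = occupation.partition(" ")
    if first.endswith("OR"):
        first += "A"
    elif first.endswith("O"):
        first = first[:-1] + "A"
    return first + (" " + rest if rest else "")


def find_mismatches(inflect, expected):
    """Return (occupation, got, wanted) for every wrong result."""
    return [
        (occupation, inflect(occupation), wanted)
        for occupation, wanted in expected.items()
        if inflect(occupation) != wanted
    ]


for occupation, got, wanted in find_mismatches(naive_inflect, EXPECTED):
    print(f"{occupation}: got {got}, expected {wanted}")
```

Each reported occupation could then either become an entry in the edge-case dict or a regression test, whichever the maintainers prefer.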

cuducos commented 5 years ago

#26 adopted a dictionary, which is safe until the next election (when new terms might need to be added to this dictionary), so I think that's it for now : )

okfn-brasil / perfil-politico-frontend

Now every single female has it's profession properly with gender flection #14
