soton-data-mining / job-salary-prediction

A regression problem, predicting salaries of jobs in UK based on various criteria
8 stars 3 forks source link

job title matching (ish) #18

Closed arahayrabedian closed 7 years ago

arahayrabedian commented 7 years ago

Basically, map a job title string to a sorted, stemmed job title string and job modifier string, have not yet done one-hot/binary encoding, will need to borrow @utkuozbulak for that plus my brain is fried:

example of what we do here:

In [2]: get_stemmed_sorted_role_and_modifiers("some old granny nurse who takes care of things")
Out[2]: (['nurs'], ['care', 'old', 'take'])

In [2]: get_stemmed_sorted_role_and_modifiers("car salesperson")
Out[2]: ([], ['car'])

In [3]: get_stemmed_sorted_role_and_modifiers("car sales")
Out[3]: (['sale'], ['car'])

In [4]: get_stemmed_sorted_role_and_modifiers("car salesman")
Out[4]: ([], ['car'])

basically, we compare stemmed job titles to stemmed official jobs list and see what fits and take that, did some dirty stuff in the mapping that i'll need to go over when encoding to an ML-able format.

it's PR'ing back in to the titles branch so we can finish off and continue work there before merging to master, but i need somebody to have a look first.

also had to make some minor modifications to the oh-so-ancient official job title list, even with a fairly aggressive stemmer.