Closed (alexdy2007 closed 7 years ago)
I'm guessing you will update this with @arahayrabedian when the issues with titles are done?
Yeah, changed most of it now with @arahayrabedian. Should hopefully finish by tomorrow.
General steps
step 1) remove stop words
step 2) remove other stuff
step 3) map title to list of jobs in archive
step 4) map extra info to modifier vector list
step 5) make it binary classification
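A rough sketch of what steps 1 to 4 could look like, in case it helps. The stop-word set, job list and modifier list here are made-up placeholders (the real ones are not shown in this thread), and `clean_title` / `map_title` are hypothetical names, not functions from the repo:

```python
import re

# Placeholder data, purely illustrative; swap in the real lists.
STOP_WORDS = {"a", "an", "and", "the", "for", "in", "of", "to", "with"}
KNOWN_JOBS = ["software engineer", "data analyst", "nurse"]
MODIFIERS = ["senior", "junior", "graduate", "part time"]


def clean_title(title):
    """Steps 1 and 2: lowercase, replace non-letters with spaces, drop stop words."""
    text = re.sub(r"[^a-z\s]", " ", title.lower())
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def map_title(title):
    """Steps 3 and 4: find a known job and build a 0/1 modifier vector."""
    cleaned = " ".join(clean_title(title))
    # First known job that appears as a substring, or None if nothing matches.
    job = next((j for j in KNOWN_JOBS if j in cleaned), None)
    modifier_vector = [1 if m in cleaned else 0 for m in MODIFIERS]
    return job, modifier_vector


job, modifiers = map_title("Senior Software Engineer (London) - Part-Time")
```

The matching is plain substring search, so "nurse" would also match "nursery"; token-level matching would be stricter. Step 5 (binary classification) is left out, since the thread does not say what the target label is.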
@utkuozbulak: I'm not loading it into pandas, as I'm reading it into a list to remove stop words and do other preprocessing. Planning on then sticking it into a vector.
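For reference, reading a single column straight into a list without pandas might look something like the sketch below. It uses the standard-library `csv` module; `read_column` is a hypothetical helper name and the sample data is invented, so this is not the actual code from the repo:

```python
import csv
import io


def read_column(csv_file, column_name):
    """Read one named column from an open CSV file object into a plain list."""
    reader = csv.DictReader(csv_file)
    return [row[column_name] for row in reader]


# In-memory stand-in for the real file, so the sketch runs on its own.
sample = io.StringIO("title,salary\nSenior Nurse,30000\nData Analyst,40000\n")
titles = read_column(sample, "title")
```

With a real file, the same function would take the object returned by `open(path, newline="")`.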
"@utkuozbulak : not loading into pandas as reading into a list to remove stop words and other preprocessing things. planning on then sticking it into a vector"
import pandas as pd

def pandas_vector_to_list(pandas_df):  # Already in cleaning functions
    # Flatten a single-column DataFrame into a plain Python list
    py_list = [item[0] for item in pandas_df.values.tolist()]
    return py_list

data = pd.read_csv('file')  # Already in main
specific_feature = data[['column_name']]
feature_as_list = pandas_vector_to_list(specific_feature)
This is much simpler than reading it manually, no?
I don't mind either way, manual or not.
This is all contained as part of #19, so closing. Should have just used this, to be honest. My bad.
Began analysing titles and abstracted data_extractors to a separate class.
Note: changed the list of cities to include all with a population above 1,000 rather than 50,000, as a lot of towns still persisted in titles.
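A small sketch of why the lower threshold matters. The place names and populations are invented, and `city_names` / `strip_cities` are hypothetical helpers rather than the functions in data_extractors:

```python
# Invented (name, population) records, for illustration only.
CITIES = [("Bigton", 120000), ("Midham", 4000), ("Tinyford", 300)]


def city_names(cities, min_population):
    """Lowercased names of places with a population above the threshold."""
    return {name.lower() for name, pop in cities if pop > min_population}


def strip_cities(title, names):
    """Drop any word in the title that matches a known place name."""
    return " ".join(w for w in title.split() if w.lower() not in names)


old_names = city_names(CITIES, 50000)  # only the largest place
new_names = city_names(CITIES, 1000)   # also picks up smaller towns

kept = strip_cities("Chef Midham", old_names)     # town survives the old filter
removed = strip_cities("Chef Midham", new_names)  # town is stripped
```

This only handles single-word place names; multi-word names would need phrase matching rather than a per-word check.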