jlepird / next-gen-assignments

Source code for https://af-talent-marketplace.app.cloud.gov

Search capabilities for unstructured text fields #9

Open jlepird opened 7 years ago

jlepird commented 7 years ago

As a user, I need to be able to query the unstructured text fields so that I can find information I need to prioritize the officers/billets I'm interested in.

johangithub commented 7 years ago

I wasn't exactly sure what you meant when you referred to Elasticsearch, but I think I get the idea now. Besides what we showed for the 61A billets, what additional unstructured info would you show?

jlepird commented 7 years ago

It'd probably just be the same unstructured description. Even giving users the ability to do a keyword search would be helpful. A really cool solution would be a D3-driven word cloud that people could click on to filter down the results, dc.js style.
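For the plain keyword search, a minimal sketch could look like the following. The billet ids and descriptions are made-up sample data, and `tokenize` and `keyword_search` are hypothetical helper names, not anything that exists in the repo. It treats a query as a set of words and returns the billets whose description contains all of them.

```python
import re

# Hypothetical sample data: billet id -> free-text description.
BILLETS = {
    "B001": "Operations research analyst supporting force structure studies.",
    "B002": "Instructor position teaching statistics and data analysis.",
    "B003": "Analyst role focused on personnel data and manpower modeling.",
}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_search(billets, query):
    """Return sorted ids of billets whose description has every query word."""
    wanted = tokenize(query)
    return sorted(
        billet_id
        for billet_id, description in billets.items()
        if wanted <= tokenize(description)  # subset test: all words present
    )


print(keyword_search(BILLETS, "data analyst"))
```

This matches whole words only and is case-insensitive; partial matches or ranking would need something more than this.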

johangithub commented 7 years ago

Should we preprocess the text so that we end up with structured data? I think most jobs would have similar attributes.

jlepird commented 7 years ago

It might make sense to have a parallel data structure that contains the preprocessed text: probably stemmed, with punctuation and stopwords removed, etc. You could also build a table that lists (billet, word, count) triplets. This table would make it super easy to query the most popular words among the remaining billets.
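A rough sketch of that idea, assuming an in-memory SQLite database as a stand-in for the real SQL database. The table name `billet_words`, the column names, the stopword list, and the sample billets are all made up for illustration, and `stem` is a deliberately crude suffix stripper standing in for a real stemmer such as Porter's.

```python
import re
import sqlite3
from collections import Counter

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to", "with"}


def stem(word):
    """Crude suffix stripping; a placeholder for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, drop punctuation and stopwords, then stem each word."""
    words = re.findall(r"[a-z]+", text.lower())
    return [stem(w) for w in words if w not in STOPWORDS]


def build_triplets(billets):
    """Return a list of (billet, word, count) rows."""
    rows = []
    for billet_id, description in billets.items():
        for word, count in Counter(preprocess(description)).items():
            rows.append((billet_id, word, count))
    return rows


def top_words(conn, billet_ids, limit=5):
    """Most frequent words across the billets that remain after filtering."""
    placeholders = ",".join("?" for _ in billet_ids)
    sql = (
        "SELECT word, SUM(word_count) AS total FROM billet_words "
        f"WHERE billet IN ({placeholders}) "
        "GROUP BY word ORDER BY total DESC, word ASC LIMIT ?"
    )
    return conn.execute(sql, (*billet_ids, limit)).fetchall()


# Hypothetical sample data.
BILLETS = {
    "B001": "Modeling and analysis of manpower data.",
    "B002": "Teaching data analysis; mentoring analysts.",
    "B003": "Planning logistics for deployments.",
}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE billet_words (billet TEXT, word TEXT, word_count INTEGER)"
)
conn.executemany(
    "INSERT INTO billet_words VALUES (?, ?, ?)", build_triplets(BILLETS)
)

# Word frequencies for the billets still selected in the UI.
print(top_words(conn, ["B001", "B002"], limit=3))
```

The `top_words` result is the kind of (word, total) list a word cloud could be drawn from, and re-running it with a narrower list of billet ids would refresh the cloud as the user filters.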

jlepird commented 7 years ago

I'm building a Python-based backend (https://github.com/jlepird/tm-backend) that we can use to run tasks like this. It'll have access to the same SQL database as the front end, and I'm configuring it now to be RESTful as well.