Open chase-dwelle opened 7 years ago
Based on Jimmy's work with NLTK on the status messages, we have a list of keywords that correspond to different failure modes: https://www.lucidchart.com/documents/edit/26a13991-a3a9-4fb2-8572-16b497b7e191?shared=true&
Environmental drivers: {'Reduced water table', 'lowered water table','drought', 'dry', 'dried', 'low yield', 'low flow', 'poor retention','water shortage','source', 'lack','dry season','jerican','jerry can', 'shallow','climatic','insufficient', 'quantity:insufficient'}
Pollution: {'Salty', 'poorly sited', 'millky', 'coloured', 'contaminated', 'odour', 'smell', 'muddy', 'black', 'poor', 'dirty', 'silt', 'soil'}
Potential human causes: {'Committee', 'WSC', 'fuel', 'theft', 'vandalised', 'stolen', 'beneficiaries', 'pay',' paid', 'funds', 'bill', 'people', 'personnel'}
Mechanical causes: {'Pump', 'handle', 'pipes', 'tank', 'construction', 'cylinder', 'apron', 'repair', 'parts', 'installation', 'broken', 'blocked', 'technical'}
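A minimal sketch of how these keyword sets could be applied to a status message. The sets below are abbreviated from the lists above and lower-cased; the names `KEYWORDS` and `tag_status` are made up for illustration and are not taken from the project code. Matching is case-insensitive and uses word boundaries.

```python
import re

# Abbreviated, lower-cased keyword sets copied from the lists above.
# KEYWORDS and tag_status are hypothetical names used only for this sketch.
KEYWORDS = {
    "environmental": {"drought", "dry", "low yield", "water shortage"},
    "pollution": {"salty", "contaminated", "muddy", "dirty"},
    "human": {"committee", "theft", "stolen", "funds"},
    "mechanical": {"pump", "handle", "construction", "broken"},
}


def tag_status(message):
    """Return the sorted categories whose keywords appear in the message."""
    text = message.lower()
    hits = set()
    for category, words in KEYWORDS.items():
        for word in words:
            # \b keeps 'dry' from matching inside a longer word such as 'laundry'.
            if re.search(r"\b" + re.escape(word) + r"\b", text):
                hits.add(category)
                break
    return sorted(hits)


print(tag_status("Pump stolen, well dry"))
print(tag_status("Functional ( in use)|New Under construction"))
```

With plain keyword matching, the second message is tagged as mechanical purely because it contains 'construction', even though the well is functional.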
Some keywords used to tag mechanical failures (e.g. 'construction') also appear in the status of wells that are in fact working (e.g. 'STATUS' = 'Functional ( in use)|New Under construction').
Consider using bigrams? Or removing 'FUNC' = 'Yes' entries from consideration for mechanical failures?
I think it is fine to process them for now (if we have a MECH_FAIL column, have entries even if FUNC is yes). We can exclude the FUNC = Yes entries when we do failure analysis, then maybe next year's group can worry about cleaning up our data a little bit :)
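Roughly what that could look like, as a sketch only: every row gets a MECH_FAIL flag, and the FUNC = Yes rows are dropped only at analysis time. The column names come from this thread; the helper names, the abbreviated keyword set, and the second sample row are made up for illustration.

```python
# Abbreviated mechanical keyword set (lower-cased); hypothetical helper names.
MECH_WORDS = {"pump", "handle", "construction", "broken"}


def add_mech_fail(rows):
    """Set MECH_FAIL on every row, regardless of FUNC."""
    for row in rows:
        status = row["STATUS"].lower()
        row["MECH_FAIL"] = any(word in status for word in MECH_WORDS)
    return rows


def failures_only(rows):
    """Keep flagged rows, excluding wells recorded as functional."""
    return [r for r in rows if r["FUNC"] != "Yes" and r["MECH_FAIL"]]


rows = add_mech_fail([
    # STATUS string quoted earlier in this thread.
    {"FUNC": "Yes", "STATUS": "Functional ( in use)|New Under construction"},
    # Made-up example row.
    {"FUNC": "No", "STATUS": "Pump broken"},
])
print(failures_only(rows))
```

Both rows end up with MECH_FAIL set, but only the non-functional one survives the filter, so the flagged data stays available if someone wants to clean it up later.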
In addition to the well functional binary (YES/NO), we also have status messages, e.g.,
So we need to figure out some of these keywords in order to make better categories of well failure conditions.