skfarhad / hazard_reporting_system

BSD 3-Clause "New" or "Revised" License

BE: Need to parse SMS text and extract District/Thana/Area information #40

Open skfarhad opened 2 months ago

skfarhad commented 2 months ago

The system will receive SMS messages in a format such as: (District)#(Thana)#(Union/Ward/Area)#(Help seeking message)

We need to parse the location information from this SMS. People will frequently misspell the names of the District/Thana/Area, so we need a reliable way to match those names against the database. The names of those location entities will be available in working memory.
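One way the misspelling problem could be approached is approximate string matching against the in-memory name list. The following is only a sketch using Python's standard `difflib`; the `DISTRICTS` list, the `match_name` helper, and the `0.6` cutoff are illustrative assumptions, not part of the project.

```python
from difflib import get_close_matches

# Hypothetical in-memory list; the real names would be loaded from the database.
DISTRICTS = ["Feni", "Dhaka", "Cumilla", "Noakhali", "Chattogram"]


def match_name(raw, candidates, cutoff=0.6):
    """Return the candidate closest to `raw`, or None if nothing is similar enough."""
    # Compare case-insensitively, but return the canonical spelling.
    lookup = {name.lower(): name for name in candidates}
    hits = get_close_matches(raw.strip().lower(), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None


print(match_name("Fenni", DISTRICTS))
print(match_name("noakhli ", DISTRICTS))
```

The cutoff trades false matches against missed matches and would need tuning on real SMS data.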

AlexBurski commented 2 months ago

Hello Sir, I have a few questions:

Does the SMS we receive contain additional information? Is (District)#(Thana)#(Union/Ward/Area)# an address in itself? Can we use regex? And how can I access the location entities, if possible?

hafiz-bs23 commented 2 months ago

I will work on this. I can make a Util class that takes the string and returns the location information. A few concerns, though:

  1. We can manage spell checking at the District and Thana level, but spell checking for Union/Ward/Area can be hard, both in collecting the correct names and in correcting them. Can we keep the Union/Ward/Area raw for the MVP?
  2. Do we need Bangla support?

skfarhad commented 2 months ago

> Hello Sir, I have a few questions:
>
> Does the SMS we receive contain additional information? Is (District)#(Thana)#(Union/Ward/Area)# an address in itself? Can we use regex? And how can I access the location entities, if possible?

Example: Feni#Parshuram#Govt Pilor High School#Need emergency medicine

This is just a format that the Telco service provider will specify, for example which separator will be used ('#' in this example). The first two entities will be District and Thana. The problem is misspelling, so we need a matching algorithm for District and Thana.
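To illustrate the format, here is a minimal sketch of splitting such an SMS into its four fields. The `ParsedSms` type and `split_sms` function are hypothetical names, and the separator is a parameter because the Telco provider may specify a different one.

```python
from typing import NamedTuple


class ParsedSms(NamedTuple):
    district: str
    thana: str
    area: str
    message: str


def split_sms(text, sep="#"):
    """Split an SMS into district, thana, area and message fields."""
    # maxsplit=3 keeps any further separators inside the help message intact.
    parts = [part.strip() for part in text.split(sep, 3)]
    if len(parts) != 4:
        raise ValueError(f"expected 4 fields separated by {sep!r}, got {len(parts)}")
    return ParsedSms(*parts)


sms = "Feni#Parshuram#Govt Pilor High School#Need emergency medicine"
print(split_sms(sms))
```

A plain `str.split` is enough here; a regex would only be needed if the separator rules become more complex.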

skfarhad commented 2 months ago

> I will work on this. I can make a Util class that takes the string and returns the location information. A few concerns, though:
>
>   1. We can manage spell checking at the District and Thana level, but spell checking for Union/Ward/Area can be hard, both in collecting the correct names and in correcting them. Can we keep the Union/Ward/Area raw for the MVP?
>   2. Do we need Bangla support?

Yes, matching at the District and Thana level only will be enough for now. No Bangla support is needed.
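Under that scope, a utility might correct only the District and Thana and pass the Union/Ward/Area through unchanged. The sketch below assumes a hypothetical `THANAS_BY_DISTRICT` mapping (sample data only) and hypothetical helper names; it searches thanas only within the matched district, which keeps the candidate list small.

```python
from difflib import get_close_matches

# Hypothetical sample data; the real mapping would be built from the database.
THANAS_BY_DISTRICT = {
    "Feni": ["Parshuram", "Chhagalnaiya", "Daganbhuiyan"],
    "Dhaka": ["Dhanmondi", "Mirpur", "Savar"],
}


def closest(raw, candidates, cutoff=0.6):
    """Case-insensitive closest match, or None if nothing passes the cutoff."""
    lookup = {name.lower(): name for name in candidates}
    hits = get_close_matches(raw.strip().lower(), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None


def extract_location(sms, sep="#"):
    """Return matched district/thana plus the raw area and message, or None if malformed."""
    parts = [part.strip() for part in sms.split(sep, 3)]
    if len(parts) != 4:
        return None
    raw_district, raw_thana, area, message = parts
    district = closest(raw_district, THANAS_BY_DISTRICT)
    # Only search thanas that belong to the matched district.
    thana = closest(raw_thana, THANAS_BY_DISTRICT[district]) if district else None
    # The area is kept exactly as received (no spell correction for the MVP).
    return {"district": district, "thana": thana, "area": area, "message": message}


print(extract_location("Fenni#Porshuram#Govt Pilor High School#Need emergency medicine"))
```

When no district passes the cutoff, the sketch leaves both fields as `None` so the caller can decide how to handle the unmatched message.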