hotosm / galaxy-api

Backend to fetch data from Underpass
https://galaxy-api.hotosm.org/latest/redoc
GNU Affero General Public License v3.0

Making our endpoints case insensitive for hashtags #122

Open ramyaragupathy opened 2 years ago

ramyaragupathy commented 2 years ago

At present, case insensitivity is implemented for the hashtag statistics endpoint. The following endpoints, which take a hashtag as an input parameter, should behave similarly when dealing with hashtags:

kshitijrajsharma commented 2 years ago

Looking at the current scenario, IMO we should treat hashtags consistently across all endpoints, like this: https://github.com/hotosm/insights/blob/013c73d27d5aeaab8c7b9152d5c8543e0bd229f6/hashtags.py#L69 Currently Insights and Underpass both store hashtags in lower case and with the # sign only. This should be consistent across all components of Galaxy: we need to change the queries for the existing endpoints and apply the same validation, such as stripping spaces, single characters, and special characters. cc: @ramyaragupathy @omranlm @JorgeMartinezG @robsavoye
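
A minimal sketch of what such shared validation could look like, so every endpoint normalizes its hashtag parameter the same way before querying. The function name `normalize_hashtag` is hypothetical, and the exact policy (which characters are allowed, rejecting single-character tags) is an assumption drawn from the list above, not the code at the linked line:

```python
import re
from typing import Optional


def normalize_hashtag(raw: str) -> Optional[str]:
    """Return a lower-case hashtag with one leading '#', or None if invalid.

    Hypothetical helper: the allowed character set and the minimum
    length are assumptions, not the project's confirmed rules.
    """
    # Strip surrounding whitespace and any leading '#' characters.
    tag = raw.strip().lstrip("#")
    # Drop everything except word characters and hyphens
    # (removes inner spaces, quotes and other special characters).
    tag = re.sub(r"[^\w-]", "", tag)
    # Reject empty or single-character hashtags.
    if len(tag) < 2:
        return None
    # Store in lower case with the '#' sign, matching how the
    # hashtags are described as being stored.
    return "#" + tag.lower()
```

With this, `normalize_hashtag("  #MissingMaps ")` and `normalize_hashtag("missingmaps")` both produce the same value, so the query only ever sees one spelling.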

robsavoye commented 2 years ago

If a comment has a #, then it's a hashtag, which by "definition" is terminated by a space. Sometimes hashtags are embedded with other non-hashtag text in the comment field. This got easier when the hashtag tag was added. I've also seen a lot of random capitalization in hashtag values, so converting everything to lower case allows for pattern matching. Sometimes there are also special characters (typos), like a single or double quote mark, which can break parsing.
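
As a rough illustration of the parsing described here, the sketch below pulls hashtags out of a free-text comment, treats whitespace as the terminator, trims trailing quote marks and punctuation, and lower-cases the result. The name `extract_hashtags` and the set of trailing characters to trim are assumptions for the example, not how Underpass does it:

```python
import re
from typing import List

# A hashtag starts at '#' and runs until whitespace or the next '#'.
HASHTAG_RE = re.compile(r"#[^\s#]+")

# Trailing characters treated as typos or separators (assumed set).
TRAILING_JUNK = "'\".,;:!?"


def extract_hashtags(comment: str) -> List[str]:
    """Return unique lower-case hashtags in order of first appearance."""
    tags: List[str] = []
    for match in HASHTAG_RE.findall(comment):
        # Trim stray quotes/punctuation and fold case for matching.
        cleaned = match.rstrip(TRAILING_JUNK).lower()
        # Skip a bare '#' left over after trimming, and duplicates.
        if len(cleaned) > 1 and cleaned not in tags:
            tags.append(cleaned)
    return tags
```

Because everything is lower-cased, `#HOTOSM-Project-42` and `#hotosm-project-42` in the same comment collapse to a single entry.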