Open ramyaragupathy opened 2 years ago
Looking at the current state, IMO we should treat hashtags consistently across all endpoints, like this: https://github.com/hotosm/insights/blob/013c73d27d5aeaab8c7b9152d5c8543e0bd229f6/hashtags.py#L69 Currently Insights and Underpass both store hashtags in lower case and only with the # sign. The handling should be the same across all components of Galaxy, so we need to change the queries for the existing endpoints and apply the same validation, such as stripping spaces and handling single-character values and special characters. cc: @ramyaragupathy @omranlm @JorgeMartinezG @robsavoye
If a comment has a #, then it's a hashtag, which by "definition" is terminated by a space. Sometimes hashtags are embedded with other non-hashtag text in the comment field. This got easier when the hashtag tag was added. I've also seen a lot of random capitalization in hashtag values, so converting everything to lower case allows for pattern matching. Sometimes there are also special characters (typos), like a single or double quote mark, which can break parsing.
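A rough sketch of that normalization in Python, for discussion only. The function names `normalize_hashtag` and `extract_hashtags` are hypothetical and are not taken from Insights or Underpass. It also assumes that "single character" means tags shorter than two characters get rejected, and that the only special characters worth stripping are single and double quote marks.

```python
import re


def normalize_hashtag(raw):
    """Return a lower-case hashtag with one leading '#', or None if invalid.

    Hypothetical helper: the rules below are assumptions drawn from this
    thread, not the validation used by any existing endpoint.
    """
    # Remove stray quote marks (typos) that could break parsing.
    tag = raw.replace("'", "").replace('"', "")
    # Strip surrounding spaces and any leading '#' so we can re-add exactly one.
    tag = tag.strip().lstrip("#").lower()
    # Assumption: reject empty or single-character tags and tags with spaces.
    if len(tag) < 2 or re.search(r"\s", tag):
        return None
    return "#" + tag


def extract_hashtags(comment):
    """Pull hashtags out of free text: '#' up to the next whitespace."""
    result = []
    for candidate in re.findall(r"#\S+", comment):
        tag = normalize_hashtag(candidate)
        # Keep first occurrence only, so differently-cased repeats collapse.
        if tag is not None and tag not in result:
            result.append(tag)
    return result
```

With something like this shared by every endpoint, `#MissingMaps` and `#missingmaps` would resolve to the same stored value before the query runs.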
At present we have case insensitivity implemented for the hashtag statistics endpoint. The following endpoints, which take a hashtag as an input parameter, should behave similarly when dealing with hashtags: