Closed swoertz closed 11 years ago
Don't have the bandwidth to implement this feature. I'm leaving it open. Any community member who is willing to implement this feature should let me know. I'll humbly merge any pull request that comes my way on that front.
I am playing with implementing this at the moment. My thoughts are:
I propose an alternative, whereby entering regex(YOUR_REGEX_HERE)
in the keywords field would cause the filtering to be processed via regular expressions rather than comma-separated keywords.
What are your thoughts on this? I will submit a pull request once I have this implemented and tested, so you can decide whether it's acceptable for inclusion! :)
Notes:
What I've got so far. All changes are within ProcessSMS.java
, and there's nothing complicated.
Introduced dependency on java.utils.regex:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
Changed name + function call:
Replaced...
String keywords[] = syncUrl.getKeywords().split(",");
if (filterByKeywords(messagesBody, keywords)) {
... with ...
String filterText = syncUrl.getKeywords();
if (matchesFilterText(messagesBody, filterText)) {
Then, changed the function itself by replacing...
public boolean filterByKeywords(String message, String[] keywords) {
... with ...
public boolean matchesFilterText(String message, String filterText) {
// Handle Regular Expression, if specified by "regex(XYZ)".
if (filterText.toLowerCase().startsWith("regex(") && filterText.toLowerCase().endsWith(")")) {
String regEx = filterText.substring(6, filterText.length()-1);
Pattern p = Pattern.compile(regEx); // TODO - test RegEx validity here?
Matcher m = p.matcher(message);
if (m.find()) {
return true;
}
return false;
}
// Handle the filter text as comma-separated keywords
String[] keywords = filterText.split(",");
Feel free to take this, as I don't have more time at the moment - it should work but it's not yet properly tested.
I would suggest to make a checkbox (regular expression - true, false) instead of implementing it as a hidden feature. Persons who are not familiar either not check the box or look up what it means. All the others can simple use the feature. Further I would also suggest to make a checkbox for case-sensitivity.
Okay - thanks for sharing your thoughts. I agree this would be a nice addition, but my concerns remain about making the UI too cluttered, and adding complexity.
Have a look at a mock-up of these new options:
I think you'll agree, it's getting a bit complicated, and there's no longer a nice way to give instructions on comma-separated keywords (currently, the keywords field says: "Enter keyword Eg. Fire, Crime, Help", which is clear and consise).
I'm thinking of a more useful position for the extra options. I think, at least, this feature would require a new full-page screen, that could be used when creating or editing sync URLs.
In my code above, I'm just being lazy, to be honest, and implementing the feature with the fewest changes possible! ;)
I do think your case-sensitivity option is important too. I had considered working this into my "secret" regex activation code - something like regex(YOUR_REGEX)params(CASE_INSENSITIVE)
- but had ignored the idea as this just makes it very confusing. With the UI changes, it'd be a great idea. :)
Let's see what @eyedol etc. think?
An other possibility would be, making two input fields and a checkbox for each to (de)activate (grey out the textfields) the filter mechanisms. Of course this makes the UI more complicated but in my oppinion the benefits of the additional functionality outweighs the increased complexitiy. It would also be possible to omit the checkboxes and check if the input fields are empty or not...
I would further process the message if one of the mechanisms matches (LOGICAL OR).
[x] key-word filtering
+----------------------+
| e.g. crime, fire... |
+----------------------+
[x] regex filtering
+----------------------+
| [a-z].* |
+----------------------+
Perhaps we could use a label saying you can enter regex or CSV to filter for keywords instead of the checkboxes. I agree with @olliebennett the ui can be a bit messy when we add the checkboxes
I agree with you, but we have to distinguish between keywords and regexs. The 3 ways to do that we mentioned so far are:
Of those three options, I think @swoertz's ASCII layout above is best.
I think the lack of case sensitivity toggling is a fair UX compromise, but there are two options for treating the input:
(k|K)(e|E)(w|W)(o|O)(r|R)(d|D)
if required?!Which do you prefer? My use case for RegEx, would be to case-insensitively match messages that start with a keyword, for instance ^keyword.*$
.
@eyedol, is there a fool-proof way of detecting whether the list is a RegEx, or a comma-separated list of keywords?
is there a fool-proof way of detecting whether the list is a RegEx, or a comma-separated list of keywords?
In general there is no way, I can think of. Because every keyword is a valid regex and vice-versa if I take the regex literally I can use it as keyword. If we allow only [a-zA-Z]* for example for keywords we could detect the special characters like "^$*" and so on, but I think this is confusing.
@olliebennett no, there is no fool-proof way of detecting whether the list is a RegEx or CSV. I think we should just make the keyword input filed process RegEx. So as a user when I enter, heat,sun,cold,yam
as keywords, SMSSync should be able to match them and when I enter ^keyword.*$
it should also be able to match that as well.
Okay. I guess we could try use a RegEx of our own to detect whether the filter text was a valid CSV (and if not, treat as RegEx).
This (realistically) means enforcing some limits on the acceptable keywords used in the CSV list. I propose limiting these keywords to be alpha-numeric, with spaces permitted both surrounding the comma (test, keyword
) and also within the keyword itself (test keyword,and another
).
Are we happy to place these limits on the acceptable keywords? And what should the limits be? I think those I've listed are sufficient and reasonable.
See http://refiddle.com/by/olliebennett/csv-or-regex for a first attempt that seems to work†
It would be nice to simply treat everything as a regex, but to use @eyedol's example of heat,sun,cold,yam
, we'd need to convert this to (heat|sun|cold|yam)
(or something like that) in order to make it have the same effect as the keywords would suggest.
If this is okay with you guys, the change should be trivial, and I'll update the code (and relevant documentation) and submit a pull request.
† assuming RegEx in Java behaves in the same way as RegEx in JavaScript...
This is a nice solution, but would not work for e.g. hashtags as keywords (example: #foo, #bar
). Of course if one wants to filter by these tags (#foo)|(#bar)
would work. There are other examples beside hash-tags as well. How would you document this, so that the user can recoginze the difference between keywords and reg. expressions while using the app? Many applications which are able to work with regex and "normal" strings use a checkbox (e.g. search-function in text-editors).
This is a tricky one!
I agree - there's a problem of letting the user know that what they think are normal keywords are in fact going to be treated as a (probably useless or even invalid) RegEx.
Perhaps we could add a confirmation box that pops up when saving the Sync URL settings, which would only appear if a special character has been entered, saying†:
You have entered special characters in the filter field. This will be will now be treated as a Regular Expression. | OK |
The documentation would need to state that:
Keywords can only be made of alpha-numeric characters (and spaces). For any more advanced tags, you will need to use Regular Expressions.
There would also need to be a couple of common RegEx examples, such as "starts with X", or similar.
It's my opinion that in this case, creating extra options in the UI adds more confusion than it does value.
Another option - as an alternative to using regex(abc)
- is to require the use of ^
and $
at the start and end, respectively (although that still has problems).
There's always the (inefficient) workaround of handling the RegEx matching on the server, and only taking any action if a match is found. If it doesn't match, simply return success: "true"
and then both systems can move on. This of course removes the need for RegEx support altogether (but it's not a nice solution!).
† at this stage, we could also check the validity of the RegEx.
There are so many combinations which are not alphanumeric, which would lead to the message "You have entered special characters...".
The question is what is more confusing, the checkbox or the info-message when one enters "special" characters. If one is not familiar with regex's the default value of the checkbox (FALSE) is just fine. Further I think most of the smssync users are developers anyway (or is that a wrong assumption?).
How should a user who can't handle regex's react on the info message?
@olliebennett what's the status of this? If you send me a pull request, I'm happy to merge after a code review
I'm actually using my own modified version with RegEx in place of keywords as standard. Adding the actual RegEx capability was trivial, however adding a configuration option proved more complicated (creating database info, UI elements, loading/saving of existing configuration etc.) so the version I use has no changes to the UI. :( Sorry
Seem to have lost the code, but can provide the apk if anyone needs - just email.
I'll have another go at implementing it sometime, perhaps.
Maybe provide a branch if it's not too much hassle?
I've slightly changed the way this works for me, by first checking whether the filter text matches a keyword, and if not, checking if it matches when treated as a RegEx. I know there could be issues with this approach, but can't think of any that would interfere with the typical, simple use cases.
I'm not comfortable enough with this code to throw a pull request yet, but I'll do some more tests, and consider bulking it out a bit, if needed.
Check out my feature-regex branch and see what you think. If anyone's got any input, please reply here.
We could document on SMSSync website as, for simple filtering, use CSV as the keywords and for complex filtering, consider entering a valid RegEx in the filter by input field.
Agreed. The only possible problem that I foresee is if someone sends a message containing the RegEx itself. This would match as a 'keyword'.
Do you think we should change the 'enter keywords eg fire, help' placeholder text to something mentioning regex too? Or just document it online?
Yeah, we could make the placeholder text mention RegEx too
it would be nice if you could filter sms messages by regular expressions