stjohann / DiscordWikiBot

Discord bot for Wikimedia projects and MediaWiki wiki sites
https://w.wiki/4nm
MIT License

EventStreams: Allow title matching using regular expressions #13

Open stjohann opened 2 years ago

stjohann commented 2 years ago

Some server owners have long asked for a way to stream a defined set of pages using the bot.

I used to think the best way to do this would be something like glob patterns, but that approach has several problems. For one, we would have to either re-implement glob matching or pull in a library that does it. It is also unclear whether glob syntax would clash with actual MediaWiki titles. After researching this for a bit, I decided that simply allowing regular expressions (regexps) is good enough to meet this need.

Here are the theoretical requirements for any potential implementation:

  1. Regexps can be passed only to the --title attribute of the configuration.
  2. Regexps should be passed using the --title /.*/ syntax (i.e. always wrapped in slashes), since this keeps the number of params to a minimum and gives a simple way to tell what is a regexp and what is not (str.StartsWith('/')). This needs to account for articles like https://en.wikipedia.org/wiki//b/, which are unlikely to have their own stream feeds but probably still need some way to be referenced in EventStreams (e.g. :/b/?).
  3. The code should define a reasonable MatchTimeout (0.5 seconds?) and try/catch the errors thrown by slow regexps to prevent ReDoS attacks.
  4. Passed regexps should be tested against the timeout, and the bot should reject slow regexps at the configuration step (!openStream).
  5. Passed regexps should match the whole string for clarity (^…$) and should not ignore case.
  6. (If we can find a way) The set of regexp features allowed should be as small as possible.
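
A rough sketch of how requirements 2 to 5 could fit together in C#, using only the standard System.Text.RegularExpressions API. This is not code from the bot: the TitleFilter class, its method names, and the probe titles passed to IsAcceptable are all hypothetical, the 0.5 second timeout is just the value floated above, and the leading-colon escape for titles like /b/ is only the idea from point 2, not a settled syntax.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; none of these names exist in DiscordWikiBot.
public static class TitleFilter
{
    // Assumed value, taken from the "0.5 seconds?" suggestion in the list.
    private static readonly TimeSpan MatchTimeout = TimeSpan.FromSeconds(0.5);

    // True when a --title value uses the /…/ regexp syntax.
    public static bool IsRegex(string value) =>
        value.Length > 2 && value[0] == '/' && value[value.Length - 1] == '/';

    // Assumed escape: ":/b/" refers to the literal title "/b/".
    public static string ToLiteralTitle(string value) =>
        value.StartsWith(":", StringComparison.Ordinal) ? value.Substring(1) : value;

    // Builds a case-sensitive, whole-string regexp with a match timeout.
    // Returns null when the pattern is not valid regexp syntax.
    public static Regex Compile(string value)
    {
        string pattern = value.Substring(1, value.Length - 2);
        try
        {
            return new Regex("^(?:" + pattern + ")$", RegexOptions.None, MatchTimeout);
        }
        catch (ArgumentException)
        {
            return null;
        }
    }

    // Configuration-time check: reject the regexp if it is invalid or if
    // it times out on any of the probe titles supplied by the caller.
    public static bool IsAcceptable(string value, IEnumerable<string> probes)
    {
        Regex regex = Compile(value);
        if (regex == null) return false;
        try
        {
            foreach (string probe in probes) regex.IsMatch(probe);
            return true;
        }
        catch (RegexMatchTimeoutException)
        {
            return false;
        }
    }

    // Stream-time check: a timeout is treated as "no match".
    public static bool Matches(Regex regex, string title)
    {
        try { return regex.IsMatch(title); }
        catch (RegexMatchTimeoutException) { return false; }
    }
}
```

Probing at configuration time can only catch regexps that are slow on the probes chosen, so the try/catch in Matches is still needed while streaming. For point 6, if the bot targets .NET 7 or later (an assumption), RegexOptions.NonBacktracking might be worth a look: it rejects backreferences and lookarounds and avoids catastrophic backtracking by construction.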

I may have forgotten other notable things; if you read this issue and can think of any, please report them.

stjohann commented 1 year ago

Another idea: add a --title-matches key (the name is open to discussion, maybe --in-title?) that works only with --namespace streams. That would be simpler: it is easier to process and would require fewer changes to the current shaky structure of the code.