mattermost-community / mattermost-plugin-autolink

Automatically rewrite text matching a regular expression into a markdown link.
Apache License 2.0
131 stars 57 forks source link

Autolink should ignore bot posts. (was: Commit notifications trigger unexpected link preview) #94

Closed lieut-data closed 4 years ago

lieut-data commented 4 years ago

To avoid clashes with bot-formatted links, Autolink should (optionally?) ignore bot posts.

1/5 recommend adding an "Process posts from Bots", default false to the plugin config.


Original description:

I recently received a notification in the message of a commit:

image

In my summary, this then resulted in a perhaps unexpected link preview compared to all the other links in the message:

image

levb commented 4 years ago

(at a superficial glance) - review and add tests for https://github.com/mattermost/mattermost-plugin-github/blob/23f77f082659e78f1d519eb3aacb0ceb94a3cc21/server/plugin.go#L57?

vespian commented 4 years ago

I'll like to take this one.

lieut-data commented 4 years ago

Thanks, @vespian!

vespian commented 4 years ago

Short update - the problem occurs due to other users doing @mentions of the given user in the commit message. We would need to adjust the regex as pointed out in [2] and adjust the current code [1] to handle rewriting the raw link to the commit (i.e. do not always assume a file/blob) to a nice markdown link.

Will try to prepare a patch.

[1] https://github.com/mattermost/mattermost-plugin-github/blob/ab7832c593851ce7b55882a1325e6e64bf933bc1/server/permalinks.go#L113-L117 [2] https://github.com/mattermost/mattermost-plugin-github/issues/179#issuecomment-578401262

vespian commented 4 years ago

@lieut-data

I was playing with the code rewriting the raw link to the commit [4] and realized that I may not have all the data yet.

The ToDo text you included in the issue description one can trigger by sending a command directly to the bot (/github todo) but from what I can see/understand this posts an ephemeral post, which in turn does not trigger MessageWillBePosted hook (see my question in the ~developers channel [2], I already have enabled Enable Code Previews in system console, the hook is simply not called by the server).

Another option is while connecting/disconnecting to github[5] the todo list also gets displayed, but I guess this is not the case here.

The third option is the API call via ServeHTTP-/api/v1/todo, but I can't find it being called anywhere.

I wonder if I misconfigured somehow my dev instance. How can I recreate it reliably? I was using latest master versions of server and gh plugin.

Also, I wonder why the MessageWillBePosted hook is called with the userID of github bot, which does not seem to have GH credentials stored and results in error [3]. I suppose that I did something wrong while setting up my test instance which results in the userID of the bot instead of that of the user making the post being called. Do you have any ideas maybe?

Thanks in advance for any information.

[2] https://community.mattermost.com/core/pl/n8rzet9gfjgyfr9kgcafh7xnfy [3] https://github.com/mattermost/mattermost-plugin-github/blob/6e7a3ef8df706cbaf39c579e42d6452b0cbfe968/server/plugin.go#L222 [4] https://github.com/mattermost/mattermost-plugin-github/blob/ab7832c593851ce7b55882a1325e6e64bf933bc1/server/permalinks.go#L113-L117 [5] https://github.com/mattermost/mattermost-plugin-github/blob/6e7a3ef8df706cbaf39c579e42d6452b0cbfe968/server/api.go#L292

vespian commented 4 years ago

Also, the todo list output looks like this for me (ignore the PRR part): todo

lieut-data commented 4 years ago

@vespian, MessageWillBePosted will indeed only be called for non-ephemeral posts, but in this particular case, I get a regular post from the GitHub plugin's morning reminder. You can enable this with /github settings reminders on.

re: the GitHub bot and UserId, I might be misunderstanding, but it is indeed the GitHub bot making these posts, so that part seems correct. Does the code preview feature rely on the posting user's GitHub credentials to make the underlying API request? That might be a problem if users who don't link their GitHub accounts post such messages. I don't recall if this is an existing problem with the plugin, but perhaps this particular request should be made with the "default" GitHub credentials?

vespian commented 4 years ago

RE: Github credentials - I have prepared a simple patch:

https://github.com/mattermost/mattermost-plugin-github/pull/191

It does not solve the main issue though, just fixes the thing that most likely is a different bug. Please check the PR description for details.

levb commented 4 years ago

I am so very confused (and apologies for my ignorance about how the plugin works).

This is a message that is generated by the plugin itself, do we really use MessageWillBePosted to rewrite it? Seems illogical, why not compose correctly to start with?

Also, FWIW, if we are setting up regex link rewrites from the GitHub plugin, perhaps we should be promoting the autolink plugin, and programmatically set it up from GitHub, like we are doing with Jira now?

cc @lieut-data @crspeller @jfrerich

vespian commented 4 years ago

@lieut-data

Please bear with me, I still had no luck in recreating the error, even with mattermost/mattermost-plugin-github#191. Is it possible that the previews you mentioned are done through autolinks plugin?

image

Could you please post the output of mmctl plugin list?

The regexp for the Github's plugin preview [1] would not match the commit link you posted, so I am wondering if there is something more at play here.

Thanks in advance!

[1] https://github.com/vespian/mattermost-plugin-github/blob/ce0a3a6d4ed08995994702b70307270f02ac08b5/server/plugin.go#L57

lieut-data commented 4 years ago

@vespian, ahh, perhaps it is autolink that's at play here, and not GitHub at all. That makes a lot more sense: and thank you for digging to the bottom of this.

@levb, would it makes sense to guard autolink with ShouldProcessMessage?

levb commented 4 years ago

would it makes sense to guard autolink with ShouldProcessMessage?

@lieut-data Yes, I think so. Do you think we should expose the options in the Config? Do we have common code to do that?

Separately, I also think we should work on removing link substitution from GitHub and delegate it all to Autolink, programmatically, like Jira does now. This would be better for performance, and promote plugin usage. cc @aaronrothschild @jfrerich.

Now, what do we do with this issue then, close it?

The relevant Autolink config on community is image

lieut-data commented 4 years ago

@levb, I'd be happy to move this issue over to the autolink repository to keep the history. Agreed re: investigating programmatically managing link substitution, but as a separate issue?

levb commented 4 years ago

I think the issue should stay here, and be repurposed if the history is important. Autolink will already expose the link config APIs in the next release, no special work needed there AFAIU. We just need to add the code on the GiitHub side to manage the links, and probably a config/feature flag to use this new capability instead of the current in-plugin substitution. cc @jfrerich

levb commented 4 years ago

(closed by mistake, reopened)

lieut-data commented 4 years ago

I think there's two separate issues here:

One reason to keep this in the GitHub plugin is to solve the second issue, which I guess also solves the first issue? One concern is that this means the GitHub plugin, out of the box, isn't as fully-featured as GitHub + AutoLink, and I have no idea of the lifecycle of these plugin-to-plugin registrations. Seems like a thorny problem and worthy of design + investigation, as compared to solving the original report.

Thoughts on this interpretation?

levb commented 4 years ago

autolink shouldn't process messages by bots (open for discussion)

@lieut-data I'd argue against this complexity. Autolink should already ignore substituting links in markdown; Bots should be aware of the markdown nature of the contents and properly encode links in the message bodies if they come from external systems as raw text or other formats. Ditto for the messages that the bots generate.

GitHub should rely on autolink for link substitution (not as clear to me, and also open for discussion)

2 motives I can think of:

  1. Performance - consolidating all substitutions in 1 plugin means 1 MessageWillBePosted RPC/Post, vs 1 for Jira, 1 for GitHub, etc.
  2. Predictable order of substitution - something that can be managed in the Autolink plugin if not yet. An example of the problem is, on Community we have the (US) Social Security Number masking autolink "Off" because it ended up affecting the Zoom URLs. To be clear, this problem is not yet solved in a meaningful way; but it appears more solvable within the Autolink plugin than across all of them.

So I'd advocate that we a) add "auto-configure-auto-link" option to the GitHub plugin (off by default, mutually exclusive with "legacy" b) figure out when/how we switch the defaults and migrate

lieut-data commented 4 years ago

Autolink should already ignore substituting links in markdown; Bots should be aware of the markdown nature of the contents and properly encode links in the message bodies if they come from external systems as raw text or other formats. Ditto for the messages that the bots generate.

I think this adds more complexity since it requires bots/plugins to artificially encode their payload to avoid some other event they might not have been able to predict. Maybe I want a /real/ link without such a preview, because I'm doing something else, but along comes autolink and decides better for me.

In fact, the simplest model might be to disallow bot rewriting of other bot's posts at the server level: if GitHub wants autolink post-processing, maybe it can opportunistically call out to the autolink API and ask for a payload to be rewritten. Only humans get autolinked.

lieut-data commented 4 years ago

a) add "auto-configure-auto-link" option to the GitHub plugin (off by default, mutually exclusive with "legacy"

What happens when the rewrite requires metadata? I think all a lot of these rewrites involve an API call out to GitHub to look up some useful data. I'm having a hard time seeing AutoLink as a general purpose platform to replace the existing MessageWillBePosted calls.

levb commented 4 years ago

In fact, the simplest model might be... Only humans get autolinked.

It is a 180 from how I was thinking of it, but I am fully persuaded that it is a simple and consistent policy.

levb commented 4 years ago

What happens when the rewrite requires metadata?...

@lieut-data hopefully we don't invoke GitHub APIs in MessageWillBePosted - that'd seem inefficient. And if the metadata is cached, I'd argue it should be submitted to Autolink APIs when it changes.

levb commented 4 years ago

@vespian are you interested in finishing this?

vespian commented 4 years ago

@vespian are you interested in finishing this?

Yep.

vespian commented 4 years ago

Let me know if this is something we are after: https://github.com/mattermost/mattermost-plugin-autolink/pull/98

I did not make it optional to keep it simple, both when it comes to code complexity and the plugin configuration for users. It would be nice to doublecheck with the Product Manager though.

lieut-data commented 4 years ago

@lieut-data hopefully we don't invoke GitHub APIs in MessageWillBePosted - that'd seem inefficient. And if the metadata is cached, I'd argue it should be submitted to Autolink APIs when it changes.

@levb, in the GitHub use case we're talking about, it exactly does require a GitHub API call on MessageWasPosted (agree that doing this on MessageWillBePosted is impractical). I don't think there's anyway we can pre-emptively cache the autolink plugin with knowledge of the contents of an arbitrary GitHub permalink, so this functionality will always live in the GitHub plugin.