Closed mattermod closed 5 years ago
I'll take this one.
A rough design sketch:
I think a regex is the best way to easily capture all the links without creating a hand-written link parser. This is what I have now https?://(?P<haswww>www.)?github.com/(?P<user>\w+)/(?P<repo>\w+)/blob/(?P<commit>\w+)/(?P<path>[\w/.]+)#(?P<line>[\w-]+)?
and it does the job. The size is mainly due to the named capture groups; otherwise it's pretty simple.
Notes: the link must be a permalink to a file having a line number or a line number range. Anything else will not work. Also, more importantly, permalinks inside hyperlinks should not be extracted (Need to see how best to do this).
Github does not provide any API to just get the line(s) from a permalink. From the browser it makes a call to https://github.com/preview
with some query params and it gets an HTML response. So I think internally, github server does something. So we need to rely on the contents API to give us the file contents and filter out the lines by ourselves.
Notes: The API only supports files of 1MB in size, which I think is a good thing. We don't want to fetch huge files just to show a code preview. Secondly, what if somebody posts 500 permalinks in a single message ? Some thought needs to be given on limiting the no. of requests.
For now, I have a simple version where I create a link to the given file with a markdown code block showing the desired lines. We can iterate further on the design. I also check the file extension and add the appropriate language tag in the markdown.
I have a rough prototype ready:
Before:
After:
@marianunez / @hanzei - Does this approach sound reasonable to you ?
Side note: I see that for the /slash commands, the github client is constructed anew every time from the userID in the header. I think perhaps, it is better to wrap all usages of that in a sync.Once
, so that the client is created only once and reused for future API calls.
This looks great @agnivade! Only a couple of observations:
The contents API seems to return a link to the source of the file where you would have to parse it in order to find the needed lines.
Prima facie, that's what it looked like to me too. But it also returns a base64 encoded version of the file contents itself. See the "content" field. So it's just one Github call.
One thing that I noticed is that within the permalink it does contain within the DOM the source code lines that Github uses to display.
Didn't get this. Did you mean to use the github internal API (https://github.com/preview
) to get the html ? That uses some undocumented query parameters like markdown_unsupported=false&repository=140969738&subject=8&subject_type=Issue
and is subject to change any time from Github.
Didn't get this. Did you mean to use the github internal API (
https://github.com/preview
) to get the html ?
I meant an actual GET Http Request to get the permalink HTML contents but you are right, it's one request either way.
Right, and even then the underlying HTML structure can change anytime. I think it is safer and cleaner to use the API.
Right, and even then the underlying HTML structure can change anytime. I think it is safer and cleaner to use the API.
Agreed!
Also, we should add a #4 to your list for having a setting to enable/disable this feature on the plugin.
Also, we should add a #4 to your list for having a setting to enable/disable this feature on the plugin.
Yes absolutely.
Nearly done with it. Tested all scenarios. Seems to work. Remaining work is to add the config switch, write test cases and add more polish.
Btw: here's a nice visualization of the regex I am using - https://jex.im/regulex/#!cmd=export&flags=&re=https%3F%3A%2F%2F(www%5C.)%3Fgithub%5C.com%2F(%5Cw%2B)%2F(%5Cw%2B)%2Fblob%2F(%5Cw%2B)%2F(%5B%5Cw%2F.%5D%2B)%23(%5B%5Cw-%5D%2B)%3F.
One last note: The current logic will expand links inside backtick blocks too. To get around this, we would need a full-blown markdown parser to parse every single message. That seems like a lot of work for messages which don't have any permalinks at all. And after that, we need a regex pass again to extract the links.
A better way might be to somehow achieve this on the client side, because the client is already doing the markdown parsing. But that would need passing the github token to the client (which might be a security issue). Github can do that because they own the pipeline end to end.
For now, I think we can go ahead with the current flow which is fast and simple; and circle back on further improvements depending on what the feedback is. At worst case, it is anyways killable with a switch.
One last note: The current logic will expand links inside backtick blocks too. To get around this, we would need a full-blown markdown parser to parse every single message. That seems like a lot of work for messages which don't have any permalinks at all. And after that, we need a regex pass again to extract the links.
A better way might be to somehow achieve this on the client side, because the client is already doing the markdown parsing. But that would need passing the github token to the client (which might be a security issue). Github can do that because they own the pipeline end to end.
For now, I think we can go ahead with the current flow which is fast and simple; and circle back on further improvements depending on what the feedback is. At worst case, it is anyways killable with a switch.
I agree. This border case can be handled as a separate PR as a fix/enhancement in the future.
To get around this, we would need a full-blown markdown parser to parse every single message. That seems like a lot of work for messages which don't have any permalinks at all.
Is this really a problem? The autolink plugin already does this for example to avoid the - IMHO broken - problem of replacing stuff inside code blocks.
There is certainly an overhead here to pass every single message through a markdown parser. Whether that overhead is acceptable or not is a different question, given that we don't have any SLAs around plugin response latency.
When a user post contains a Github code line permalink, it should attach a markdown code block that will contain the preview of that section of the code with possibly 3 lines before and after (much like the display shown in code compares in Github:
The Github plugin should subscribe to the
MessageWillBePosted
hook and also include a setting to enable/disable this feature.If you're interested please comment here and come join our "Contributors" community channel on our daily build server, where you can discuss questions with community members and the Mattermost core team. For technical advice or questions, please join our "Developers" community channel.
New contributors please see our Developer's Guide.
JIRA: https://mattermost.atlassian.net/browse/MM-18880