Further doc improvements

Numerlor commented 4 years ago

Sort of a continuation to #546 to get feedback before getting to it.

Before (trying to) implementing #447, duplicate items will be needed to be removed from the inventories dict to get only the relevant results. These were mentioned in comments of the aforementioned issues.

That will drastically decrease the amount of items and their used space so I thought we can also up the cache max_size in here https://github.com/python-discord/bot/blob/3d25c790a69421e0ed9c7c7a29ca1d5833322169/bot/cogs/doc.py#L283-L286 as we got some additional space and can save requests on the few additional items that may be there; although the chance of hitting the cache decreases with the size.

Another thing to look into is the markdown converter that's used on the contents Discord handles embeds a bit differently on desktop and mobile (android)

Where the internal references don't show up as inline codeblock

And then the last one I goton my mind; we collect all the signatures of functions but some have aliases and others group them by category. How should this be handled if at all? As examples here for the first one, everything under the fetched signature is included but also relevant to how the contents are written, but in the second one we just get an alias that's mostly unneeded.

MarkKoz commented 4 years ago

Where the internal references don't show up as inline codeblock

Do inline codeblocks within embeds not work on mobile at all or what's the deal? I agree this should be fixed if possible.

we collect all the signatures of functions but some have aliases and others group them by category. How should this be handled if at all?

There isn't a reliable way to distinguish aliases, is there? I don't think having aliases shown is that bad of a thing anyway.

Numerlor commented 4 years ago

Do inline codeblocks within embeds not work on mobile at all or what's the deal? I agree this should be fixed if possible.

inline codeblocks alone are shown and work properly but the internal references are translated to (invalid) links through the converter and that is handled differently on the mobile client. The secrets embed has this description

description: New in version 3.6.

             **Source code:**
             [Lib/secrets.py](https://github.com/python/cpython/tree/3.8/Lib/secrets.py)

             The
             [`secrets`](#module-secrets "secrets: Generate secure random numbers for managing secrets.") module
             is used for generating cryptographically strong random numbers suitable for managing data such as
             passwords, account authentication, security tokens, and related secrets.

             In particularly,
             [`secrets`](#module-secrets "secrets: Generate secure random numbers for managing secrets.") should
             be used in preference to the default pseudo-random number generator in the
             [`random`](random.html#module-random "random: Generate pseudo-random numbers with various common
             distributions.") module, which is designed for modelling and simulation, not security or
             cryptography.

There isn't a reliable way to distinguish aliases, is there? I don't think having aliases shown is that bad of a thing anyway.

Guess there isn't really; but on a related note, in case of the screenshot with the dunders, if __le__ was fetched do we want to grab all of the signatures or only the ones following __le__?

MarkKoz commented 4 years ago

My idea for fixing the links is to parse them as markdown to find links and then check if they start with a valid schema (just http or https in this case). It could be taken a step further by trying to resolve the relative links and making them absolute. I don't know feasible this parsing is in practice.

For the dunders, in that specific case, I think they should all be included since they are all relevant to the documentation text, as you stated yourself.

Numerlor commented 4 years ago

Resolving the links shouldn't be much of an issue in itself, just need the url of the page it's from. But don't know how viable its is to plug that into the markdown converter instead of doing that on the already formatted string, which would be more welcome

MarkKoz commented 4 years ago

It doesn't necessarily have to be parsing Markdown. That was just my naïve suggestion, being unfamiliar with the internals of the cog.

Numerlor commented 4 years ago

The cog uses a subclassed MarkdownConverter from markdownify to parse the html into markdown. Should be able to do it there but haven't look at the source yet to know how easy it'll be with the modularity of the super class

Numerlor commented 4 years ago

Alright to start working on the discussed features?

MarkKoz commented 4 years ago

Yes, I think I've answered everything.

Numerlor commented 4 years ago

I'll begin then

Numerlor commented 4 years ago

After resolving the urls the end up looking like this on mobile

On desktop it's a normal codeblock with a link. Is this behaviour fine? Stripping of the backticks would have to be done after they were added in because of how docs are structured if we went with links without codeblock, or the urls can just return text with no linking involved

MarkKoz commented 4 years ago

One one hand, the back ticks don't look too bad on mobile. On the other hand, if they're already links, they're distinguished enough to not need to be in code blocks too. I could go either way but leaning towards keeping the code block.

Numerlor commented 4 years ago

Thought about the aliases/grouped signatures, we only fetch 3 to keep it short-er so which ones do you think should be included if more than 3 are present? Going with the previous object.__lt__ example, we can't get them all because 6 separate codeblocks would be huge. Collecting all the signatures under the fetched ones, and then searching up if there's still space left, or picking the fetched one and then going up from the bottom came to mind for how to do it

MarkKoz commented 4 years ago

For each direction, there are probably cases where that direction is preferable, but that cannot be determined by the code. Therefore, I don't think it matters which ones you choose. Might be best to stick to displaying whatever is adjacent to the queried signature.

Numerlor commented 4 years ago

Stumbled upon a few more things.

Do we need the default listing of inventories behind get? getlist This is implemented as a check in the get subcommand when the symbol is empty, but to me the nicer behaviour would be that output only for me !doc and nothing for an empty get.

The set shadows the set symbol from the inventories, any opinions on handling it and other subcommands as symbols when the role check doesn't pass? Currently there's just no output from the bot.

In the previous doc PR I've implemented auto deletion of the not found embed, in retrospect this would probably be better off with the trash can emoji reaction handling, any thoughts on this? I believe it'd be best to only include it on the failed gets as the chance of getting the wrong doc page is minimal.

And the last, are there any possible complications from an escaped codeblock in an embed? Currently it's converted with clean_content and then simply wrapped in backticks when not found which is simple to get out of. https://github.com/python-discord/bot/blob/e37041622aebc8fef75d83fcd8970dc527c056bc/bot/cogs/doc.py#L384-L388

Numerlor commented 4 years ago

Does the url when adding an inventory through doc set need to be verified like here? https://github.com/python-discord/bot/blob/e37041622aebc8fef75d83fcd8970dc527c056bc/bot/cogs/doc.py#L126-L132 a mistake should be noticeable immediately after the command and while it's a fairly uncommon command to use it'll also reduce the needless requests.

I think this leads to worse ux or code; currently when the set_command is invoked it'll do 2 web requests out of 3 total before it sends out a typing event causing a noticeable delay during which the user could be thinking that something went wrong. Moving it to the top of the command method still leaves out the verifying request since it's a part of the converter, and I don't really like triggering the typing event there

MarkKoz commented 4 years ago

Do we need the default listing of inventories behind get? ... the nicer behaviour would be that output only for me !doc and nothing for an empty get.

I kind of like it. I don't think it should be empty; it should at least show the help command, but showing the available inventories feels more useful since the get command is not complex enough for people to really need to see the help.

The set shadows the set symbol from the inventories, any opinions on handling it and other subcommands as symbols when the role check doesn't pass? Currently there's just no output from the bot.

Putting it in quotes works, so I'm not concerned about it: !d "set"

In the previous doc PR I've implemented auto deletion of the not found embed, in retrospect this would probably be better off with the trash can emoji reaction handling, any thoughts on this?

Yeah, I agree. It would still autodelete it but allow for a pre-emptive delete via the reaction, right?

And the last, are there any possible complications from an escaped codeblock in an embed?

Escaping any markdown should be OK to do; I don't see any problems with doing that just to be safe.

a mistake should be noticeable immediately after the command

How? There'd be nothing preventing the post to our API, meaning the user would have to manually delete the incorrect inventory afterwards. Adding inventories is not a frequent occurrence so I don't mind extra wait time from requests. How much time does it really add anyway?

I don't really like triggering the typing event there

I don't see anything wrong with triggering it in the converter.

Numerlor commented 4 years ago

The set shadows the set symbol from the inventories, any opinions on handling it and other subcommands as symbols when the role check doesn't pass? Currently there's just no output from the bot.

Putting it in quotes works, so I'm not concerned about it: !d "set"

There haven't been that many, but searching through the history there were a few instances where people seemingly though it just didn't work after a !doc set, !doc get set is usually the next step but I doubt someone would try the quotes. get set works perfectly fine so this would be just a small QoL improvement

In the previous doc PR I've implemented auto deletion of the not found embed, in retrospect this would probably be better off with the trash can emoji reaction handling, any thoughts on this?

Yeah, I agree. It would still autodelete it but allow for a pre-emptive delete via the reaction, right?

I was thinking about just adding the reaction and leaving it up to the user, but doing it automatically along with the reaction solves the problem I had with it where the messages just disappeared without any hint

I've also thought about a deeper caching system, for whole pages instead of only caching symbol embed because of the structure of some docs where all the symbols are in one page. Do we care more about the response time and doing additional requests or the RAM space (or drive if it's stored there)? I believe only discordpy has the one page layout out of the commonly fetched packages so it'd probably top out somewhere around 3-5MB most of the time, with the other docs being separated by modules etc..

MarkKoz commented 4 years ago

I doubt someone would try the quotes

Fair enough

get set works perfectly fine so this would be just a small QoL improvement

Isn't get shadowed too? Would you propose renaming both commands? And to what? I don't really mind them being renamed since set is seldom used and non-public and get is usually used as !d g or just !d.

Do we care more about the response time and doing additional requests or the RAM space (or drive if it's stored there)?

I think response time is important. I am not keen on the current response times for docs. I think we have enough RAM to handle it, but @j03b is more familiar with that end of things. I think presenting actual RAM usage figures would help a lot in determining if it will be an issue.

Numerlor commented 4 years ago

Isn't get shadowed too? Would you propose renaming both commands? And to what? I don't really mind them being renamed since set is seldom used and non-public and get is usually used as !d g or just !d.

get would be shadowed but the symbol doesn't exist in the current inventories. There shouldn't be many problems if it assumes the command was intended to be executed when the permissions are there; but renaming doesn't sound like a bad idea, although I don't know what to.

MarkKoz commented 4 years ago

getdoc and setdoc? I don't think it really matters to pick a nice name since most of the time the aliases will be used. Just needs to be sensible and unlikely to shadow anything.

Numerlor commented 4 years ago

We could go with invalid identifiers like get-doc and set-doc to be completely safe but getdoc and setdoc sound good and should be future proof since there's almost no normal non python symbols that don't have the lib prefix

Numerlor commented 4 years ago

Do you think the doc suffix should also be added to the delete/remove and the refresh subcommands?

MarkKoz commented 4 years ago

Yeah, wouldn't hurt. The commands are staff-only and seldom used. They'll still have convenient aliases anyway. Other options are a separate command group (possibly unintuitive) or a nested group (cumbersome?)

jb3 commented 4 years ago

We have more than enough RAM for this (and we have alerts set up in case we don't (but we do)).

We average at around 4-5GB of RAM usage but the server has a total of 8GB (and 1GB swap as well).

Numerlor commented 4 years ago

How do we feel about removing the link from read more in truncated doc responses? The link is already the embed's url, although it does not highlight on mobile so I've thought about removing the inline codeblocks from the title, which I think also ends up looking a bit more pleasing. embed permalink

I'm rewriting the module and abstracting some things, so this would prove useful as one less thing to need to pass around.

MarkKoz commented 4 years ago

So it highlights the title link on mobile if the code block is removed? If that's the case, I'm totally on board with removing the title code block and relying solely on the title link.

Numerlor commented 4 years ago

While there was not much use for the link itself, I think the highlight showed a bit better that it wasn't a part of the text. I think just the text read more blends in a bit too much, but haven't found an alternative I was entirely satisfied with doc readmore Do you think it's fine as it - but without the link, or some of the alternatives I fit into the image?

MarkKoz commented 4 years ago

I think a simple ellipses would be fine.

Numerlor commented 4 years ago

I feel like it blends in too much in the first embed here, Any other opinions on this?

MarkKoz commented 4 years ago

Might look nicer if you strip punctuation before adding the ellipses.

python-discord / bot

Further doc improvements #878