jaraco / irc

Full-featured Python IRC library for Python.
MIT License
392 stars 86 forks source link

Client should only process certain CTCP actions when they're the only part of a message #78

Closed Xenthys closed 8 years ago

Xenthys commented 8 years ago

Hi!

I saw the CTCP handling has a little problem: When you send, as an example, a VERSION to the bot, it works as expected. The problem is you can send any sentence containing '\x01' anywhere, it'll still be considered as a CTCP request.

Example:

Hello, I just send \x01VERSION\x01 here! - VERSION Bot: [version here] If you can look into it, thanks!
jaraco commented 8 years ago

What do you expect instead?

Xenthys commented 8 years ago

I expected it to answer only when there is only the CTCP in the message, as the bot is considering anything is a CTCP request when it contains \x01 even if it's a simple message.

Basically, I'd like it to only answer when the message first and the last char are \x01 and not when it's in the middle of a sentence.

jaraco commented 8 years ago

The code says it dequotes the message according to CTCP specifications. What leads you to believe the message you see should not be dequoted?

Xenthys commented 8 years ago

That's because all clients are answering to CTCPs only if the \x01 char is at the beginning and at the end of the sentence. Another reason for me is some IRCds like Charybdis have the channel mode +C to block CTCPs to the channel, and messages with an "inner CTCP request" aren't considered as CTCPs and thus not blocked. I may be wrong, I don't really know the specifications, but it's weird that all clients aren't following them in this case (HexChat, mIRC, irssi, WeeChat…) Next to that, I've been told to "sanitize my CTCP parsing", so I came here as, well, it's done by your lib.

Sorry if it's not an issue, I still find it weird but if that's what should be followed according to the specs, it's ok. Keep up the good work, your lib is really useful!

jaraco commented 8 years ago

I don't mean to sound confrontational. I simply don't have enough background to make a judgment on this.

If the specs say an inner CTCP is discouraged or disallowed, then it's an easy change to make. If, on the other hand, the specs say one thing but many clients do a different thing, then that's a harder change to make (given the status quo), but still a possibility.

What I want to be sure not to do is alter the behavior of the library to suit one individual's whim or preference at the detriment to others.

When I hear things like "sanitize your inputs", that signals a potential security issue. If there's a security issue with the way the lib is parsing CTCP messages, then I'm very keen to understand the crux of the issue.

Still, I imagine that a client sending a hex 0x01 character (start of heading) is very rare except with respect to the CTCP protocol.

What's the risk in responding to a CTCP message when it's not the only part of the payload?

Xenthys commented 8 years ago

That's not a risk at all, I'm just reporting what people said to me, I think they didn't use the appropriate words for that.

About the handling itself it's the first time I see a message being parsed as a CTCP request because it contains SOH chars in the middle by a maintained client/library, however it's my own judgment and not related to the specs.

I completely understand your point of view, and if this is following the specifications then there's no problem at all! :ok_hand:

You can find me on Freenode if you'd like to have a direct conversation about that, I don't know if it still belongs to a GitHub issue now.

ameliaikeda commented 8 years ago

For the record, you should only assume a privmsg is a CTCP message if and only if \x01 is the first and last character of a PRIVMSG.

This is because there are actually multiple different types of these CTCPs, that fall under the banner of "user requests":

PRIVMSG target :\x01ACTION does something\x01 (/me does something) PRIVMSG target :\x01SED <ciphered message>\x01 PRIVMSG target :\x01VERSION\x01 PRIVMSG target :\x01PING\x01

And so on.

For this reason, you should not blindly respond to any possible CTCP purely based on the presence of delimiters.

In addition, you are strongly discouraged from responding to any CTCP message sent to a channel automatically on practically every server in existence.

There is no "spec", as such, because CTCP/DCC isn't actually in the IRC spec, but if the delimiter \x01 is at both the start and end of a PRIVMSG, you should attempt to parse it as a CTCP message.

From The only really "available" spec, which isn't the original, this is how you embed multiple CTCP messages inside one.

M = ctcpQuote(H1) || '\001' || ctcpQuote(X) || '\001' || ctcpQuote(H2)

aka:

PRIVMSG target :\x01MYFUNC \x01\x01\x01SECONDFUNC\x01\x01\x01THIRDFUNC\x01

The original spec, from the wayback machine (Section 4) is actually a lot more explicit about how user requests work, and shows exactly why you cannot embed a CTCP VERSION or other user request inside of a normal message:

The basic format of any request will be as follows:

PRIVMSG <space> <target> <space> : <marker> <command> [<arg> [...]] <marker>

Within the framework of these requests, some may generate a response. This response will take the following general form:

NOTICE <space> <target> <space> : <marker> <command> [<arg> [...]] <marker>

Requests which are not recognised or are invalid may return an error message similar to the following:

NOTICE <space> <target> <space> : <marker> ERRMSG \ <command> [<arg> [...]] <marker>

The real reason for allowing embedded CTCP inside messages is for text formatting in an early version of the spec (Section 3 of the above spec).

I can send the following to a channel and clients that understand it would parse this:

PRIVMSG target :Hello, this is \x01Uunderlined!\x01, \x01Iitalic\x01 and \x01Bbold!\x01

as:

Hello, this is underlined!, italic and bold! (note: markdown doesn't have underlining)

This, however, was replaced with formatting codes (\x03 and so on)

jaraco commented 8 years ago

From The only really "available" spec, which isn't the original, this is how you embed multiple CTCP messages inside one.

Following that link, it appears to be a pretty decent spec, and consistent with the implementation as found in this library. Given that it is the actively published version, I would give it deference over other specs that have been deleted and are only available in the way-back machine.

As a result, I've added a link to that spec to the ctcp module, for clarity.

Looking at that spec, and in particular at Example 3 in Quoting Examples, there is specifically an example of a client signaling a USERINFO request alongside text of a message, which seems contrary to the assertion that

Because there are actually multiple different types of these CTCPs, ...you should not blindly respond to any possible CTCP purely based on the presence of delimiters.

and

The real reason for allowing embedded CTCP inside messages is for text formatting.

Therefore, I remain unconvinced that the existing implementation is faulty.

you are strongly discouraged from responding to any CTCP message sent to a channel automatically on practically every server in existence.

I'm not sure what you're recommending here instead. I don't believe the IRC client library has specific behaviors for responding to any message, and it's the responsibility of the individual implementations to respond to the relevant CTCP events as appropriate, and only for servers to which that client is connected. The assertion is unfounded. If you believe there's something that should be done to better handle CTCP messages, please suggest what the problem is and how you propose correcting the problem.