psychs / limechat

IRC Client for Mac
http://limechat.net/mac/
1.55k stars 266 forks source link

Automatic linking of urls broken for urls containing ä #155

Closed Krinkle closed 7 years ago

Krinkle commented 11 years ago
<#wikimedia-stewards> Trijnstel: [22:21]    next candidate http://de.wikinews.org/wiki/Spezial:Beiträge/DelMccoy <- doesn't exist?

<#wikimedia-stewards> Trijnstel: [22:21] next candidate http://de.wikinews.org/wiki/Spezial:Beiträge/DelMccoy <- doesn't exist?

screen shot 2013-06-03 at 10 35 06 pm

awdsmirk commented 11 years ago

Hi. I reproduced your issue by typing in the URL you provided rather than simply copying & pasting it. I am new to Limechat, Xcode, Git/GitHub, but I managed to locate the source of the problem in NSStringHelper.m; it seems for some unknown reason (to me at least) the application does not allow non-ASCII characters in URLs unless the domain is wikipedia.org.

I searched the Limechat Google Group to see if there were any reasons defined/discussed for this behavior but could not find any. If anyone familiar with the code (the author I suppose would be a good start) has any insight, it would be much appreciated. The relevant code snippet that causes the issue is here:

    // exclude non ASCII characters from URLs except for wikipedia
    NSString* host = [[self substringWithRange:[result rangeAtIndex:2]] lowercaseString];
    if (![host hasSuffix:@"wikipedia.org"]) {
        NSRange pathRange = [result rangeAtIndex:3];
        if (pathRange.length) {
            static NSRegularExpression* pathRegex = nil;
            if (!pathRegex) {
                NSString* pathPattern = @"^/[a-zA-Z0-9\\-._~!#$%&'()*+,-./:;=?@\\[\\]]*";
                pathRegex = [[NSRegularExpression alloc] initWithPattern:pathPattern options:0 error:NULL];
            }

            NSString* path = [self substringWithRange:pathRange];
            NSRange newPathRange = [pathRegex rangeOfFirstMatchInString:path options:0 range:NSMakeRange(0, path.length)];
            if (newPathRange.location != NSNotFound) {
                int delta = pathRange.length - newPathRange.length;
                if (delta > 0) {
                    r.length -= delta;
                }
            }
        }
    }
Krinkle commented 11 years ago

The exception for Wikipedia indeed doesn't make sense. However if anything, at the very least the exception should be widened to all MediaWiki-powered websites (the software is open-source and used by many third parties, very little (if anything) in the url handling is specific to Wikipedia).

As a start, expand it to all Wikimedia Foundation hosted MediaWiki-powered websites (below domains from www.wikimedia.org):

# These run MediaWiki from both HTTP and HTTPS at /wiki and /w:

*.wikimedia.org
*.wikipedia.org
*.wiktionary.org
*.wikiquote.org
*.wikibooks.org
*.wikisource.org
*.wikinews.org
*.wikiversity.org
*.wikidata.org
*.wikivoyage.org
wikimediafoundation.org
www.wikidata.org
www.mediawiki.org

However, from what I can see, it might make more sense to remove the exception entirely and make it the default behaviour for all urls.

awdsmirk commented 11 years ago

@Krinkle - Agree that the exception I listed from the source doesn't seem to make sense. My research indicates that possibly the behavior is a result of concerns regarding:

1) Adherence to RFC 3986 which strictly speaking does not allow for non-ASCII characters in URIs. 2) IDN homograph attacks. According to the Wikipedia article, the responsibility for preventing or mitigating these attacks fall on the browser, which implies that Limechat may not need to be responsible for stripping non-ASCII characters in the first place.

I'll create a pull request to delete the code in question. This should hopefully prompt additional discussion about the issue, what the proper designed behavior should be and eventually an updated version of Limechat with the fix.

awdsmirk commented 11 years ago

Learning more about using Git. Looks like the code in question that I identified previously is tied to this commit: 4851eda8aa78. This is back from May 2010...so this is old functionality. Surprised no one has complained about this sooner. Anyway, I suppose the proper thing to do would be to ask that the specific commit be rolled back rather than submit a new pull request to change it. Not sure, so I'll have to check. Hopefully the people on ##limechat can point me to the proper procedure.

Krinkle commented 11 years ago

For the record, I didn't type this url in LimeChat, I copied it from the address bar in Chrome. Since a few versions ago, Chrome (sometimes?) copies a "pretty" version of the url to the clipboard. That is, prettier than some other browsers do, but good enough that supposedly "all" browsers can still handle it (that is, they take it as input in the address bar but possibly encode it in the actual request).