wikimedia-gadgets / twinkle

The English Wikipedia twinkle javascript helper
http://en.wikipedia.org/wiki/Wikipedia:Twinkle

Replace hardcoded regexes for namespaces (e.g. (?:[Ii]mage|[Ff]ile)) with ones generated from wgNamespaceIds #103

Closed he7d3r closed 3 years ago

he7d3r commented 12 years ago

Currently, there are some instances of the regex "(?:[Ii]mage|[Ff]ile)" and "(?:[Tt]emplate:)" in the code.

It would be better to use all possible aliases of the "File:" namespace ("Template:", respectively) which are in use in a given wiki.

E.g.: On Portuguese Wikipedia, the regex should be "(?:[Ff]icheiro|[Ii]magem|[Aa]rquivo|[Ii]mage|[Ff]ile)". The code below should work on any other wiki as well:

var ids = mw.config.get('wgNamespaceIds'),
    aliases = [],
    nsName,
    first;
// Collect every local name and alias of the File namespace (ID 6)
for ( nsName in ids ) {
    if ( ids[nsName] === 6 ) {
        first = nsName.charAt(0);
        // Accept either case for the first letter only
        aliases.push( '[' + first.toUpperCase() + first.toLowerCase() + ']' + nsName.slice(1) );
    }
}
alert( '(?:' + aliases.join('|') + ')' );

atlight commented 12 years ago

Since Twinkle is English Wikipedia-specific, I don't think it would be useful to fix this, particularly since adding on-the-fly regex creation logic would unnecessarily slow down the code. Anyone localising Twinkle should fix these regexes as they go.

atlight commented 12 years ago

Hmm; I didn't see that this was in morebits. I am not sure what to do with morebits - whether to make it fully localisable (with string table, etc) or whether just to leave it for translators to modify. For the moment it will obviously be the latter!

he7d3r commented 12 years ago

I believe that once the new version of the Gadgets extension is available, the localisation would be moved to MediaWiki:messages and the gadget could be moved to a central repo of gadgets (e.g. mediawiki.org). In that sense, the more language-independent we keep the whole code, the better. Otherwise, chances are we will get the usual proliferation of outdated hacks being copied from one wiki to another...

This was my motivation to report some bugs/request some enhancements.

atlight commented 12 years ago

Yes, it would be nice. However, that would only work for core parts of Twinkle, since most modules depend on some kind of local structure (e.g. CSD criteria/tagging templates/notification templates, Welcome templates; ARV page format; Tag templates). Siddhartha Ghai is working on TWG, which is a project similar to Twinkle that is designed with localisation in mind: see [User:Siddhartha Ghai/TWG.js](http://en.wikipedia.org/wiki/User:Siddhartha_Ghai/TWG.js).

It is my long-term goal to make Twinkle as localisable as possible (still, modules like XFD will surely require code modification across wikis, but modules such as CSD, Tag, Welcome should be able to work on different wikis). However, it would take a lot of work.

Where is the information about the new version of Gadgets? It's not obvious to me.

he7d3r commented 12 years ago

There is some info here: https://www.mediawiki.org/wiki/ResourceLoader/V2_testing#RL2_in_a_nutshell https://www.mediawiki.org/wiki/ResourceLoader/Version_2_Design_Specification#Messages but I've seen comments about Gadgets 3.0 as well, https://www.mediawiki.org/wiki/Roadmap#MediaWiki_infrastructure and I'm not sure what exactly will come in that version... The improvements from last GSoC would be great for customizations: https://www.mediawiki.org/wiki/User:Salvatore_Ingala/Notes

Amorymeltzer commented 5 years ago

This is necroposting, but, uh, basically none of the above has yet come to pass.

Regardless, a good first step would be removing the hardcoding of Morebits.wikipedia. MediaWiki-1.16 (circa 2011) added wgNamespaceIds and wgFormattedNamespaces, which should take care of most of the uses of those objects. Morebits.wikipedia.namespaces is basically a carbon copy of wgFormattedNamespaces (with the added advantage that the project's own name is used for the project namespace).

There are only three uses of Morebits.wikipedia.namespaces and Morebits.wikipedia.namespacesFriendly in our codebase. It shouldn't be too difficult to remove them, but I don't know who or what else relies on the objects: the only uses I can find are old, unused copies of old Twinkle code, nothing relying on current gadgets. Morebits.wikipedia.namespacesFriendly seems particularly unlikely to be widely used outside en.wiki; it removes -1 and -2 and uses Wikipedia for 4 and (Article) for 0, which we can deal with (using (Main) should be fine across projects).

Dunno about you @atlight @MusikAnimal but I don't think we need them? We could keep Morebits.wikipedia.namespaces for the sake of backward compatibility(?), I suppose, and just copy wgFormattedNamespaces into it; I certainly think we'd be fine just removing namespacesFriendly altogether.

It's a small step, but should lay the groundwork for the above request. I think there's a little bit in #485 that does this sort of thing, actually.
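
The backward-compatibility option described above could be sketched roughly as follows. This is only an illustration: copyNamespaces and friendlyNamespaces are hypothetical helper names, the sample data stands in for wgFormattedNamespaces when the code runs outside MediaWiki, and the "(Main)" label follows the suggestion in this comment rather than any existing Morebits code.

```javascript
// Illustrative data in the shape of wgFormattedNamespaces (namespace ID -> local name)
var sampleFormatted = { '-2': 'Media', '-1': 'Special', '0': '', '1': 'Talk', '4': 'Wikipedia', '6': 'File' };
var formatted = ( typeof mw !== 'undefined' && mw.config ) ?
    mw.config.get( 'wgFormattedNamespaces' ) :
    sampleFormatted;

// Hypothetical helper: shallow copy, so callers cannot mutate the config object
function copyNamespaces( source ) {
    var copy = {}, id;
    for ( id in source ) {
        if ( Object.prototype.hasOwnProperty.call( source, id ) ) {
            copy[id] = source[id];
        }
    }
    return copy;
}

// Hypothetical helper: drop the virtual namespaces (-1, -2) and label the main namespace
function friendlyNamespaces( source ) {
    var friendly = {}, id;
    for ( id in source ) {
        if ( Object.prototype.hasOwnProperty.call( source, id ) && Number( id ) >= 0 ) {
            friendly[id] = source[id] === '' ? '(Main)' : source[id];
        }
    }
    return friendly;
}

// In Twinkle the first result would be assigned to Morebits.wikipedia.namespaces
var namespaces = copyNamespaces( formatted );
var friendly = friendlyNamespaces( formatted );
console.log( friendly['0'], Object.keys( friendly ).length );
```

Copying rather than aliasing keeps any external script that writes to the object from altering the shared config data.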

siddharthvp commented 5 years ago

I don't think Morebits.wikipedia is being used outside Twinkle at all, as the search results show (apart from outdated copies of Twinkle code). It's only the popular classes like simpleWindow, wiki.page and probably quickform that get used externally.

I suggest Morebits.wikipedia be removed completely.

siddharthvp commented 5 years ago

Regarding the issue at hand here: based on the code given here (which also appears in @Siddhartha-Ghai's TWG), it is easy to derive a generalised function for giving namespace name regexes for any namespace:

function namespaceRegex(namespaceNumber) {
    var ids = mw.config.get('wgNamespaceIds'),
        aliases = [],
        alias, first;
    for ( alias in ids ) {
        if ( ids[alias] === namespaceNumber ) {
            // charAt() also copes with the empty name of the main namespace
            first = alias.charAt(0);
            if ( first.toUpperCase() === first.toLowerCase() ) {
                // Caseless first character: use the alias unchanged
                aliases.push( alias );
            } else {
                aliases.push( '[' + first.toUpperCase() + first.toLowerCase() + ']' + alias.slice(1) );
            }
        }
    }
    // Spaces and underscores are interchangeable in titles
    return aliases.join('|').replace(/_/g, '[ _]');
}

(The check that the upper-case and lower-case forms of the first character are identical is added in the interest of non-Latin scripts, which have no upper/lower case distinction.)

Regarding @atlight's concern that on-the-fly regex creation logic slows the code, the solution is to pre-compute these regexes (for the required namespaces) in the twinkle.js file and store them in variables for ready access, e.g.:

Twinkle.file_ns_rgx = namespaceRegex(6);
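
As a rough sketch of how a pre-computed pattern might then be used: the snippet below builds each pattern once, caches it, and extracts file names from wikitext. The names cachedNamespaceRegex, nsRegexCache and fileLink are hypothetical, and the sample alias list (modelled on the Portuguese Wikipedia example earlier in this thread) only stands in for wgNamespaceIds so the code can run outside MediaWiki.

```javascript
// Illustrative stand-in for wgNamespaceIds (alias -> namespace ID)
var sampleIds = {
    ficheiro: 6, imagem: 6, arquivo: 6, image: 6, file: 6,
    'ficheiro_discussão': 7
};
var ids = ( typeof mw !== 'undefined' && mw.config ) ?
    mw.config.get( 'wgNamespaceIds' ) :
    sampleIds;

// Same approach as namespaceRegex above, reading from the local `ids` map
function namespaceRegex( namespaceNumber ) {
    var aliases = [], alias, first;
    for ( alias in ids ) {
        if ( ids[alias] === namespaceNumber ) {
            first = alias.charAt( 0 );
            aliases.push( first.toUpperCase() === first.toLowerCase() ?
                alias :
                '[' + first.toUpperCase() + first.toLowerCase() + ']' + alias.slice( 1 ) );
        }
    }
    return aliases.join( '|' ).replace( /_/g, '[ _]' );
}

// Hypothetical cache: each pattern is built on first use and reused afterwards
var nsRegexCache = {};
function cachedNamespaceRegex( namespaceNumber ) {
    if ( !Object.prototype.hasOwnProperty.call( nsRegexCache, namespaceNumber ) ) {
        nsRegexCache[namespaceNumber] = namespaceRegex( namespaceNumber );
    }
    return nsRegexCache[namespaceNumber];
}

// Match [[File:Name|...]] links under any alias and capture the file name
var fileLink = new RegExp( '\\[\\[\\s*(?:' + cachedNamespaceRegex( 6 ) + ')\\s*:\\s*([^|\\]]+)', 'g' );
var wikitext = 'Intro [[Ficheiro:Exemplo.jpg|thumb]] and [[image:Other file.png]].';
var names = [], m;
while ( ( m = fileLink.exec( wikitext ) ) !== null ) {
    names.push( m[1] );
}
console.log( names ); // ['Exemplo.jpg', 'Other file.png'] with the sample aliases
```

The alternation is wrapped in a non-capturing group at the point of use, since the function returns the bare list of alternatives.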

Whether these changes should be made here is questionable, but something like this must be there in any attempted internationalized version of Twinkle.

siddharthvp commented 3 years ago

Resolved in #1262 by @Amorymeltzer. On-the-fly regex creation is being used; don't think the 2012-era concerns of it slowing down the code are applicable any longer as today's JS engines are just so fast.