Closed subbuss closed 8 years ago
I suspect this is probably a Parsoid-internal issue with how mediawiki-title is being used.
Well, it is indeed a mediawiki-title bug. The diff below seems to fix it. It is late and this needs a second look, so, can pick this up tomorrow. /cc @arlolra
[subbu@earth mediawiki-title] git diff
diff --git a/node_modules/mediawiki-title/lib/index.js b/node_modules/mediawiki-title/lib/index.js
index 3a185b8..3bf8349 100644
--- a/node_modules/mediawiki-title/lib/index.js
+++ b/node_modules/mediawiki-title/lib/index.js
@@ -213,7 +213,7 @@ function _splitNamespace(title, siteInfo, defaultNs) {
var match = title.match(prefixRegex);
if (match) {
var namespaceText = match[1];
- var ns = Namespace.fromText(namespaceText, siteInfo);
+ var ns = (defaultNs !== undefined) ? defaultNs : Namespace.fromText(namespaceText, siteInfo);
if (ns !== undefined) {
return {
title: match[2],
That diff looks reasonable to me. We'll see if arlo agrees.
Example test case: "Category:Project:Foo"
That doesn't look right. Why is K being passed in as the default?
Looking at all the changes, the bug is in interpretation of the 'defaultNs' parameter. If mediawiki-title is treating that arg as a default to be used when one cannot be inferred, then the bug is in https://gerrit.wikimedia.org/r/#/c/299897/2/lib/config/MWParserEnvironment.js ... the 'ns' parameter is not a "default" namespace parameter, but the namespace that needs to be used with the title.
mediawiki-core has https://github.com/wikimedia/mediawiki/blob/master/includes/Title.php#L249-L252 to support this functionality.
So, there are multiple fixes possible here:
Of these, (1) or (2) seems simpler than (3).
the bug is in interpretation of the 'defaultNs' parameter.
That link to core says, "The namespace to use if none is specified by a prefix." Agreed that the bug is in Parsoid, and it looks like I introduced it in, https://gerrit.wikimedia.org/r/#/c/299897/2/lib/config/MWParserEnvironment.js
But, I think (2) is the right thing to do.
That link also indicates that title construction might need to construct something with an explicit NS for which it provides additional methods .. hence option (3). But, okay, I'll take a shot at (2) then.
Yeah, that's exactly what you were doing with,
var title = Title.newFromText(urlDecodedText, this.conf.wiki.siteInfo);
if (ns !== undefined && ns !== 0) {
title = new Title(title.getKey(), ns, title._siteInfo, title._fragment);
}
return title;
which is why I said I introduced in the above change.
But, okay, I'll take a shot at (2) then.
Or you can revert my change and put a comment about that not just being a default, as I misinterpreted.
Yes, I considered reverting it (since I verified that it works earlier today). As it turns out, there is only one use of the explicit NS scenario (the one that broke). I can fix that easily. Let me push fix for (2) and we can discuss on that patch what is a better solution.
But, the broader question is whether mediawiki-title, the library, needs functionality described in (3). It may, but probably better to introduce it on demand.
It may, but probably better to introduce it on demand.
Agreed.
I do need the default NS behaviour, as implemented here, for L71, https://gerrit.wikimedia.org/r/#/c/264026/21/lib/ext/Gallery/index.js
Parsoid is parsing [[К:Википедия: статьи без изображений (тип: судно)]] as a link after switching to mediawiki-title instead of a category as it used to before.
Thanks to a recently introducing bug in dumping error output (but fixed in https://github.com/wikimedia/parsoid/commit/c77dbee5fb964eea341642dc1aafa832862f09fa), led to crashers in roundtrip testing (https://parsoid-tests.wikimedia.org/resultFlagNew/45beb6c0abc5a313ad75259053d74466a51f91cd/4a5a7ae97bd2b4c6fa70f62c99d2e959d072c65f/ruwiki/HMS_Actaeon_(1775)). I was able to trace the failure in template wrapping to a bad parse of the above wikitext.
This is because https://github.com/wikimedia/parsoid/blob/master/lib/wt2html/tt/LinkHandler.js#L283 is no longer true. The namespace id is 4 instead of 14. However, siteinfo dump shows
"-2":{"id":-2,"case":"first-letter","canonical":"Media","":"Медиа"},"-1":{"id":-1,"case":"first-letter","canonical":"Special","":"Служебная"}},"namespacealiases":[{"id":2,"":"U"},{"id":2,"":"У"},{"id":2,"":"Участница"},{"id":3,"":"UT"},{"id":3,"":"ОУ"},{"id":3,"":"Обсуждение участницы"},{"id":4,"":"Wikipedia"},{"id":4,"":"ВП"},{"id":6,"":"Image"},{"id":6,"":"Изображение"},{"id":7,"":"Image talk"},{"id":7,"":"Обсуждение изображения"},{"id":10,"":"T"},{"id":10,"":"Ш"},{"id":14,"*":"К"}
which means this should have been parsed as 14 which would then have been recognized as a category link.