Closed kelson42 closed 5 years ago
This bug is weird.
The behaviour is to use mwMetadata.langISO2
in the file name.
When we retrieve the wiki language, PDC
is ISO3 not 2, which means we don't have an ISO2 language. The language conversion library we use doesn't recognise PDC
as a language, so we switch to use the fallback of de
.
I've proposed a change #625 which reverts back to the old logic:
Set mwMetadata.langISO2
to be whatever language is returned, even if it's not an ISO2 code.
This now makes mwoffliner behave the same way it did, but this is not correct. At some point we should either switch to using ISO3 codes in the filenames, or find another solution to this.
Thoughts @kelson42? I don't know what could break by changing the filename
@ISNIT0 metadata is iso3 and filename is iso2. So change #625 should not be implemented. You should come back to the 1.7 or even 1.6 logic.
But you said that you want pdc
in the file name. This is not iso2
This change brings it back to 1.7
logic
@ISNIT0 I forgot to say: if there is no iso2, then write iso3. pdc
is in that case, de
is not the iso2 version of pdc
. But I think you should not try to reinvent something, just rollback to the previous logic. It was correct.
Here is the logic in 1.5, this does not seem to work the same manner in 1.8
/* Language */
env.zim.langIso2 = entries.lang;
countryLanguage.getLanguage(env.zim.langIso2, function (error, language) {
if (error || !language.iso639_3) {
env.zim.langIso3 = env.zim.langIso2;
} else {
env.zim.langIso3 = language.iso639_3;
}
cb();
});
I've implemented and merged this change
For example, scraping
https://pdc.wikipedia.org
should generate file likewikipedia_pdc_all_nopic_2019-02.zim
but now it generates a file likewikipedia_de_all_nopic_2019-02.zim
This needs to be fixed in a 1.8 maintenance release