Open Daniel-Mietchen opened 10 years ago
The paper at http://dx.doi.org/10.1186/1742-4690-2-11 has a colon in the title that was not carried over to https://en.wikisource.org/wiki/Wikisource:WikiProject_Open_Access/Programmatic_import_from_PubMed_Central/The_Vpr_protein_from_HIV-1_distinct_roles_along_the_viral_life_cycle . I would be inclined to keep the colon (and did so in a manual move of a previously imported version to https://en.wikisource.org/wiki/Wikisource:WikiProject_Open_Access/Programmatic_import_from_PubMed_Central/The_Vpr_protein_from_HIV-1:_distinct_roles_along_the_viral_life_cycle ), but I am not entirely sure what the policy or practice at Wikisource is on this.
This seems like a fringe case. OAMI treats colons as a "forbidden character", and we copied the title-cleaning function from there. You can see it here.
Also, we should currently be keeping all Unicode characters and eliminating only a small number of "forbidden characters". I'll update the title to reflect this.
The OAMI rules were set up with Commons in mind, and I think we should leave our Commons-facing naming rules like this until the time we can pull all this info from Wikidata.
For Wikisource, I'd agree that we should just exclude "forbidden characters" and keep everything else.
At OAMI, the file naming of the uploads to Commons gets rid of many special characters. At Wikisource, we should strive to keep the paper titles as intact as possible (see also https://github.com/wpoa/recitation-bot/issues/15 ), taking into account technical limitations of MediaWiki (e.g. colons or slashes in page names).
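To make the Wikisource-side proposal concrete, here is a minimal sketch of a title cleaner that keeps everything, including colons and non-ASCII characters, and drops only the characters MediaWiki rejects in page titles (`# < > [ ] | { }`). This is not the OAMI function or the bot's actual code: the name `clean_title`, the `max_bytes` parameter and the decision to leave slashes untouched are assumptions made for illustration.

```python
import re
import unicodedata

# Characters that MediaWiki does not accept in page titles.
# Colons and slashes are deliberately NOT listed here (assumption:
# we want to keep the paper title as intact as possible).
FORBIDDEN = re.compile(r"[#<>\[\]|{}]")


def clean_title(title: str, max_bytes: int = 255) -> str:
    """Hypothetical helper: strip only MediaWiki-illegal characters."""
    # Normalise to NFC so composed and decomposed forms compare equal.
    title = unicodedata.normalize("NFC", title)
    # Drop the forbidden characters, keep all other Unicode.
    title = FORBIDDEN.sub("", title)
    # MediaWiki treats underscores as spaces; collapse runs of both.
    title = re.sub(r"[\s_]+", " ", title).strip()
    # Page titles are limited to 255 bytes of UTF-8; truncate without
    # leaving a partial multi-byte character at the end.
    truncated = title.encode("utf-8")[:max_bytes]
    return truncated.decode("utf-8", errors="ignore").rstrip()


if __name__ == "__main__":
    print(clean_title(
        "The Vpr protein from HIV-1: distinct roles along the viral life cycle"
    ))
```

With this approach the colon in the Vpr title survives unchanged. Whether slashes (which create subpages) or a title that begins with a namespace-like prefix need extra handling is left open here.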