mediawiki-utilities / python-mwcites

MIT License

Journal-level ISSN pseudo-DOI #16

Open nemobis opened 6 years ago

nemobis commented 6 years ago

Unless I'm mistaken, the recent dump includes a number of identifiers in the form 10.1002/%28ISSN%291099-0690 (escaped parentheses around "ISSN" in the original) which presumably come from templates such as {{Official website|http://onlinelibrary.wiley.com/journal/10.1002/%28ISSN%291099-0690}}.

These identifiers are used as prefixes of the actual DOIs for articles in those journals, but I'm not sure they're real DOIs themselves, and in any case not in the URL-escaped form. (http://doi.org/10.1002/(ISSN)1099-0690 does redirect to https://onlinelibrary.wiley.com/journal/10990690.)
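For illustration, here is a rough sketch of how such journal-level identifiers could be told apart from article-level DOIs. The pattern and the name `is_journal_level` are my own assumptions based only on the one example above (prefix, then "(ISSN)" either literal or percent-encoded, then an ISSN); this is not something mwcites does today.

```python
import re

# Hypothetical pattern, derived from the single example in this issue:
# a DOI prefix, "(ISSN)" written literally or as %28ISSN%29, then an ISSN
# (four digits, a hyphen, three digits and a final digit or X).
JOURNAL_LEVEL_RE = re.compile(
    r"^10\.\d{4,9}/(?:\(|%28)ISSN(?:\)|%29)\d{4}-\d{3}[\dXx]$",
    re.IGNORECASE,
)


def is_journal_level(identifier):
    """Return True if the identifier looks like a journal-level ISSN pseudo-DOI."""
    return JOURNAL_LEVEL_RE.match(identifier) is not None
```

Other publishers may use different shapes, so the pattern would need checking against the attached list before relying on it.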

doi-issn.txt

kodchi commented 6 years ago

I wonder if it would be better to fix the issue at the template level.

nemobis commented 6 years ago

kodchi, 22/08/2018 20:23:

> I wonder if it would be better to fix the issue at the template level.

How? The URL or "identifiers" are like that, we cannot change them. As long as mwcites is based on matching DOI-looking strings with regexes (which is a useful approach for now), template semantics will always be lost, won't they?

Ah, I realise you probably meant just the escaping.

kodchi commented 6 years ago

Yeah, why not replace %28 with ( at the core and not deal with it here? Ditto %29.
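Something along these lines, as a minimal sketch. `normalize_identifier` is a made-up name, not an existing mwcites function, and where exactly it would be called in the extraction pipeline is left open.

```python
from urllib.parse import unquote


def normalize_identifier(raw):
    """Percent-decode an extracted identifier, e.g. %28 -> ( and %29 -> )."""
    # Note: unquote() decodes every percent-escape, not only %28 and %29.
    return unquote(raw)
```

With this, `normalize_identifier("10.1002/%28ISSN%291099-0690")` gives `10.1002/(ISSN)1099-0690`, and a string without any escapes comes back unchanged. If decoding everything turns out to be too broad, two plain `str.replace` calls for `%28` and `%29` would be the narrower option.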