subugoe / hoad

Deprecated: Please check https://github.com/subugoe/hoaddash
GNU Affero General Public License v3.0

test and document redirected license historical URLs #215

Open maxheld83 opened 4 years ago

maxheld83 commented 4 years ago

Some license info URLs, such as http://olabout.wiley.com/WileyCDA/Section/id-815641.html, now redirect to another page. In this case, the page you are redirected to is so general that it can no longer be taken as an indication of an open license.

Ah, linkrot, our old foe.

This raises some questions / todos:

Apologies if this is already completely covered by some other plan or data source concerning license patterns.

jhoeffler commented 4 years ago

In my experience with rules on data availability (Höffler, Jan H. 2017. "Replication and Economics Journal Policies." American Economic Review, 107 (5): 52-55. DOI: 10.1257/aer.p20171032), publishers and their journals change their rules frequently. Among the license URLs we identified there are pages like https://www.cambridge.org/core/terms that indicate the date of their last update. It would not be sufficient to screen the hundreds of licenses already identified, regularly look up which new license URLs are in use (up to 58 in just one year in one of the datasets we use) and, as noted above, check which of these pages still exist, whether they redirect, and if so, where to and since when. One would also have to track changes to the license information on known pages. The Internet Archive (e.g. https://web.archive.org/web/20190107085013/https://www.cambridge.org/core/legal-notices/terms) can help to identify different versions, but going through this for so many licenses is very tedious, and not every change is archived. On top of that, how do we know which version the publishers actually meant to refer to if they sometimes deposit license information years after articles are published?
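To take some of the tedium out of the Internet Archive lookups, the snapshot closest to a given date could be requested programmatically. The sketch below assumes the Wayback Machine availability endpoint at `https://archive.org/wayback/available` and its `archived_snapshots` / `closest` response layout; the helper names are hypothetical, and the sample payload is illustrative rather than a recorded response.

```python
import json
from typing import Optional, Tuple
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Wayback Machine availability API.
WAYBACK_API = "https://archive.org/wayback/available"


def availability_query(url: str, timestamp: str) -> str:
    """Build the query URL for the snapshot closest to `timestamp` (YYYYMMDDhhmmss)."""
    return WAYBACK_API + "?" + urlencode({"url": url, "timestamp": timestamp})


def closest_snapshot(payload: dict) -> Optional[Tuple[str, str]]:
    """Return (snapshot URL, snapshot timestamp), or None if nothing is archived."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["url"], closest["timestamp"]


def lookup(url: str, timestamp: str) -> Optional[Tuple[str, str]]:
    """Query the API over the network (not called in this example)."""
    with urlopen(availability_query(url, timestamp), timeout=30) as response:
        return closest_snapshot(json.load(response))


query = availability_query(
    "https://www.cambridge.org/core/legal-notices/terms", "20190107085013"
)

# Illustrative payload in the assumed response layout, not a recorded response.
sample_payload = {
    "archived_snapshots": {
        "closest": {
            "available": True,
            "status": "200",
            "timestamp": "20190107085013",
            "url": "https://web.archive.org/web/20190107085013/"
                   "https://www.cambridge.org/core/legal-notices/terms",
        }
    }
}
snapshot = closest_snapshot(sample_payload)
```

This only finds the snapshot nearest to one date. It does not answer which version a publisher meant to refer to, and it cannot recover changes that were never archived.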