semantalytics / xsparql


list of URL schemes is incomplete #4

Open VladimirAlexiev opened 7 years ago

VladimirAlexiev commented 7 years ago

A fully specified URI starting with <ftp://...> is emitted without the angle brackets, which causes the RIOT error "prefix ftp: is not defined". URIs starting with <http://...> don't have this defect, so the list of URI schemes that xsparql recognizes is incomplete. We should also check that it covers mailto: etc.

Workaround: post-process the output with this script, xsparql-postprocess.pl:

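```perl
#!/usr/bin/perl -p
# Prepend a Turtle @base directive (Turtle requires the terminating " .").
BEGIN { print "\@base <whatever you need here> .\n" }
# Wrap bare ftp:// URIs in <...>, skipping bracketed ones and the trailing newline.
s{(?<!<)(ftp://[^\s<>]+)}{<$1>}g;
```
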
VladimirAlexiev commented 7 years ago

I think the bug is here:

These lines enumerate only 5 URI schemes (tel:, http:, https:, mailto:, file:), so ftp: is not among them. By contrast, the IANA registry at http://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml lists 255 schemes, of which 90 are Permanent.
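To make the failure concrete, here is a minimal sketch in Perl of a serializer that recognizes only the five schemes listed above and emits every other term verbatim, as if it were a prefixed name. This is a hypothetical reconstruction of the described behaviour, not the actual xsparql source; the names @KNOWN_SCHEMES, is_known_iri and serialize are made up for illustration.

```perl
use strict;
use warnings;

# HYPOTHETICAL: only the five schemes named in this issue are recognized.
my @KNOWN_SCHEMES = qw(tel http https mailto file);

# True if the term starts with "<scheme>:" for one of the known schemes.
sub is_known_iri {
    my ($term) = @_;
    for my $scheme (@KNOWN_SCHEMES) {
        return 1 if index($term, "${scheme}:") == 0;
    }
    return 0;
}

# Recognized IRIs get angle brackets; anything else is emitted unchanged.
sub serialize {
    my ($term) = @_;
    return is_known_iri($term) ? "<$term>" : $term;
}

print serialize($_), "\n" for qw(http://example.org/a ftp://example.org/b);
# prints:
# <http://example.org/a>
# ftp://example.org/b
```

The second line is what RIOT then parses as a prefixed name with the undefined prefix ftp:.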

Questions:

  1. Are there other source locations that need to be fixed?
  2. Should we include the 90 Permanent schemes, or all 255 IANA-registered schemes?
  3. Are there unregistered schemes that are nevertheless in use and that we need to include?
  4. Could some of these schemes conflict with prefixes (see "Ambiguities Between CURIEs and URIs")? If so, I think we should also check for a following slash.
  5. Which schemes must be followed by a slash? E.g. tftp (https://tools.ietf.org/html/rfc3617) must. Do we need to check each RFC separately, or is there a table somewhere that records this? I'm surprised there's no such list in Wikipedia: https://en.wikipedia.org/wiki/Uniform_Resource_Locator. There is this list: https://www.w3.org/wiki/UriSchemes, but the information about slashes is not readily available there.
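Questions 4 and 5 suggest checking for a following slash. A minimal sketch of that heuristic in Perl, under these assumptions: any term of the form scheme://... (scheme syntax per RFC 3986) is treated as an IRI, a small hand-picked set of opaque schemes (mailto, tel, urn) is accepted without a slash, and everything else is left as a prefixed name. The names looks_like_iri, format_term and %OPAQUE_SCHEMES are hypothetical, not xsparql code.

```perl
use strict;
use warnings;

# HYPOTHETICAL list: schemes whose IRIs normally have no "//" but are still IRIs.
my %OPAQUE_SCHEMES = map { $_ => 1 } qw(mailto tel urn);

# True if $term should be serialized as <IRI>, false if it should stay a CURIE.
sub looks_like_iri {
    my ($term) = @_;
    # RFC 3986: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
    return 0 unless $term =~ /^([A-Za-z][A-Za-z0-9+.\-]*):(.*)\z/s;
    my ($scheme, $rest) = (lc $1, $2);
    return 1 if $rest =~ m{^//};             # hierarchical: ftp://, tftp://, http://
    return 1 if $OPAQUE_SCHEMES{$scheme};    # opaque: mailto:, tel:, urn:
    return 0;                                # e.g. geo:lat, dc:title
}

sub format_term {
    my ($term) = @_;
    return looks_like_iri($term) ? "<$term>" : $term;
}

print format_term($_), "\n"
    for qw(ftp://example.org/f.txt mailto:someone@example.org geo:lat dc:title);
# prints:
# <ftp://example.org/f.txt>
# <mailto:someone@example.org>
# geo:lat
# dc:title
```

With this rule geo:lat stays a prefixed name even though geo is also a registered scheme (RFC 5870), which is the kind of CURIE/URI ambiguity question 4 is about. The trade-off is that the opaque-scheme list still has to be maintained by hand.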

Furthermore: