qt4cg / qtspecs

QT4 specifications
https://qt4cg.org/
Other
28 stars 15 forks source link

fn:parse-uri: Observations #1320

Closed ChristianGruen closed 3 days ago

ChristianGruen commented 1 month ago

@ndw I decided to only give feedback on the first test that fails; maybe it makes things easier. Next, I chose this repository (instead of qt4tests), as you are the better person to judge if bugs are to be fixed in the tests or the spec.

The test case fn-parse-uri-012 uses fn:parse-uri("file:///c:/path/to/file") and returns:

{
  "uri": "file:///c:/path/to/file",
  "scheme": "file",
  "hierarchical": true(),
  "path": "/c:/path/to/file",
  "filepath": "c:/path/to/file",
  "path-segments": ("", "c:", "path", "to", "file")
}

Following the current rules, I would have expected filepath to be /c:/path/to/file:

If URI matches `^([a-zA-Z][A-Za-z0-9\+\-\.]*):(.*)$`:
• SCHEME: file
• STRING: ///c:/path/to/file

If the scheme is known to be file and the string matches "^/*(/[a-zA-Z][:|].*)$":
• STRING: /c:/path/to/file

If the scheme is file or the empty sequence, and filepath is the empty sequence, filepath is also the whole string:
• FILEPATH: /c:/path/to/file

Would you propose to revise the test or the specification?

ChristianGruen commented 1 month ago

Since I didn’t consider #1244 in my feedback, I’ll look at the new changes and comment back.

ChristianGruen commented 1 month ago

Hi @ndw, sorry for letting you wait. Here’s finally some feedback:

I think my previous observations still apply for fn-parse-uri-012, which uses the function call fn:parse-uri("file:///c:/path/to/file") and returns:

{
  "uri": "file:///c:/path/to/file",
  "scheme": "file",
  "hierarchical": true(),
  "path": "/c:/path/to/file",
  "filepath": "c:/path/to/file",
  "path-segments": ("", "c:", "path", "to", "file")
}

Here are the (slightly adjusted) steps:

If URI matches `^([a-zA-Z][A-Za-z0-9\+\-\.]+):(.*)$`:
• SCHEME: file
• STRING: ///c:/path/to/file

If scheme is the empty sequence or file:
- If the string matches `^/*([a-zA-Z][:|].*)$`:
• STRING: /c:/path/to/file

Otherwise, the filepath and string are the string with any sequence of leading slashes replaced by a single slash:
• FILEPATH: /c:/path/to/file

Can you find out for me what is wrong?

  1. the spec?
  2. the test/the Saxon implementation (in case they should match)?
  3. my interpretation of the spec?

Next, I noticed that some “otherwise” adverbs should be added to the rule set. For example, the following rules imply that the third step should always be performed if the second step condition fails:

ndw commented 3 weeks ago

I think the spec needs an additional rule. The path /C:/... isn't what users will want or expect on a Windows system. I propose:

  • If the scheme is file:
    • The authority is the empty sequence.
    • If unc-path is true and the string matches ^/*(//[^/].*)$: then filepath, and string are both the first match group.
    • If the string begins ^//*[A-Za-z]:/ then all but one leading slash is removed from string and the filepath is the string with all leading slashes removed.
    • Otherwise, the filepath and string are the string with any sequence of leading slashes replaced by a single slash.

Which is apparently what I implemented but failed to document. My bad. Apologies.