openknowledge-archive / datapackage-identifier-js

NodeJS library for handling Data Package Ident(ifier)s or Ident(ifier) Strings.
http://dataprotocols.org/data-package-identifier/
4 stars 1 forks source link

Inconsistant URLs #8

Open Hypercubed opened 7 years ago

Hypercubed commented 7 years ago

6 solved the issue of relative or absolute POSIX paths being inadvertently converted to contain Windows path separators by path.resolve. However, inconsistency issues still remain in the current version.

I ran several relative and absolute paths through datapackage-indentifier on both OSX and Windows. The parsing code is:

const identifier = require('datapackage-identifier');
const id = input => identifier.parse(input).dataPackageJsonUrl;

The inputs and expected results are:

input POSIX Expected Win32 Expected
datasets/gdp/ file:///Users/test/datasets/gdp/datapackage.json file:///C:/Users/test/datasets/gdp/datapackage.json
datasets\gdp\ file:///Users/test/datasets/gdp/datapackage.json file:///C:/Users/test/datasets/gdp/datapackage.json
/datasets/gdp/ file:///datasets/gdp/datapackage.json file:///datasets/gdp/datapackage.json
C:\datasets\gdp\ file:///C:/datasets/gdp/datapackage.json file:///C:/datasets/gdp/datapackage.json
/C:/datasets/gdp/ file:///C:/datasets/gdp/datapackage.json file:///C:/datasets/gdp/datapackage.json

Note that I expect relative paths (rows 1 and 2) will vary between systems. I expect consistent results, regardless of system, for absolute paths (rows 3, 4, 5). Also, I always expect a valid browser-compatible file:/// URL (with forward slashes).

OSX result using datapackage-identifier v0.4.2:

input POSIX Expected v0.4.2 OSX
datasets/gdp/ file:///Users/test/datasets/gdp/datapackage.json file:///Users/test/datasets/gdp/datapackage.json
datasets\gdp\ file:///Users/test/datasets/gdp/datapackage.json /datapackage.json *
/datasets/gdp/ file:///datasets/gdp/datapackage.json file:///datasets/gdp/datapackage.json
C:\datasets\gdp\ file:///C:/datasets/gdp/datapackage.json /datapackage.json *
/C:/datasets/gdp/ file:///C:/datasets/gdp/datapackage.json file:///C:/datasets/gdp/datapackage.json

Relative and absolute paths with Windows path separator (rows 2 & 4) fail on OSX.

Windows results using datapackage-identifier v0.4.2 and v0.4.1:

input Expected Windows v0.4.2 Windows v0.4.1 Windows
datasets/gdp/ file:///C:/Users/test/datasets/gdp/datapackage.json file://C:\Users\test\datasets\gdp\datapackage.json * file://C:\Users\test\datasets\gdp\datapackage.json *
datasets\gdp\ file:///C:/Users/test/datasets/gdp/datapackage.json /datapackage.json * /datapackage.json *
/datasets/gdp/ file:///datasets/gdp/datapackage.json file:///datasets/gdp/datapackage.json file://C:\datasets\gdp/datapackage.json *
C:\datasets\gdp\ file:///C:/datasets/gdp/datapackage.json /datapackage.json * /datapackage.json *
/C:/datasets/gdp/ file:///C:/datasets/gdp/datapackage.json file:///C:/datasets/gdp/datapackage.json file://C:\C:\datasets\gdp/datapackage.json *

The POSIX style relative paths (row 1) return an invalid URL. As seen on OSX, relative and absolute paths with Windows path separators (rows 2 & 4) also fail on windows. The v0.4.2 release fixes the issue with absolute paths with POSIX path separators (rows 3 and 5).

Possible solution

sindresorhus/file-url uses path.resolve then converts all paths to use POSIX path separators using String.prototype.replace.

Hypercubed commented 7 years ago

One more issue that could be addressed. URLs should be URI encoded. For example:

file:///C:/Program%20Files/gdp/datapackage.json not file:///C:/Program Files/gdp/datapackage.json

rufuspollock commented 7 years ago

I'm not sure that you would pass a simple file path in data-package-identifier-js - it's more for "identifiers" as defined by the identifier spec: http://specs.frictionlessdata.io/data-package-identifier/

In general, one would not use the identifier package to general local file urls.

Hypercubed commented 7 years ago

If that is the case then this whole section can away: https://github.com/frictionlessdata/datapackage-identifier-js/blob/master/index.js#L28

Personally, I am using this feature here: https://github.com/Hypercubed/chi-datapackage

Hypercubed commented 7 years ago

I performed the same tests in the browser using browserify. Unfortunately, my "fix" in #6 causes an error (path.posix is not supported in the browserify shim). I can file this as another issue if you want to support this use (simple paths in the browser).

Here are the results using datapackage-identifier v0.4.1 in the browser:

Input Browser Expected Browser Results
datasets/gdp/ file:///datasets/gdp/datapackage.json file:///datasets/gdp/datapackage.json
datasets\gdp\ file:///datasets/gdp/datapackage.json /datapackage.json
/datasets/gdp/ file:///datasets/gdp/datapackage.json file:///datasets/gdp/datapackage.json
C:\datasets\gdp\ file:///C:/datasets/gdp/datapackage.json /datapackage.json
/C:/datasets/gdp/ file:///C:/datasets/gdp/datapackage.json file:///C:/datasets/gdp/datapackage.json
/data sets/gdp/ file:///data%20sets/gdp/datapackage.json file:///data sets/gdp/datapackage.json

Row 6 shows the lack of URI encoding.

Edit: path.posix also fails when using webpack.

Hypercubed commented 7 years ago

Another inconsistency I found is that while datapackage-identifier can generate file:// and http:// URIs and consume http:// URIs it will not accept file:// URIs as input.

For example:

/datasets/gdp/ -> file:///datasets/gdp/datapackage.json
http://datasets/gdp/datapackage.json -> http://datasets/gdp/datapackage.json
file://datasets/gdp/datapackage.json -> /datapackage.json
Hypercubed commented 7 years ago

I'm willing to add tests and fixes for all of these... if you confirm your desired behavior.

Hypercubed commented 7 years ago

The way I understand this package it has two functions:

A. Process the three types of Identifier Strings as listed in the Data Package Identifiers document:

  1. GitHub URLs

    Work fine.

    GitHub URLs are converted raw.githubusercontent.com URLs.

  2. Other URLs

    These work fine:

    a. http:// URLs that points directly to the datapackage.json

    b. http:// URLs that point to a path. Adds the datapackage.json file name.

    This does not work:

    c. Other URIs including file:// URIs. The should probably be treated the same as http:// URLs.

  3. Names of a dataset in the Core Datasets registry.

    Currently not implemented.

B) Relative and absolute paths

Inconsistent

Currently these are resolved on the local system then converted to file:// URIs. This is where I see inconsistent behavior discussed above.

If someone can clarify the expected behavior I can, and am willing, to work on these.