Open Hypercubed opened 7 years ago
One more issue that could be addressed. URLs should be URI encoded. For example:
file:///C:/Program%20Files/gdp/datapackage.json
not file:///C:/Program Files/gdp/datapackage.json
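As a minimal sketch of the encoding being asked for, the built-in `encodeURI` escapes spaces while leaving the `file:///` prefix and the path separators untouched. The helper name `encodeFileUrl` is hypothetical and is not part of datapackage-identifier.

```javascript
// Hypothetical helper, not part of datapackage-identifier.
// encodeURI percent-encodes characters such as spaces but leaves ":" and "/" alone.
function encodeFileUrl(url) {
  return encodeURI(url);
}

console.log(encodeFileUrl('file:///C:/Program Files/gdp/datapackage.json'));
// prints "file:///C:/Program%20Files/gdp/datapackage.json"
```

Note that `encodeURI` would also re-encode the `%` of an input that is already encoded, so it should only be applied to unencoded strings.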
I'm not sure that you would pass a simple file path in data-package-identifier-js - it's more for "identifiers" as defined by the identifier spec: http://specs.frictionlessdata.io/data-package-identifier/
In general, one would not use the identifier package to generate local file URLs.
If that is the case then this whole section can go away: https://github.com/frictionlessdata/datapackage-identifier-js/blob/master/index.js#L28
Personally, I am using this feature here: https://github.com/Hypercubed/chi-datapackage
I performed the same tests in the browser using browserify. Unfortunately, my "fix" in #6 causes an error (`path.posix` is not supported in the browserify shim). I can file this as another issue if you want to support this use (simple paths in the browser).
Here are the results using datapackage-identifier v0.4.1 in the browser:
Input | Browser Expected | Browser Results |
---|---|---|
datasets/gdp/ | file:///datasets/gdp/datapackage.json | file:///datasets/gdp/datapackage.json |
datasets\gdp\ | file:///datasets/gdp/datapackage.json | /datapackage.json |
/datasets/gdp/ | file:///datasets/gdp/datapackage.json | file:///datasets/gdp/datapackage.json |
C:\datasets\gdp\ | file:///C:/datasets/gdp/datapackage.json | /datapackage.json |
/C:/datasets/gdp/ | file:///C:/datasets/gdp/datapackage.json | file:///C:/datasets/gdp/datapackage.json |
/data sets/gdp/ | file:///data%20sets/gdp/datapackage.json | file:///data sets/gdp/datapackage.json |
Row 6 shows the lack of URI encoding.
Edit: `path.posix` also fails when using webpack.
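One possible workaround for the missing `path.posix`, sketched here under an assumption: that the bundler's `path` shim already behaves POSIX-style at the top level, so it can stand in when `path.posix` is absent. The name `posixPath` is hypothetical.

```javascript
const path = require('path');

// Assumption: when `path.posix` is missing (as reported for the browserify
// shim), the shim's top-level functions are taken to be POSIX-style.
const posixPath = path.posix || path;

console.log(posixPath.join('datasets', 'gdp', 'datapackage.json'));
```

In Node itself `path.posix` exists, so the fallback only matters in bundled builds.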
Another inconsistency I found is that while `datapackage-identifier` can generate `file://` and `http://` URIs and consume `http://` URIs, it will not accept `file://` URIs as input.
For example:
/datasets/gdp/ -> file:///datasets/gdp/datapackage.json
http://datasets/gdp/datapackage.json -> http://datasets/gdp/datapackage.json
file://datasets/gdp/datapackage.json -> /datapackage.json
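A rough sketch of what symmetric handling could look like, assuming the standard WHATWG `URL` class is available (it is a global in current Node versions and in browsers). The function name `normalizeUrl` is hypothetical; this is not how datapackage-identifier is implemented.

```javascript
// Hypothetical sketch: treat any absolute URL (http:, https:, file:) the same
// way, appending datapackage.json when the URL does not already end with it.
function normalizeUrl(input) {
  const url = new URL(input); // throws a TypeError for strings without a scheme
  if (!url.pathname.endsWith('datapackage.json')) {
    if (!url.pathname.endsWith('/')) {
      url.pathname += '/';
    }
    url.pathname += 'datapackage.json';
  }
  return url.href;
}

console.log(normalizeUrl('file:///datasets/gdp/'));
console.log(normalizeUrl('http://datasets/gdp/datapackage.json'));
```

With only two slashes, as in `file://datasets/gdp/datapackage.json`, the WHATWG parser reads `datasets` as the host rather than as the first path segment, so the three-slash form is the one that refers to a local absolute path.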
I'm willing to add tests and fixes for all of these... if you confirm your desired behavior.
The way I understand this package, it has two functions:

A. Process the three types of Identifier Strings listed in the Data Package Identifiers document:

1. GitHub URLs. These work fine. GitHub URLs are converted to raw.githubusercontent.com URLs.
2. Other URLs.
   These work fine:
   a. `http://` URLs that point directly to the datapackage.json
   b. `http://` URLs that point to a path. The datapackage.json file name is added.
   This does not work:
   c. Other URIs, including `file://` URIs. They should probably be treated the same as `http://` URLs.
3. Names of a dataset in the Core Datasets registry. Currently not implemented.

B. Relative and absolute paths

Inconsistent. Currently these are resolved on the local system, then converted to `file://` URIs. This is where I see the inconsistent behavior discussed above.
If someone can clarify the expected behavior, I am willing and able to work on these.
#6 solved the issue of relative or absolute POSIX paths being inadvertently converted to contain Windows path separators by `path.resolve`. However, inconsistency issues still remain in the current version.

I ran several relative and absolute paths through `datapackage-identifier` on both OSX and Windows. The parsing code is:

The inputs and expected results are:
Note that I expect relative paths (rows 1 and 2) will vary between systems. I expect consistent results, regardless of system, for absolute paths (rows 3, 4, 5). Also, I always expect a valid browser-compatible `file:///` URL (with forward slashes).

OSX result using datapackage-identifier v0.4.2:
Relative and absolute paths with Windows path separator (rows 2 & 4) fail on OSX.
Windows results using datapackage-identifier v0.4.2 and v0.4.1:
The POSIX-style relative paths (row 1) return an invalid URL. As seen on OSX, relative and absolute paths with Windows path separators (rows 2 & 4) also fail on Windows. The v0.4.2 release fixes the issue with absolute paths with POSIX path separators (rows 3 and 5).
Possible solution

sindresorhus/file-url uses `path.resolve`, then converts all paths to use POSIX path separators using `String.prototype.replace`.
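The approach described above could look roughly like the following. This is a sketch under stated assumptions, not the actual file-url source: the function name `toFileUrl` and its `resolve` option are hypothetical, and the leading-slash and `encodeURI` steps are additions needed to get a valid `file:///` URL for drive-letter paths and paths with spaces.

```javascript
const path = require('path');

// Hypothetical helper sketching the resolve-then-replace approach.
function toFileUrl(filePath, { resolve = true } = {}) {
  // Resolve against the current working directory unless told otherwise.
  let pathName = resolve ? path.resolve(filePath) : filePath;
  // Convert Windows path separators to POSIX separators.
  pathName = pathName.replace(/\\/g, '/');
  // A Windows drive-letter path ("C:/...") needs a leading slash.
  if (pathName[0] !== '/') {
    pathName = '/' + pathName;
  }
  // Percent-encode characters such as spaces.
  return encodeURI('file://' + pathName);
}

console.log(toFileUrl('C:\\datasets\\gdp\\datapackage.json', { resolve: false }));
console.log(toFileUrl('/data sets/gdp/datapackage.json', { resolve: false }));
```

The `resolve: false` calls keep the example independent of the host system. With resolving enabled, `path.resolve` on a POSIX system does not treat `C:\datasets\gdp\` as absolute, so Windows-style input would still need separate handling there.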