Enhanced URL system - GitHub issues

btrask commented 11 years ago

The URL format currently used is unable to distinguish the project root from sub-folders that contain the file being edited.

Current form: http://localhost:7261/editor/path/on/file/system.txt

Proposed form: http://localhost:7261/editor/[pseudo-random hash]/subpath/file.txt

The advantages of using a hash to represent the project root are numerous:

- The entire file system is not exposed to browser access (only specific roots identified by hashes are)
- If hashes of reasonable length are used, then security is improved because guessing valid URLs is infeasible
- If hashes are salted, the user can "invalidate" old URLs (by changing the salt) while still providing access to the same files under new URLs
- (Most important) The web page is always able to know what directory it should use as the root (currently, if you reload the page, the root changes to the active file's parent folder)

The only downside is that the server needs to maintain a persistent list of "opened" hash->root path mappings. The upside is that this database is controlled by the server and mappings can be removed or modified.

I've been using this URL scheme in my browser-based image viewer, Sequential 3, with great success. The hashes it uses are salted SHA1s of the directory path (so if the same directory is reopened, the same URL is produced), encoded in a way that makes them short and somewhat human-readable. The mappings are stored as flat files (file name = hash, content = path) in a configuration directory. On OS X, it also stores "alias" data so that if a root is moved or renamed, it can still be found.

This is sort of an extension of https://github.com/scripted-editor/scripted/pull/17.

(I didn't see the documentation about marking project roots with .scripted when I first wrote this, but I believe my suggestion is more flexible and offers additional advantages, as explained above.)
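
A minimal sketch of the salted-hash scheme described above, for illustration only. The function names, the base32 encoding, and the 12-character length are assumptions; the comment only says that the hashes are salted SHA1s of the directory path, encoded to be short, and stored as flat files (file name = hash, content = path).

```python
import base64
import hashlib
from pathlib import Path
from typing import Optional


def root_token(root: str, salt: bytes, length: int = 12) -> str:
    """Derive a short, stable token from a salted SHA-1 of the directory path."""
    digest = hashlib.sha1(salt + root.encode("utf-8")).digest()
    # Assumption: base32, lowercased, so the token is case-insensitive
    # and safe to use as a file name on any file system.
    return base64.b32encode(digest).decode("ascii").lower()[:length]


def remember_root(config_dir: Path, root: str, salt: bytes) -> str:
    """Store the mapping as a flat file: file name = token, content = path."""
    token = root_token(root, salt)
    (config_dir / token).write_text(root, encoding="utf-8")
    return token


def lookup_root(config_dir: Path, token: str) -> Optional[str]:
    """Return the root path for a token, or None if the token is unknown."""
    if not token.isalnum():
        return None  # refuse anything that could be a path, e.g. "../x"
    entry = config_dir / token
    return entry.read_text(encoding="utf-8") if entry.is_file() else None
```

Reopening the same directory with the same salt yields the same token. Changing the salt yields new tokens for new mappings; actually invalidating old URLs would also require deleting the old mapping files.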

aclement commented 11 years ago

We have an ongoing activity to improve the URL scheme, and part of it is along the lines you describe here (in our variant we are using pseudo-random-hash but no longer have the 'editor' component in the URL). I will post the proposal in a more public place soon so everyone knows where we are heading and can comment. But beyond the design discussion we haven't had any time to do any implementation yet.

btrask commented 11 years ago

Sounds great. I'll look forward to seeing the proposal. Thanks.

aclement commented 11 years ago

Here we go, this is the proposal we were batting around, written by Scott Andrews:

After studying the HTTP interactions within Scripted and how the server and client are structured to handle them, there are opportunities for improvement. The current collection of server-side endpoints seems haphazard and overly command-based instead of resource-based. The REST model works really, really well for loading and saving files; there's no reason not to embrace REST (with a little WebSockets on the side).

Where operations occur server-side on a resource, they are treated as an alternate representation of the same resource instead of as a separate endpoint. For example, a requested path could represent a file or a directory. The requestor should not need to care what the underlying resource is, but should instead distinguish the response by its MIME type. Additionally, the raw file content can be distinguished from the linter output for that file. The HTTP 'Accept' and 'Content-Type' headers work well for this purpose. The 'Accept' header indicates the preferred 'Content-Type' of the response.
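
The representation selection described above could be sketched roughly as follows. This is an illustration rather than Scripted's code: the media types are the ones listed later in the proposal, while `parse_accept` and `negotiate` are hypothetical helper names, and the parsing covers only media types and `q` values from the standard HTTP `Accept` header.

```python
from typing import List, Optional, Tuple

# Media types from the proposal's "Load a file resource" endpoint.
SUPPORTED = (
    "application/vnd.scripted.raw",
    "application/vnd.scripted.directory",
    "application/vnd.scripted.lint",
    "application/vnd.scripted.dependencies",
)


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept header into (media type, q) pairs, best first."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        entries.append((media_type, q))
    # sorted() is stable, so equal q values keep their header order.
    return sorted(entries, key=lambda e: e[1], reverse=True)


def negotiate(header: str) -> Optional[str]:
    """Pick the most preferred supported representation, or None."""
    for media_type, q in parse_accept(header):
        if q > 0 and media_type in SUPPORTED:
            return media_type
    return None
```

A server using this would set the chosen type as the response's Content-Type, and could answer 406 Not Acceptable when `negotiate` returns `None`; the proposal itself does not specify that case.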

One serious issue I have with the current API is that there are no restrictions preventing the editor from loading and editing any file on the file system. http://localhost:7261/editor.html?/etc/hosts is a completely legitimate file to edit, as is http://localhost:7261/editor.html?/Users/aclement/.ssh/id_rsa. While the browser will restrict direct access from a malicious website, if there are any cross-site scripting (XSS) vulnerabilities, the browser can be tricked into accessing and loading the file content. While I'd like to think we won't have any XSS in our code, such vulnerabilities are far too common in the wild and the stakes of local file system access are too high.

This proposal introduces the concept of a secure, random token that essentially defines a chroot jail. The user has full access to files under the jailed directory, but is prevented from accessing or modifying files outside that directory. (How to limit access of exec commands is an open question.) When the user starts a scripted process, a random token is generated representing the project root. Only with that token can the editor load and save files.
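
One way the token-as-jail idea might look in code is sketched below. The names `open_project` and `resolve_in_jail`, the in-memory registry, and the token length are assumptions made for illustration; the proposal only says that a random token represents the project root and that files outside it must be unreachable.

```python
import secrets
from pathlib import Path
from typing import Dict, Optional

# Hypothetical in-memory registry: token -> resolved project root.
_roots: Dict[str, Path] = {}


def open_project(root: str) -> str:
    """Generate an unguessable token for a project root and remember it."""
    token = secrets.token_urlsafe(16)
    _roots[token] = Path(root).resolve()
    return token


def resolve_in_jail(token: str, file_path: str) -> Optional[Path]:
    """Map (token, relative path) to a real path, or None if not allowed."""
    root = _roots.get(token)
    if root is None:
        return None  # unknown token; a server would answer 403
    # resolve() collapses ".." segments and follows symlinks, so the check
    # below sees where the request would really end up.
    candidate = (root / file_path).resolve()
    if candidate != root and root not in candidate.parents:
        return None  # the path escapes the project root
    return candidate
```

With this check, requests such as `../outside.txt` or an absolute `/etc/hosts` are rejected even when the token itself is valid.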

While this proposal strives to follow REST best practices there are often accommodations that must be made for human vs machine interactions. In this case, the API is biased towards human interactions and de-normalized machine interactions. I don't mean this to be "the way", but rather the beginning of a discussion.

GET http://localhost:7261/

Scripted hello world, overview

GET http://localhost:7261/resources/**

Static resources, shared across all projects

GET http://localhost:7261/{secureRandom}/{filePath}

Bootstrap page for the project editor

Path segments:
- secureRandom: unguessable token that maps to a location on the file system
- filePath: [optional] file within the project to open by default

Response Codes:
- 200: the editor bootstrap
- 403: the secure random is unknown/untrusted

GET http://localhost:7261/files/{secureRandom}/{filePath}

Load a file resource

Path segments:
- secureRandom: unguessable token that maps to a location on the file system
- filePath: the path on the file system relative to its container

Content-Types:
- application/vnd.scripted.raw: raw file content
- application/vnd.scripted.directory: directory listing
- application/vnd.scripted.lint: linter output for the resource
- application/vnd.scripted.dependencies: list of dependent resources

Response Codes:
- 200: file content
- 403: secure random is unknown/untrusted
- 404: file does not exist

PUT http://localhost:7261/files/{secureRandom}/{filePath}

Save a file resource

Path segments:
- secureRandom: unguessable token that maps to a location on the file system
- filePath: the path on the file system relative to its container

Content-Types:
- application/vnd.scripted.raw: the raw file content

Response Codes:
- 201: file saved
- 403: secure random is unknown/untrusted
- 409: file has changed on disk since it was loaded

DELETE http://localhost:7261/files/{secureRandom}/{filePath}

Delete a file within a project

Response Codes:
- 204: deleted
- 403: secure random is unknown/untrusted
- 404: file does not exist
- 409: file has changed on disk since it was loaded
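
The proposal does not say how "changed on disk since it was loaded" would be detected for the 409 responses on PUT and DELETE. One possible approach, sketched here purely as an assumption, is for the client to send back a version tag (a content hash) that it received when it loaded the file. The function names are hypothetical, and token and path checks (403) are left out.

```python
import hashlib
from pathlib import Path
from typing import Optional


def version_of(path: Path) -> Optional[str]:
    """Hash the current on-disk content; None if the file does not exist."""
    if not path.is_file():
        return None
    return hashlib.sha256(path.read_bytes()).hexdigest()


def save(path: Path, content: bytes, loaded_version: Optional[str]) -> int:
    """Return the status code a PUT would produce in this sketch."""
    if version_of(path) != loaded_version:
        return 409  # file has changed on disk since it was loaded
    path.write_bytes(content)
    return 201  # file saved


def delete(path: Path, loaded_version: Optional[str]) -> int:
    """Return the status code a DELETE would produce in this sketch."""
    current = version_of(path)
    if current is None:
        return 404  # file does not exist
    if current != loaded_version:
        return 409  # file has changed on disk since it was loaded
    path.unlink()
    return 204  # deleted
```

Passing `None` as `loaded_version` to `save` means "the file did not exist when I looked", so creating a new file succeeds only if nothing has appeared at that path in the meantime. The check and the write are not atomic here, so a real server would need locking or a similar safeguard.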

GET http://localhost:7261/preferences/{secureRandom}

The project preferences

Path segments:
- secureRandom: unguessable token that maps to a location on the file system

Content-Types:
- application/vnd.scripted.preferences: scripted preferences

Response Codes:
- 200: preferences
- 403: secure random is unknown/untrusted

GET http://localhost:7261/commands/{secureRandom}

List of available commands to run for the project

Path segments:
- secureRandom: unguessable token that maps to a location on the file system

Content-Types:
- application/vnd.scripted.commands: project commands list

Response Codes:
- 200: commands
- 403: secure random is unknown/untrusted

POST http://localhost:7261/commands/{secureRandom}/{command}

Execute the desired command. The response should be chunked so that the user sees console output as soon as it is produced. WebSockets may also be appropriate.

Path segments:
- secureRandom: unguessable token that maps to a location on the file system
- command: the command to execute

Content-Types:
- application/vnd.scripted.console: console output from the command execution

Response Codes:
- 200: console output
- 403: secure random is unknown/untrusted
- 404: the command is undefined

(http|ws)://localhost:7261/events/{secureRandom}

Reserved for future use

Possible uses:
- presence detection
- notification of file system changes
- realtime peer collaboration

Path segments:
- secureRandom: unguessable token that maps to a location on the file system

btrask commented 11 years ago

Thanks for sharing it, and so quickly. Overall it looks solid, and I'd be happy to see it used as-is. That said... ;-)

I think using the Accept header to specify the type of resource is a poor fit. Each resource has a real content type, which is what the Content-Type field should be used for.

I considered using Accept in the case of my image viewer mentioned above, but it isn't possible to specify headers when loading images from the <img> tag. That doesn't apply to Scripted, of course, but there are plenty of other cases, like viewing resources in the WebKit debugger, or offering a "raw file" view (like GitHub does).

I think the split between /{hash}/path and /files/{hash}/path is similar. You can't control the browser's Accept header, so you've given them two separate paths, but the HTML editor really is a view of "the resource" itself, and they should ideally have the same path.

What I ended up doing in that case was using query parameters. Advantages:

- They work no matter how the resource is being requested
- They can be passed around as regular links
- They let multiple views of the same file share the same path

For example, how GitHub does it:
- https://github.com/scripted-editor/scripted/blob/master/README.md
- https://github.com/scripted-editor/scripted/raw/master/README.md

How I'd propose doing it:
- https://github.com/scripted-editor/scripted/master/README.md (no type specified, uses HTML viewer as default)
- https://github.com/scripted-editor/scripted/master/README.md?view=raw

But this is just aesthetics, and I'd be perfectly happy with whatever the Scripted team chooses.
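
To illustrate the query-parameter approach suggested above, here is a small sketch of how a server might separate the resource path from the requested view. The `view=raw` parameter comes from the example URL; the `editor` default name and the `parse_view` helper are assumptions.

```python
from typing import Tuple
from urllib.parse import parse_qs, urlsplit

# "raw" appears in the example above; "editor" is a hypothetical name
# for the default HTML view.
VIEWS = {"editor", "raw"}


def parse_view(url: str, default: str = "editor") -> Tuple[str, str]:
    """Split a URL into (resource path, requested view)."""
    parts = urlsplit(url)
    requested = parse_qs(parts.query).get("view", [default])[0]
    if requested not in VIEWS:
        raise ValueError(f"unknown view: {requested}")
    return parts.path, requested
```

Both views share one path, so a link to the raw content differs from the editor link only by its query string.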

I'd also suggest namespacing the hashes in the path, to avoid collisions with other resources and for forward-compatibility. For example, /id/{hash} instead of just /{hash}. However, it's not necessary if your hashes have a fixed length and you can make sure no other resources will have names of the same length with the same possible character set (if you want to have /resources/, don't let your hashes be 9 characters long).

Regarding the hashes themselves, YouTube uses 11-character base-64-encoded IDs, with two characters (+ and /) replaced with - and _. However, base-64 is case sensitive, which can cause problems if you ever decide to represent a hash in the file system (some file systems are case sensitive, others aren't). I don't really have a suggestion here; there are tradeoffs whichever way you go.
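
The encoding tradeoff can be made concrete with a short sketch. Both helpers are hypothetical: the first produces an 11-character URL-safe base-64 ID from 8 random bytes, and the second shows a case-insensitive alternative using base32, at the cost of a longer string.

```python
import base64
import secrets


def short_id_base64() -> str:
    """11 characters, case sensitive, using - and _ instead of + and /."""
    raw = secrets.token_bytes(8)  # 64 random bits
    # 8 bytes encode to 12 characters including one "=" of padding.
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def short_id_base32() -> str:
    """16 characters, safe on case-insensitive file systems."""
    raw = secrets.token_bytes(10)  # 80 random bits, no padding needed
    return base64.b32encode(raw).decode("ascii").lower()
```

The base32 form uses only lowercase letters and the digits 2 to 7, so two IDs can never differ by case alone.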

Thanks again, Ben

vmware-archive / scripted

Enhanced URL system #160