sagemathinc / cocalc

CoCalc: Collaborative Calculation in the Cloud
https://CoCalc.com
Other
1.16k stars 211 forks source link

Codebase has inconsistencies about directory path assumptions #1330

Closed johnjeng closed 7 years ago

johnjeng commented 7 years ago

In some places we assume that paths are directories iff it ends with /. Other places, we assume that a path not ending in / could still be a directory and we check against the isdir property in the displayed directory listing.

This issue proposes that the front end should assume everywhere that folders end in / and files don't. Some parts of the front end receives folder paths without the trailing / but context lets us convert these cases. This convention is enforced by linux since folder and file names cannot include /.

williamstein commented 7 years ago

This issue proposes that the front end should assume everywhere that folders end in / and files don't.

What's your argument for doing this, instead of the opposite -- namely NEVER putting / at the end of a name? I think the arguments for never ever putting / at the end of paths is much more compelling:

  1. Consistency with UNIX: In command line, scripts (i.e., linux) folders never end in / (e.g., if you are doing something with a pipe or ls, etc.).
  2. Encoding metadata about the path (whether it is a path to a file or folder) in the path name itself is a bad idea, and isdir is just metadata. The metadata should go somewhere else. For example the smc-ls script has several pieces of metadata about paths: filename, size, last change, and whether or not the path is a folder. The latter is just metadata, just like the size and last modification time. You could argue that all the metadata should be encoded in the filename, e.g, "foo/2016-10-11/456/", but that way clearly madness lies.
williamstein commented 7 years ago

@haraldschilly -- very interested in your opinion...

johnjeng commented 7 years ago
  1. Under POSIX specified utilities, using a trailing slash is a kind of type checking requiring that it's a folder.
    • This is nice in certain cases. Example
  2. We have to ask the server if something is a directory.
    • For example when linking to directories in other projects.

After some thought, the opposite of what I'm suggesting is also rather nice:

johnjeng commented 7 years ago

Some more arguments one way or the other. http://unix.stackexchange.com/questions/212805/should-i-use-a-slash-at-the-end-of-path-variables-in-shell-script-or-not

http://stackoverflow.com/questions/980255/should-a-directory-path-variable-end-with-a-trailing-slash

williamstein commented 7 years ago

We have to ask the server if something is a directory.

The exact state (with metainformation) of the directory tree that users care about (e.g., all paths recently visited) should be sync'd between the client and server. This will get implemented eventually. If you implement something that works around this not being implemented, it'll probably just get deleted when this directory tree info sync gets implemented...

williamstein commented 7 years ago

I skimmed the argument you link to. It's interesting that none of them are my argument (2), which is that whether or not the path is a directory is metadata about the path, and it is more natural to treat all metadata the same (namely not trying to encode it in the path). Encoding metadata in names of things is generally considered bad practice (this is, e.g., why labels are used heavily in kubernetes).

williamstein commented 7 years ago

After some thought, the opposite of what I'm suggesting is also rather nice: [...]

Can you elaborate? No matter what, precisely the same information has to travel around. The only difference is where it's located: (a) in the path name or (b) in an attribute of an object.

johnjeng commented 7 years ago

one of them are my argument (2)

Linux already requires storing metadata about the path by not allowing / to appear in names of things. illegal/file/name is certainly name found under some directories and not part of a file name. So on some level encoding metadata is okay (at least for convenience of removing ambiguity).

But you can't have item/ (a directory) and item (a file) in the same directory so removing ambiguity is a little moot for whether or not to end directories in /.

Can you elaborate?

For the first bullet, it means fewer UI components have to worry about directory vs. file. More store/action functions will have to worry about it but it's extremely likely that # components >= # functions.

For the second bullet, I'm thinking that the metadata wouldn't travel around with paths. If something needed to check if a path is a folder or file, it would call some function and that would either look at the appropriate Project Store's file metadata or it would query the server.

Note, right now you would have to parse the path and directory_listing since the latter is not a dictionary. This last note is only relevant if we ever allow certain file tree manipulations outside the files listing (which I think is extremely likely due to courses).

williamstein commented 7 years ago

For the second bullet, I'm thinking that the metadata wouldn't travel around with paths. If something needed to check if a path is a folder or file, it would call some function and that would either look at the appropriate Project Store's file metadata or it would query the server.

That's a good point. So by not recording that a path is a directory by putting a / at the end, less information has to travel around -- the fact that the path is a directory is stored in one place instead of potentially many places.

johnjeng commented 7 years ago

Suppose we assume that no paths end in /.

What UI do we show when loading some path for the user? On master it corresponds to what we expect it to be (for example on page refresh if the url ends with /, then we assume it's a folder). We'd have to fetch the directory listing first before we know. This is just a concern of responsiveness. The end state will be correct, but for example if you open a file from the log, would it go first to the files tab and then open the file?

Edit: Oh, alternatively we can also wait on the store to change so nothing happens until we know what we're loading. So in this case, it's just a little slower until the new UI loads.

williamstein commented 7 years ago

if you open a file from the log

The log could include whether the path is a file or directory. (It's a javascript object for each entry.)

for example on page refresh if the url ends with /, then we assume it's a folder

You have a good point regarding the url; it is very natural for that to continue to end in a slash to indicate it is a directory, since that is a standard convention (I think), and there is no other good way to encode that information.

haraldschilly commented 7 years ago

Hi, after reading this and looking at links, I'm no longer sure who is arguing for what any more. My opinion is, that this has nothing to do with posix, but with URIs and http. It's a convention to denote "directories" with a trailing slash, although in reality they're either an implicit index.html file or dynamically created content. We should keep this consistent in smc, such that all URLs for files end in a slash if they're for directories and 301 redirects for directory names without a slash.

I'm not sure about the code itself, I bet it leads to silly bugs if additional checks are left out in favor of relying on the path name.

I also found to slash or not to slash ... which kind of just says to be consistent but in the end everything is allowed anyways.

williamstein commented 7 years ago

@johnjeng - I think we should just go with @haraldschilly's tie-breaker opinions:

  1. All URLs for files end in a slash if they're for directories (and 301 redirects for directory names without a slash).

  2. Do not leave out additional checks in favor of relying on the path name, in order to avoid silly bugs.

johnjeng commented 7 years ago

To check, the specs are for some path P:

  1. If P ends in a /, assume it's a directory
    • An error is fine here if it's not a directory
  2. If P doesn't end in a /, find out if it is a directory or file before changing the UI.