nushell / nushell

A new type of shell
https://www.nushell.sh/
MIT License
32.15k stars 1.66k forks source link

Paths as Datatype #10679

Open stellarpower opened 1 year ago

stellarpower commented 1 year ago

Related problem

Most languages work with paths as strings, and whilst some offer higher-level support, often these languages fall back to simply performing string concatenation or providing no guarantee on proper path support, leaving that up to the user-programmer to manage themselves.

This can cause various problems. The literal path to an entry on the filesystem (/foo/bar/baz) is quite different from what we care about programmatically (I expect baz to be a directory, and not a symlink, where I have write permissions, because I am about to write some files into there). The latter ends up being specified implicitly, or by convention, and that's dangerous.

Coming from other shell scripting languages, I have two options when I am writing my functions:

Describe the solution you'd like

Nu already gives really nice error messages for type errors, and these bubble through because it's compiled before execution. Builtins like save are safe by default, because they don't overwrite unless you force them to - the exact opposite of most other shells. So this is a great starting point.

But I think a dedicated path datatype could help integrate these into the language further, given that working with files is such an integral part of operating in any shell. It could help portability by enforcing and making transparent proper handling of differences across platforms. It helps specify what elements in an external environment a script needs in order to be able to run, before we get to the point that we want to use them.

The main thing that comes to mind however, is it would be really cool if I could use path type annotations for my functions, in a similar way it seems I can for specifying valid records based on their fields. Rather than having something that is string-like, I could specify precisely what I need and expect to be on the filesystem, before I start trying to open it or use it. Thus eliminating the problem mentioned above of performing manual checks. This would ensure my scripts are safe, self-documenting, I get back a clear error message outlining the difference between what I asked and what I should have asked, and most importantly, fail as early as possible - critical for speeding up development and making the whole process easier. As always, sanitise user input at the first point it enters the system.

Let's say I want to have a file (needle), and I want to find duplicates of that in a directory (haystack), and output the location of those duplicates to a third file (results). I am picturing something like this:

def findDuplicates[
    needle :      path<exists: true,  type: file,      canRead:  true>,
    haystack:     path<exists: true,  type: directory                >,
    resultsFile:  path<exists: maybe, type: file,      canWrite: true>
] {}

Now any misuse of this function (whoops! I got needle and results the wrong way round) would produce a really clear error message - before anything inside the function had even run yet. It would make it clear what the problem is (I gave bad inputs) rather than distracting me by concentrating on the error that was thrown at the use site (this element halfway through a pipeline was unable to write a string into a directory, or open a file owned by root).

Describe alternatives you've considered

Not thought of any so far. I know the name pathwould already clash with the command.

Additional context and details

Also, I believe there are some issues with sourcing files according to a glob pattern, because these aren't CT constants. It might be possible that an integrated path datatype could get round some of these issues, by making some of these activities into things we can know (or find out) at compile-time, in a way that operating with strings etc. we couldn't in a way that would be reproducible.

Any thoughts?

Thanks

fdncred commented 1 year ago

Thanks for your input!

We have SyntaxShape::Filepath, FlatShape::Filepath, Expr::Filepath that allows the name path to be used when you create custom commands with that shape. This shape is also used in commands like cp, mv, open, rm, save, etc.

โฏ def fp [abc: path] { echo $abc }
โฏ fp foo
/Users/fdncred/src/nushell/foo

I doubt it does all the checking you're talking about, but it's a start.

A long time ago we had a Value::Filepath but it proved to be difficult to use. As I recall, there were always times when you wanted a string and got a filepath and vice-versa. I believe we left it behind when we went from 0.44 to 0.59.

amtoine commented 1 year ago

to me, the most surprising is

> let p: path = $nu.temp-path
> $p | describe
string
amtoine commented 1 year ago

@stellarpower this issue looks quite related to https://github.com/nushell/nushell/issues/10332, right? there is a section about path in there

fdncred commented 1 year ago

to me, the most surprising is

should this work like a cast?

I ask because $nu.temp-path | describe also returns string.

I wonder if part of the issue is that there is no Filepath literal syntax. ๐Ÿค” e.g. You can't make a path like you can a string.

let a = "string"
let a = `string` # <-- this pukes because it tries to run string
let a = 'string' # that's a string
let a = "string" # that's a string

maybe let should allow this type of casting?

Maybe this is what Stefan was talking about because there is no Type::Filepath?

amtoine commented 1 year ago

i guess $nu.temp-path should be a path directly, but yeah let p: path = ... could act like a cast :yum:

if you're in /a/b/c, then

let p: path = "d"

could give /a/b/c/d :smirk:

fdncred commented 1 year ago

since let is a parser keyword, i guess the changes need to happen there. I'm looking but nothing is jumping out at me to change.

fdncred commented 1 year ago

I found it. This is why it's returning a string. There is no Type::Filepath.

https://github.com/fdncred/nushell/blob/main/crates/nu-protocol/src/syntax_shape.rs#L151-L152

I just added Type::Filepath as an experiment.

We may need some explicit filepath literal to move forward.

> let a: path = 'c:\temp'
parsing output_type: String
bytes: Ok("path")
type_bytes: [112, 97, 116, 104], type: Filepath
parsing lvalue: Expression { expr: VarDecl(587), span: Span { start: 106264, end: 106271 }, ty: Filepath, custom_completion: None }, explicit_type: Some(Filepath)
Error: nu::parser::type_mismatch

  ร— Type mismatch.
   โ•ญโ”€[entry #1:1:1]
 1 โ”‚ let a: path = 'c:\temp'
   ยท               โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€
   ยท                   โ•ฐโ”€โ”€ expected filepath, found string
   โ•ฐโ”€โ”€โ”€โ”€

With my changes, it fails the same way as this

> let a: float = 1
parsing output_type: Int
bytes: Ok("float")
type_bytes: [102, 108, 111, 97, 116], type: Float
parsing lvalue: Expression { expr: VarDecl(587), span: Span { start: 106264, end: 106272 }, ty: Float, custom_completion: None }, explicit_type: Some(Float)
Error: nu::parser::type_mismatch

  ร— Type mismatch.
   โ•ญโ”€[entry #2:1:1]
 1 โ”‚ let a: float = 1
   ยท                โ”ฌ
   ยท                โ•ฐโ”€โ”€ expected float, found int
   โ•ฐโ”€โ”€โ”€โ”€
stellarpower commented 1 year ago

We have SyntaxShape::Filepath, FlatShape::Filepath, Expr::Filepath that allows the name path to be used when you create custom commands with that shape. This shape is also used in commands like cp, mv, open, rm, save, etc.

So, are shapes documented? Looking at colours yesterday was the first time I'd heard that term come up, and looking on the site/google I don't see much of an explanation of what they are or how they work. If it's designed to restrict e.g. the dimensions of a list passed in, then I think this would be aligning with what I am thinking. It's good to check type-related requirements statically at compile-time, and then if we can't always verify features we require by using type alone, runtime restrictions on certain conditions at the point of function entry would be good.

As I recall, there were always times when you wanted a string and got a filepath and vice-versa

This has been my experience in other languages too. Be it legacy or just not enough tight integration.

amtoine commented 1 year ago

@stellarpower you've put a :+1: on my question above, is that ok if we close this issue in favor of #10332? :innocent:

CAD97 commented 1 year ago

The big problem with something like path<exists: true, type: file, canRead: true> is TOCTOU-style things. You aren't the only running process -- it's fully possible that some other process changed what's at that path between the time you checked and the time you use the path. To be fair it isn't all that likely within the scope of running a function from interactive shell, and pre-checks for nicer up-front errors for misuse is nice, but if you're doing any of the checking at parse time things will desync.

Needing to do it manually is a bit, but you can error make to emit nice spanned errors pointing at your parameters.

describe

The behavior makes sense if you consider path an alias to string, which is why describe says string and why you can freely intermix variables with types set to path and string.

So, are shapes documented?

What the book calls parameter types are syntax shapes. The book lists some[^1], but there are more[^2], and they are all declarable[^3] as parameter types. The set of actual value types is a subset[^4].

[^1]: any, block, bool, cell-path, cond, duration, error, expr, filesize, glob, int, list, math, number, operator, path, range, record, signature, string, table, variable

[^2]: binary, closure, datetime, nothing ... (couldn't find a list)

[^3]: caveats: block: error; operator, cond, signature, variable: weird; error: you can error make and save that in a let, but trying to do anything else causes the error to occur.

[^4]: any, binary, block, bool, cell-path, closure, date, duration, error, filesize, float, int, list, list-stream, match-pattern, nothing, number, range, record, signature, string, table

We may need some explicit filepath literal to move forward.

This is probably nonviable for use as a shell, because then it's no longer to use bareword strings as paths, e.g. in cd ~. It could work if barewords had something like Rust's numeric literal type selection, though.

sholderbach commented 1 year ago

Thanks for the detailed write-ups @stellarpower and @CAD97!

Good point about the time of check != time of use angle if those invariants were to be checked at a separate time from the intended operations.


The SyntaxShape != Type chism is something we definitely need to improve either in documentation or in how we go about them (I tried to add some internal docs with #10544 )

Type is tightly linked to Value but also includes "supertypes" in the form of Type::Number and Type::Any. This means at runtime things are Value (and PipelineData) and we can check incoming or stored Values against a Type. (@fdncred I think adding a Type without a Value is not possible). The structural subtyping for Records and Tables is implemented on Type.

Information about the SyntaxShape is only known when doing parser-like operations. This is relevant in parsing of the calls of nushell def's (and for some time at least for known externals). Also the completions/highlighting does operate on the assumptions encoded by SyntaxShape. So the primary use for SyntaxShape::Path is to nudge the completer to start out with path completions.

If we were to introduce Type::Path we would need a Value::Path as well so it is recoverable. It would introduce complexities around all the existing commands that operate well on Value::String, should they just work with Path as a subtype or should we make that explicit. Then as @CAD97 mentioned there would be the complexity around the literals. Should they only explicitly convert, convert in the SyntaxShape positions (i.e. if we expect it try to convert), or intractably do a guessing game if something is a path so let x = ~ stores a Path.

In my view it would make sense to look at how @stellarpower 's proposal would play out at the value level first. How should different classes of commands deal with it. Are there required conversions/ambiguities. I think then we can in the second step look at how this would expand the type system and mesh with what we do with the SyntaxShape at the moment.


P.S. @amtoine

@stellarpower you've put a ๐Ÿ‘ on my question above, is that ok if we close this issue in favor of #10332? ๐Ÿ˜‡

10332 is a tracking issue around all confusing type and type-like name and this issue is a specific proposal, that does focus the discussion on a vision for how a Type::Path could look like. Closing this would mess up what #10332 is about in chief.

fdncred commented 1 year ago

We used to Primitive::FilePath (nushell 0.44.0 and before). I'm not interested in recreating the problems we had with it, which was mainly confusion between strings and paths. So, I'm up for trying alternatives out.

stellarpower commented 1 year ago

Okay, I think I am gonna have to do a bit of reading up before I can make any real useful commentary on this one :laughing: