Open stellarpower opened 1 year ago
Thanks for your input!
We have SyntaxShape::Filepath
, FlatShape::Filepath
, Expr::Filepath
that allows the name path
to be used when you create custom commands with that shape. This shape is also used in commands like cp
, mv
, open
, rm
, save
, etc.
โฏ def fp [abc: path] { echo $abc }
โฏ fp foo
/Users/fdncred/src/nushell/foo
I doubt it does all the checking you're talking about, but it's a start.
A long time ago we had a Value::Filepath
but it proved to be difficult to use. As I recall, there were always times when you wanted a string and got a filepath and vice-versa. I believe we left it behind when we went from 0.44 to 0.59.
to me, the most surprising is
> let p: path = $nu.temp-path
> $p | describe
string
@stellarpower
this issue looks quite related to https://github.com/nushell/nushell/issues/10332, right?
there is a section about path
in there
to me, the most surprising is
should this work like a cast?
I ask because $nu.temp-path | describe
also returns string
.
I wonder if part of the issue is that there is no Filepath
literal syntax. ๐ค
e.g. You can't make a path like you can a string.
let a = "string"
let a = `string` # <-- this pukes because it tries to run string
let a = 'string' # that's a string
let a = "string" # that's a string
maybe let
should allow this type of casting?
Maybe this is what Stefan was talking about because there is no Type::Filepath
?
i guess $nu.temp-path
should be a path
directly, but yeah let p: path = ...
could act like a cast :yum:
if you're in /a/b/c
, then
let p: path = "d"
could give /a/b/c/d
:smirk:
since let
is a parser keyword, i guess the changes need to happen there. I'm looking but nothing is jumping out at me to change.
I found it. This is why it's returning a string. There is no Type::Filepath
.
https://github.com/fdncred/nushell/blob/main/crates/nu-protocol/src/syntax_shape.rs#L151-L152
I just added Type::Filepath
as an experiment.
We may need some explicit filepath literal to move forward.
> let a: path = 'c:\temp'
parsing output_type: String
bytes: Ok("path")
type_bytes: [112, 97, 116, 104], type: Filepath
parsing lvalue: Expression { expr: VarDecl(587), span: Span { start: 106264, end: 106271 }, ty: Filepath, custom_completion: None }, explicit_type: Some(Filepath)
Error: nu::parser::type_mismatch
ร Type mismatch.
โญโ[entry #1:1:1]
1 โ let a: path = 'c:\temp'
ยท โโโโโฌโโโโ
ยท โฐโโ expected filepath, found string
โฐโโโโ
With my changes, it fails the same way as this
> let a: float = 1
parsing output_type: Int
bytes: Ok("float")
type_bytes: [102, 108, 111, 97, 116], type: Float
parsing lvalue: Expression { expr: VarDecl(587), span: Span { start: 106264, end: 106272 }, ty: Float, custom_completion: None }, explicit_type: Some(Float)
Error: nu::parser::type_mismatch
ร Type mismatch.
โญโ[entry #2:1:1]
1 โ let a: float = 1
ยท โฌ
ยท โฐโโ expected float, found int
โฐโโโโ
We have
SyntaxShape::Filepath
,FlatShape::Filepath
,Expr::Filepath
that allows the namepath
to be used when you create custom commands with that shape. This shape is also used in commands likecp
,mv
,open
,rm
,save
, etc.
So, are shapes documented? Looking at colours yesterday was the first time I'd heard that term come up, and looking on the site/google I don't see much of an explanation of what they are or how they work. If it's designed to restrict e.g. the dimensions of a list passed in, then I think this would be aligning with what I am thinking. It's good to check type-related requirements statically at compile-time, and then if we can't always verify features we require by using type alone, runtime restrictions on certain conditions at the point of function entry would be good.
As I recall, there were always times when you wanted a string and got a filepath and vice-versa
This has been my experience in other languages too. Be it legacy or just not enough tight integration.
@stellarpower you've put a :+1: on my question above, is that ok if we close this issue in favor of #10332? :innocent:
The big problem with something like path<exists: true, type: file, canRead: true>
is TOCTOU-style things. You aren't the only running process -- it's fully possible that some other process changed what's at that path between the time you checked and the time you use the path. To be fair it isn't all that likely within the scope of running a function from interactive shell, and pre-checks for nicer up-front errors for misuse is nice, but if you're doing any of the checking at parse
time things will desync.
Needing to do it manually is a bit, but you can error make
to emit nice spanned errors pointing at your parameters.
describe
The behavior makes sense if you consider path
an alias to string
, which is why describe
says string
and why you can freely intermix variables with types set to path
and string
.
So, are shapes documented?
What the book calls parameter types are syntax shapes. The book lists some[^1], but there are more[^2], and they are all declarable[^3] as parameter types. The set of actual value types is a subset[^4].
[^1]: any
, block
, bool
, cell-path
, cond
, duration
, error
, expr
, filesize
, glob
, int
, list
, math
, number
, operator
, path
, range
, record
, signature
, string
, table
, variable
[^2]: binary
, closure
, datetime
, nothing
... (couldn't find a list)
[^3]: caveats: block
: error; operator
, cond
, signature
, variable
: weird; error
: you can error make
and save that in a let
, but trying to do anything else causes the error to occur.
[^4]: any
, binary
, block
, bool
, cell-path
, closure
, date
, duration
, error
, filesize
, float
, int
, list
, list-stream
, match-pattern
, nothing
, number
, range
, record
, signature
, string
, table
We may need some explicit filepath literal to move forward.
This is probably nonviable for use as a shell, because then it's no longer to use bareword strings as paths, e.g. in cd ~
. It could work if barewords had something like Rust's numeric literal type selection, though.
Thanks for the detailed write-ups @stellarpower and @CAD97!
Good point about the time of check != time of use angle if those invariants were to be checked at a separate time from the intended operations.
The SyntaxShape
!= Type
chism is something we definitely need to improve either in documentation or in how we go about them (I tried to add some internal docs with #10544 )
Type
is tightly linked to Value
but also includes "supertypes" in the form of Type::Number
and Type::Any
. This means at runtime things are Value
(and PipelineData
) and we can check incoming or stored Value
s against a Type
. (@fdncred I think adding a Type
without a Value
is not possible). The structural subtyping for Record
s and Table
s is implemented on Type
.
Information about the SyntaxShape
is only known when doing parser-like operations. This is relevant in parsing of the calls of nushell def
's (and for some time at least for known externals). Also the completions/highlighting does operate on the assumptions encoded by SyntaxShape
. So the primary use for SyntaxShape::Path
is to nudge the completer to start out with path completions.
If we were to introduce Type::Path
we would need a Value::Path
as well so it is recoverable. It would introduce complexities around all the existing commands that operate well on Value::String
, should they just work with Path
as a subtype or should we make that explicit. Then as @CAD97 mentioned there would be the complexity around the literals. Should they only explicitly convert, convert in the SyntaxShape
positions (i.e. if we expect it try to convert), or intractably do a guessing game if something is a path so let x = ~
stores a Path
.
In my view it would make sense to look at how @stellarpower 's proposal would play out at the value level first. How should different classes of commands deal with it. Are there required conversions/ambiguities. I think then we can in the second step look at how this would expand the type system and mesh with what we do with the SyntaxShape
at the moment.
P.S. @amtoine
@stellarpower you've put a ๐ on my question above, is that ok if we close this issue in favor of #10332? ๐
Type::Path
could look like. Closing this would mess up what #10332 is about in chief.We used to Primitive::FilePath
(nushell 0.44.0 and before). I'm not interested in recreating the problems we had with it, which was mainly confusion between strings and paths. So, I'm up for trying alternatives out.
Okay, I think I am gonna have to do a bit of reading up before I can make any real useful commentary on this one :laughing:
Related problem
Most languages work with paths as strings, and whilst some offer higher-level support, often these languages fall back to simply performing string concatenation or providing no guarantee on proper path support, leaving that up to the user-programmer to manage themselves.
This can cause various problems. The literal path to an entry on the filesystem (
/foo/bar/baz
) is quite different from what we care about programmatically (I expect baz to be a directory, and not a symlink, where I have write permissions, because I am about to write some files into there). The latter ends up being specified implicitly, or by convention, and that's dangerous.Coming from other shell scripting languages, I have two options when I am writing my functions:
if ! test -d $outputDirectory ; bail 4 "Needed an output directory" ; end
, checking every argument in turn and explicitly blowing up with a colourful error message.Describe the solution you'd like
Nu already gives really nice error messages for type errors, and these bubble through because it's compiled before execution. Builtins like
save
are safe by default, because they don't overwrite unless you force them to - the exact opposite of most other shells. So this is a great starting point.But I think a dedicated path datatype could help integrate these into the language further, given that working with files is such an integral part of operating in any shell. It could help portability by enforcing and making transparent proper handling of differences across platforms. It helps specify what elements in an external environment a script needs in order to be able to run, before we get to the point that we want to use them.
The main thing that comes to mind however, is it would be really cool if I could use path type annotations for my functions, in a similar way it seems I can for specifying valid records based on their fields. Rather than having something that is string-like, I could specify precisely what I need and expect to be on the filesystem, before I start trying to open it or use it. Thus eliminating the problem mentioned above of performing manual checks. This would ensure my scripts are safe, self-documenting, I get back a clear error message outlining the difference between what I asked and what I should have asked, and most importantly, fail as early as possible - critical for speeding up development and making the whole process easier. As always, sanitise user input at the first point it enters the system.
Let's say I want to have a file (needle), and I want to find duplicates of that in a directory (haystack), and output the location of those duplicates to a third file (results). I am picturing something like this:
Now any misuse of this function (whoops! I got
needle
andresults
the wrong way round) would produce a really clear error message - before anything inside the function had even run yet. It would make it clear what the problem is (I gave bad inputs) rather than distracting me by concentrating on the error that was thrown at the use site (this element halfway through a pipeline was unable to write a string into a directory, or open a file owned by root).Describe alternatives you've considered
Not thought of any so far. I know the name
path
would already clash with the command.Additional context and details
Also, I believe there are some issues with sourcing files according to a glob pattern, because these aren't CT constants. It might be possible that an integrated path datatype could get round some of these issues, by making some of these activities into things we can know (or find out) at compile-time, in a way that operating with strings etc. we couldn't in a way that would be reproducible.
Any thoughts?
Thanks