Open byorgey opened 2 years ago
Thanks for writing this @byorgey, this sounds great and could be a huge improvement of S-Critical levels! :smile:
For example, my earlier error with not included sources (#411) would be caught at type checking. :+1:
OK, in that case I've marked it S-Critical
!
Perhaps import
could add the definitions to the environment using qualified names, for example, given a file my.sw
:
// my.sw
def m2 = move; move end
def uturn = turn right; turn right end
Then running import "my.sw"
would add definitions named my.m2
and my.uturn
We might also want to consider "dhall style" import. If I understand correctly how this would work, then any Term could be an import statement (e.g. a file or URL), and the import resolution system would substitute such terms with the content of their import locations. So for example, given this file:
// m4.sw
move; move; move; move
Then the following expressions are valid:
build {./m4.sw}
def m4 = ./m4.sw; end
And we could use the dot syntax for import that contains a list of definitions. For example, instead of using import "my.sw"
, we could define a new namespace:
>>> def my = ./my.sw end
>>> build {my.m2; ...}
Well if we can make this work, then it would be a nice solution for code packaging and distribution. Though I'm not really sure it's worth the effort, but I figured it may be useful to mention it here.
@TristanCacqueray thanks for the idea! I think Swarm is actually well suited for this style since everything in the language is an expression. Even a file full of def
s is really just one big expression that happens to be a chain of def
commands joined with semicolons. It would be super cool to be able to e.g. directly import files from URLs --- e.g. if someone had some definitions on the wiki you could just import them directly, without having to download the file first.
In fact, now that I think about it, this is pretty much already how I was suggesting import
should work --- the difference is really just one of syntax. We'll probably want to bikeshed the syntax (and make sure it's something we can parse unambiguously), but in general I'm very open to using a more lightweight syntax for imports, and supporting URLs as well.
The issue of namespacing is a separate issue, I think. Having names from imports automatically qualified seems too heavyweight --- I definitely don't want to have to type my.whatever
for every single name. If it becomes important I think we can come up with something relatively simple to manage qualified names.
An important caveat / thing that has to be properly addressed and designed for is that there needs to be an explicit difference between "re-fetch the most current version of this thing from the filesystem or network since it's been updated and I want the latest version" and "refer to a cached version of this thing which is named by an external URI but shouldn't be re-fetched every time it is used". This is more than just a difference in semantics; the first must have some sort of cmd
type whereas the second does not have to. Perhaps for the first we just want some kind of load : string -> cmd ()
which fetches and caches, but does not run/execute, the named resource (and any that it recursively refers to). For convenience we could then define run t
to do a load
followed by executing the loaded thing.
Is load
necessary? IMO we should be able to resolve this when type-checking import
.
For example, imagine running import "HTTP://some-swarm-file.sw"
several times. We can store it in the cache and then only check if it (and transitive dependencies) changed. (Bonus points for checking ETag or Term
difference.)
I think what we need is to:
import "http://source"
to Import (WebSource "...")
WebSource
to CachedSource
(here we have to add hash/ETag to distinguish version)
data Source = LocalSource | CachedSource
.Source
to memory, type-check it and give it an internal nameAlso Location
may need to know about files.
EDIT: Actually we need to have the source loaded into memory to construct a well-typed Term
, so caching, type-checking and creating an internal name must happen immediately.
What you said about having data Source
, etc. all sounds good. However, I don't think having a single import
is satisfactory. Let me see if I can clearly explain my reasoning.
http://blah/thing.sw
it could have definitions... but it could also be e.g. the number 4
, or a pair of two lambdas, or whatever.def x : int = 3 + [http://blah.com/the_number.sw]
. This has a couple implications:
cmd
type, so for the sake of referential transparency we want to just cache it the first time it is referenced and never reload it. For example we want let x = [http://foo] in x + x
to always be the same as [http://foo] + [http://foo]
.load
any more, but maybe instead we should keep run : string -> cmd ()
, where the thing referenced by the URI must have type cmd ()
, and will be explicitly reloaded and executed.
This is actually much more theoretically complex and interesting than I thought. Basically because Swarm programs are going to be long-running, we need to support some kind of "hot reloading". Dhall does not have to deal with this because Dhall programs do not run for a long time; as I understand it has an "import resolution" phase at the beginning where it essentially replaces all import expressions with the current contents of the referenced file, and only then proceeds to evaluate everything using that snapshot of all the imported file contents. If you edit the files while Dhall is evaluating things it will have no effect; you would have to re-run the Dhall program from the beginning to see the changes.
I am very confused. Your first branch of bullet points leads to caching files, but that is exactly what I was pondering previously. But in the second branch you argue for keeping run as it is now in some form.
I do not understand why it would be a good idea to hot reload instead of doing the loading once at the beginning like Dhall. Its approach sounds very reasonable to me and exactly like what this Issue originally proposed.
In particular I do not see what prevents us from resolving the source at the time it is type-checked — even if it is number 42 or “Hello world!”
If I want to import over network I can probably live with a less then a second delay of downloading a file in the background. The technical difficulty would be to put type checking in another thread and make sure it returns before we let the command be executed.
But that to me sounds much preferable to:
I am very confused. Your first branch of bullet points leads to caching files, but that is exactly what I was pondering previously. But in the second branch you argue for keeping run as it is now in some form.
Sorry for the confusion --- I was trying to argue why we need both.
I do not understand why it would be a good idea to hot reload instead of doing the loading once at the beginning like Dhall. Its approach sounds very reasonable to me and exactly like what this Issue originally proposed.
In particular I do not see what prevents us from resolving the source at the time it is type-checked — even if it is number 42 or “Hello world!”
Hmm... maybe you are right. :thinking: The scenario I had in mind is when you have a file of definitions, mydefs.sw
, so at the REPL prompt you type
> import "mydefs.sw"
(or whatever the syntax is), this causes the definitions to be loaded from the file, typechecked etc. But later you edit the file and then want to reload the new, changed definitions. So you type import "mydefs.sw"
at the REPL prompt again. Do you get the newly edited file being loaded again? Or do you just get a cached version? I thought this was a tricky question, but now I think I see that it is not: because we are entering a brand new Swarm program (i.e. our REPL input), we should go (re)load the file while we are typechecking the input. On the other hand, suppose the commands in that file cause a robot to be built and go off running a program. That robot will forever use the program as it was defined at the time the robot was created, even if we later edit the file.
In summary, I was confused about the difference between robot programs running for a long time, and a game running for a long time during which you might enter many REPL inputs. As you say, I think the key idea is to resolve imports at typechecking time, just like Dhall does. That naturally implies that a long-running robot program will never see updates to remote files (because it is typechecked once and the runs for a long time), but entering stuff at the REPL is a natural opportunity to get the most recent version of something every time you reference it (because every time you enter something at the REPL, it is a new Swarm expression which has to be type-checked etc.).
So in the end, I agree we are back to what was originally proposed in this issue. But now I understand it much better. :smile:
@byorgey OK, I think we understood each other, hopefully I did not come across as too dismissive. 🙂
The thing leads me to argue for long running robots to not get extra power from run
is that I have other solutions in mind. For example the web API to supplement REPL, antenna to communicate with robots and keyboard control to drive around. 😉
PS: Also immediately type checked import does not give extra power, so we do not need to come up with a capability and device for it! 😄
@byorgey OK, I think we understood each other, hopefully I did not come across as too dismissive. :slightly_smiling_face:
No, not at all! I'm really grateful to have someone who won't let me get away with nonsense. :smile:
The thing leads me to argue for long running robots to not get extra power from
run
is that I have other solutions in mind. For example the web API to supplement REPL, antenna to communicate with robots and keyboard control to drive around. wink
Yes, I completely, 100% agree. Like I said, when I was talking about "hot swapping" what I actually had in mind was the ability to reload a previously loaded module by typing it at the REPL again (which is not really hot swapping at all), not hot swapping new code into a running robot.
PS: Also immediately type checked import does not give extra power, so we do not need to come up with a capability and device for it! :smile:
Agreed, yay!
@byorgey can we improve this in a minimal way and get 90% of the benefits?
I am thinking about initially restricting import
to be nonrecursive or even one-level.
With that restriction, we could add a stage before type-checking where in processParsedTerm'
for each import fp.sw
we:
__fp
(keeping the location)T
__fp: T
This would involve adding IO
to the type of processParsedTerm'
, but I think that is necessary for the eventual full solution as well. I hope this could serve as the first stepping stone to iteratively improve import
instead of doing it all at once as a big project.
What do you think @byorgey, is this a reasonable proposal or a dead end? 🙂
I think it is a reasonable proposal but I do not think it is actually much easier than the full thing. It is like getting 90% of the benefits with 90% of the work. :wink: I am actually thinking of working on this next, and I don't think it will be hard.
@byorgey great, I trust your judgement and I am happy to leave it to you. 👍
In my defence you tagged this as C-Project so I thought there would be a lot more work involved with caching the files and resolving collisions and whatnot. 😅
I mean, there are probably lots of details to get right so it will be a decent amount of work, hence deserving C-Project
. But I don't think it is conceptually difficult.
:thinking: maybe we need different sets of tags to distinguish between how difficult something will be and how much work it will be...
OK, I take it back, this is super annoying. :angry: Basically the problem is that processTerm
and friends now require IO
, as @xsebek points out above. But in several places we use processTerm
as part of some FromJSON
instances (in particular for Robot
and Scenario
), which definitely cannot involve IO
. So we have to somehow stage it so we can first read JSON while leaving any embedded terms as Text
, and then have a later resolution stage where we process them (loading any imports from disk/network).
It would be easy enough to just put Either Text ProcessedTerm
s in, have the FromJSON
instances always return Left
, and then have a function which goes through and changes everything to Right
. But it would be easy to forget or miss some, etc. so I'd really like to do it in a typesafe way, so that the type of e.g. Scenario
tells you whether it has raw Text
terms or ProcessedTerm
values. We already do something kind of similar for Robot
, but so far it's been tedious and annoying getting everything to work. I think I can do it though.
UPDATE: It's even worse than I thought, the tendrils reach all the way into reading the palette in a world description, because it has a FromJSONE
instance that requires a RobotMap
, which is built from TRobot
values, which contain ProcessedTerm
s...
Loading and resolving imports at parse time is simply not going to work. I think the next thing I'm going to try is to parameterize Syntax
not on the type of annotation, but on a phase, then use type functions to decide on various types depending on the phase. That way we can have a parsing phase which leaves imports unprocessed, then an import resolution phase, then typechecking, etc. and keep everything relatively type safe (e.g. we won't be able to accidentally start typechecking a term that hasn't had all its imports processed).
@byorgey robots also need this change, because they are expected to have CESK
already after parsing. I don't think that is an option as we don't have IO there.
So I think they need an extra Parsing
phase before Template
, so that the parsing is done once.
@xsebek agreed! I think it is going to be a far-reaching change but I'm hoping parameterizing everything by phase will make it all work.
Elsewhere (e.g. #7, #208, #328) I have talked about how I don't like the
run
command and want to get rid of it. However, I'd like to explore an alternative: maybe I don't like it because it's not typechecked properly!In particular, right now
run : string -> cmd ()
, and typechecking it is utterly trivial: just make sure it is given a string as an argument. However, this leads to a few problems:x.sw
definesgo : cmd ()
, thenrun("x.sw"); go
would work perfectly fine at runtime, but currently does not typecheck, with an error aboutgo
being unbound.true : ()
.run
works now can break type safety entirely: #2068.So here's an alternative we might contemplate:
run
toimport
and make it special built-in syntax (requiring an argument which is a string constant) rather than a command constant.import "foo"
, readfoo.sw
from disk and typecheck it (unless it has already been imported before, in which can we can just look it up in a map). The type ofimport "foo"
is simply the type of the Swarm expression infoo.sw
, and also binds the names of any definitions it contains. In other words it should behave exactly as if the contents offoo.sw
had been typed in place of theimport
(modulo the fact that files can be imported more than once but only need to be checked once).IO
to the type inference monad. Or actually, maybe not --- perhaps we could add a phase in between parsing and type checking where we gather up all theimport
statements and load and parse their referents into aMap
, recursively processing any imports encountered; then we pass that map to the typechecker too. In any case it seems like it requires us to addIO
to theprocessTerm
function in one way or another.This would still not be a true module system (in particular there's no way to do qualified imports, or hide names, etc.) but it would be way better than what we have, and it would at least solve #208 and #328 .