gracelang / minigrace

Self-hosting compiler for the Grace programming language

Import first .js/.grace file found in GRACE_MODULE_PATH #296

Closed IsaacOscar closed 5 years ago

IsaacOscar commented 5 years ago

This makes it so that when importing, the first .js or .grace file found in GRACE_MODULE_PATH will be used. Previously, it would only load a .grace file if no .js file was found anywhere in GRACE_MODULE_PATH.

E.g., if GRACE_MODULE_PATH is ./foo:./bar, with the following directory structure:

├── foo
│   ├── mod.grace
├── bar
│   ├── mod.js

An import "mod" statement will now import foo/mod.grace; previously it would import bar/mod.js.
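
The difference between the two lookup orders can be sketched roughly as follows. This is an illustrative Python sketch, not minigrace's actual code: the function names `resolve_new` and `resolve_old` are hypothetical, and preferring .js over .grace within a single directory is an assumption.

```python
import os

def resolve_new(name, module_path):
    """New behaviour: scan directories in order; the first .js or .grace wins."""
    for d in module_path.split(":"):
        # Assumption: within one directory, a .js file is preferred to a .grace file.
        for ext in (".js", ".grace"):
            candidate = os.path.join(d, name + ext)
            if os.path.isfile(candidate):
                return candidate
    return None

def resolve_old(name, module_path):
    """Old behaviour: any .js on the whole path wins; .grace is only a fallback."""
    dirs = module_path.split(":")
    for ext in (".js", ".grace"):
        for d in dirs:
            candidate = os.path.join(d, name + ext)
            if os.path.isfile(candidate):
                return candidate
    return None
```

With the directory structure above and GRACE_MODULE_PATH set to ./foo:./bar, `resolve_new("mod", ...)` picks foo/mod.grace while `resolve_old("mod", ...)` picks bar/mod.js.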

This fixes issue #293

apblack commented 5 years ago

I think that this might be a reasonable change in semantics. I changed make install fairly recently (3001f5b) so that the source files are installed along with the .js files. But they are not installed in the j1 and j2 directories used during the build.

There are two problems one needs to avoid. One is using an out-of-date .js file, when the source file has changed. The other is repeatedly recompiling a source file when there is already a compiled .js file further down the path. I have seen the latter happen repeatedly with gUnit when running tests — in fact, this is why I put the source files in /usr/local/lib/grace/modules.

The problem of failing to recompile the source comes up when working on minigrace. The obvious solution is to put a reference to the source file in the .js file, and compare the dates. But this does not work when the .js files are moved to another computer during an install. (The latter doesn't happen right now, but I plan to change the npm package so that it can be used by the end user.) It also doesn't work when the .js file, or the directory containing it, is not writable by the current user. We may need separate solutions for active development and installed packages.

Whatever changes you propose, it is essential that there are tests that document the desired behaviour. So please add tests to this PR.

IsaacOscar commented 5 years ago

This was a very confusing piece of code. Please try to leave it clearer than when you started.

Sorry about that. I was just trying to minimise the amount of code changes, and to keep everything looking like the surrounding code. That was dumb of me! I should make the code "better" instead.

IsaacOscar commented 5 years ago

I think that this might be a reasonable change in semantics. I changed make install fairly recently (3001f5b) so that the source files are installed along with the .js files. But they are not installed in the j1 and j2 directories used during the build.

There are two problems one needs to avoid. One is using an out-of-date .js file, when the source file has changed. The other is repeatedly recompiling a source file when there is already a compiled .js file further down the path. I have seen the latter happen repeatedly with gUnit when running tests; in fact, this is why I put the source files in /usr/local/lib/grace/modules.

The problem of failing to recompile the source comes up when working on minigrace. The obvious solution is to put a reference to the source file in the .js file, and compare the dates. But this does not work when the .js files are moved to another computer during an install. (The latter doesn't happen right now, but I plan to change the npm package so that it can be used by the end user.) It also doesn't work when the .js file, or the directory containing it, is not writable by the current user. We may need separate solutions for active development and installed packages.

Whatever changes you propose, it is essential that there are tests that document the desired behaviour. So please add tests to this PR.

I was expecting matching .js and .grace files in different paths to actually be different modules. In that case I would expect the .grace file to come before the .js file for the other module, e.g.:

├── foo
│   ├── mod.grace
├── bar
│   ├── mod.js

But I can see the use case where the corresponding .grace file for a .js file is not in the same directory. In that case the structure is likely to be the opposite:

├── foo
│   ├── mod.js
├── bar
│   ├── mod.grace

Here foo/mod.js would be the compiled version of bar/mod.grace. We need to know what the .grace file is, so we can know when it needs to be recompiled.

I see three possible solutions:

  1. Store a relative path to the .grace file in the corresponding .js file.
  2. Put a relative symlink to the .grace file in the same directory as the .js file, e.g:
    ├── foo
    │   ├── mod.js
    │   ├── mod.grace -> ../bar/mod.grace
    ├── bar
    │   ├── mod.grace
  3. Say that a .js file's corresponding .grace file is the first matching file after the .js one. E.g. if your GRACE_MODULE_PATH was foo:bar:baz, and your directory looked like this:
    ├── foo
    │   ├── mod.grace
    ├── bar
    │   ├── mod.js
    ├── baz
    │   ├── mod.grace

    the .grace file for bar/mod.js would be baz/mod.grace.
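
Option 3 could look something like the sketch below. It is only an illustration in Python: `source_for_js` is a hypothetical name, and searching the .js file's own directory as well as the later ones is an assumption, since the description above only says "after the .js one".

```python
import os

def source_for_js(name, module_path):
    """Return (js_path, grace_path) under option 3, or None if no .js is found.

    grace_path is None when no matching .grace file follows the .js file.
    """
    dirs = module_path.split(":")
    for i, d in enumerate(dirs):
        js = os.path.join(d, name + ".js")
        if os.path.isfile(js):
            # Assumption: the .js file's own directory is searched too,
            # followed by every directory after it on the path.
            for later in dirs[i:]:
                grace = os.path.join(later, name + ".grace")
                if os.path.isfile(grace):
                    return js, grace
            return js, None
    return None
```

A .grace file that appears earlier on the path than the .js file is skipped, which is the point of the foo/bar/baz example.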

What do you think?

IsaacOscar commented 5 years ago

OK, I've improved the code. Note that it still doesn't work with builtin modules, e.g. if you have a "unicode.grace" in your GRACE_MODULE_PATH, an import "unicode" will still import the builtin one.

apblack commented 5 years ago

I like your solution (2): make it an invariant that a foo.js file is always compiled from the foo.grace file in the same directory — if necessary by linking the source into that directory. That will solve the problem of checking whether the .js file is up-to-date. It's almost the same as solution (1), but makes it easier to change or delete the .grace file (without editing the generated .js).

We are still left with the problem of compiling a file multiple times, though. If GRACE_MODULE_PATH=a:b, and there is a foo.grace in a, but the compiler puts the generated foo.js (and a link to ../a/foo.grace) into b, then the loader will find b/foo.js and the program will run OK. However, the next time that module is compiled, the compiler will again find a/foo.grace, notice that there is no a/foo.js, and compile it again.

Currently, the compiler will ignore a/foo.grace and use b/foo.js, which is the behaviour that you note as unexpected in #293. I suppose that we could find b/foo.js, look for the corresponding b/foo.grace, and see if that is a sym-link to the .grace file that we first found on the path. That seems a bit baroque.

We could also remove the --dir flag from the compiler, but that was put there to solve other problems — notably the need to compile the same sources multiple times into different directories when bootstrapping the compiler. It is also a common convention to put build-products in a different directory from sources.

Figuring out the "right" behaviour here is not easy!

I'm not worried about builtin modules, because these should just go away: every module should have source (even if that source is nothing more than comments, types and native "js" code statements). This would get rid of "stubs" and the hand-editing of gct strings.

IsaacOscar commented 5 years ago

We are still left with the problem of compiling a file multiple times, though. If GRACE_MODULE_PATH=a:b, and there is a foo.grace in a, but the compiler puts the generated foo.js (and a link to ../a/foo.grace) into b, then the loader will find b/foo.js and the program will run OK. However, the next time that module is compiled, the compiler will again find a/foo.grace, notice that there is no a/foo.js, and compile it again.

Is there a specific reason why you wouldn't want to swap the order of GRACE_MODULE_PATH?

Alternatively, you could add a symlink a/foo.js -> ../b/foo.js, and modify minigrace to ignore any .js files that are broken symlinks (or otherwise can't be read).

Another option might be to have the compiler take a specific output-directory on the command line:

  1. Let d1 be the first directory in GRACE_MODULE_PATH where d1/foo.grace exists
  2. Let d2 be the first directory in GRACE_MODULE_PATH where d2/foo.js exists
  3. If d1/foo.grace is newer than d2/foo.js, recompile d1/foo.grace and save it in the output-directory
  4. Otherwise just use d2/foo.js

The output-directory should probably default to the current working directory.
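
The four steps above can be sketched as follows. This is a hedged Python illustration rather than a proposal for the real implementation: `plan_import` and `first_dir_with` are hypothetical names, modification times stand in for "newer", and the handling of a missing .grace or .js file is an assumption that the steps above do not cover.

```python
import os

def first_dir_with(filename, dirs):
    """Return the first directory in dirs that contains filename, else None."""
    for d in dirs:
        if os.path.isfile(os.path.join(d, filename)):
            return d
    return None

def plan_import(name, module_path, output_dir="."):
    """Return ("compile", source, target), ("use", js_path), or None."""
    dirs = module_path.split(":")
    d1 = first_dir_with(name + ".grace", dirs)   # step 1
    d2 = first_dir_with(name + ".js", dirs)      # step 2
    grace = os.path.join(d1, name + ".grace") if d1 is not None else None
    js = os.path.join(d2, name + ".js") if d2 is not None else None
    if grace is None and js is None:
        return None                              # assumption: nothing to import
    if grace is None:
        return ("use", js)                       # assumption: no source, trust the .js
    # Step 3: recompile when there is no .js yet (assumption) or the source is newer.
    if js is None or os.path.getmtime(grace) > os.path.getmtime(js):
        return ("compile", grace, os.path.join(output_dir, name + ".js"))
    return ("use", js)                           # step 4
```

Note that the compiled file only gets found on the next run if the output directory is itself on GRACE_MODULE_PATH.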

It is also a common convention to put build-products in a different directory from sources.

Yes, I currently have some grace code, with a script like this:

mkdir -p out
cd out
rm -f ./*.grace
ln -sf ../src/*.grace ./
mgc Main.grace

However I also have different versions of grace files, so I will run something like

ln -sf v1/*.grace ./out/

to activate "v1" of the grace files.

This way only my out folder will be touched when building, so I can ignore it. It also keeps the files in my source code directory to a minimum.

apblack commented 5 years ago

There is a flag to put the output files in a separate directory — the --dir flag. That's what causes the problem. The default is indeed the current directory. While the current directory is always on the module search path, the --dir directory is not — and for good reason.

The compiler puts the --dir directory on the search path implicitly, but it goes at the end — I think it would be wrong to put it at the beginning.

Your script to put the build products in an out directory can be accomplished more handily by giving the compiler the --dir out flag; out will be created if it doesn't exist.

I'm modifying the install and packaging scripts to always include the source files, which will avoid some of these problems. But a more general problem is: how do we know when two modules are actually the same. For example, module "main" may import "foo" and "sub/bar", and "sub/bar" may import "../foo". We must import "foo" just once. How do we know when two distinct paths reference the same file?

Absolute paths don't work because the same file may be linked at two different paths. I've thought of using inode numbers, but I don't know if they exist on Windows. We could compute a hash of the contents of the module (SHA-256, say), and say that two modules are the same if they have the same hash.
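
Both ideas (comparing file identity and comparing a content hash) can be sketched in a few lines. This is an illustrative Python sketch under assumptions: `same_module` and `content_hash` are hypothetical names, and SHA-256 is simply one concrete choice of hash.

```python
import hashlib
import os

def content_hash(path):
    """Hex SHA-256 digest of the file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def same_module(path_a, path_b):
    """Treat two paths as the same module if they name the same file,
    or if their contents hash to the same value."""
    # os.path.samefile compares device and inode numbers, so it also
    # recognises hard links and symlinks to the same file.
    if os.path.samefile(path_a, path_b):
        return True
    return content_hash(path_a) == content_hash(path_b)
```

One consequence of the hash rule is that two unrelated files with identical contents count as a single module, which may or may not be what is wanted.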

IsaacOscar commented 5 years ago

On Windows, once you've got a file handle, you can use GetFinalPathNameByHandle to get a canonical path; in particular it works with symlinks (kind of like realpath on Linux).
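
As a rough illustration of the canonical-path idea, here is a Python sketch that keys loaded modules by their resolved path. The names `module_key` and `import_once`, and the cache dictionary, are hypothetical; Python's `os.path.realpath` is used as a stand-in for whatever the loader would actually call.

```python
import os

def module_key(path):
    """Canonical key for a module file: symlinks and '..' segments resolved."""
    return os.path.realpath(path)

def import_once(path, cache, load):
    """Load the module at path unless its canonical path is already cached.

    cache maps canonical paths to loaded modules; load is a callable
    that takes the canonical path and returns the module.
    """
    key = module_key(path)
    if key not in cache:
        cache[key] = load(key)
    return cache[key]
```

With this, "foo" reached directly and "../foo" reached from "sub/bar" map to the same key and are loaded once. Hard links are not resolved by a canonical path, so two hard-linked names for one file would still be treated as different modules.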

apblack commented 5 years ago

Well, maybe we should put the --dir directory at the start of the search path. The user is probably in control of what goes there.

I’ve been playing around with the npm package, trying to make it stand-alone. I’ve been seeing constant re-compilation. I can’t see a way to have the npm package install set GRACE_MODULE_PATH. It may be possible to move the installed modules to {prefix}/lib/grace/modules.

apblack commented 4 years ago

@IsaacOscar: Unfortunately, when moved into the known-good compiler, this change causes minigrace to compile some compiler modules many times. When running the self-test, it seems to cause infinite recompilation, i.e. the self-test never completes. I'm going to try to revert this change, and implement a new proposal (discussed in issue #293).