luvit / lit

Toolkit for developing, sharing, and running luvit/lua programs and libraries.
http://lit.luvit.io/
Apache License 2.0
245 stars 58 forks source link

Implement installing packages over Git #309

Open Bilal2453 opened 2 years ago

Bilal2453 commented 2 years ago

The Pull Request

This is an initial PR of many planned PRs aiming at polishing the Lit experience. In this PR I wrote the basics of getting a lit install https://github.com/author/repo command working, as well as refactoring some of the code, and documenting it. I will go through every little change that I have made, and the expected behavior of such changes, as well as potential bugs that I spotted. But first I will explain how to use this implementation.


How to use it

I wanted to keep this implementation as simple as it could be, to not confuse the end-user and keep it familiar to them. Therefor my choice was to use Git's URL schemes, similar to how git clone URL handle it (and similar to values lit make currently accept). Some examples of the command:

All of the above commands are expected to work. It will fetch the repo tree, calculate the dependencies, install the package correctly and then install the package's dependencies; Even if such dependencies are Git repos too. For example, a possible package.lua:

return {
    name = "author/package",
    version = "0.0.1",
    -- etc
    dependencies = { "git://github.com/Bilal2453/discordia-replies", "https://gitlab.com/Bilal2453/lit-git-test.git" },
}

It is as well expected to work with any Git upstream, in the above example GitLab and GitHub are used.


How it works

While in the process of making this implementation small and simple, I decided to reuse most of the Lit logic regarding Git DB, this implementation will use the Git CLI to fetch the given repo into Lit's DB, then it queries the package in-db (just like Lit already does), then marks the package for installation. The installation process itself is untouched, the behavior from this point is identical to what Lit already does.

Single-file packages though, those have been very problematic to implement since you basically cannot tell a service (GitHub for example) to create tag pointing to the desired blob (how Lit currently does it), making us unable to determine which blob in the tree is the package file. I've been in front of two choices, either I don't implement it, or I search the tree for a Lua file, then if and only if the tree only has a single Lua file, use that as the package. I've decided that "something is better than nothing" and went ahead with latter method, I will have to document this behavior really well, to make sure the end-user will expect this.


Not yet implemented

TL;DR:

  1. calculate-deps.lua. This indicates that lit install author/ is allowed. And indeed when I tried it with lit install SinisterRectus/ it tried to do... something, not quite sure what it is trying to do (probably install first-to-match package) but it quickly fails with a somewhat unexpected error. Will be opening an issue about it.

  2. calculate-deps.lua. As I understand, this will try to process the dependencies of the package... what happens if two packages lists each other as a dependency? I haven't confirmed this but from what I am reading... it would be stuck in a recursive loop. No checks to prevent this from happening are provided.

  3. core.lua. The field name has a typo, and this field is never used anywhere. Correcting it should break nothing...?

  4. core.lua. I wanted to match Git's accepted scheme for fetch, and found this in core... although I could just take the pattern part and leave the handler one, the name of this field does not imply it is related to lit make nor I think it makes sense to expose it as such.


Changes overview

Here I will list most behavior changes, additions, or deletions I've made to the existence code.

  1. rdb.lua:
    • Added db.offlineLoadAny. Because rdb patches db.load, it makes it impossible to use db.loadAny locally without triggering the upstream matching. This solves it by re-implementing loadAny but with the offloadLoad reference.
  2. pkg.lua:
    • Added pkg.queryGit. It is pretty similar to queryDb except in that it is offline only, and handles single-file packages differently (as mentioned previously).
    • Edited the doc comment to include pkg.queryGit, and to add hash as a return of queryDb.
  3. calculate-deps.lua (PR's main focus):
    • Refactoring the code to stop using nesting functions inside other functions.
    • addDep & processDeps logic is merged together into resolveDeps.
    • Everything is now documented, with hopefully clear comments.
    • Old Lit behavior is untouched, except when package name starts with a Git scheme.
    • Implemented resolveGitDeps that handles git calculations (and fetching).

Those should be all of the changes introduced in this PR, hope I missed nothing. There should be 0 change over how Lit currently work. If you observe any, it is a bug.


Tests

To test this implementation I've made a really small repo at https://github.com/Bilal2453/lit-git-test/ which should be enough to show any odd behavior, it tests old dependencies resolution behavior, single-file package over Git, using multiple Git upstreams (GitHub and GitLab in this test), and rule excluding. Overall, it makes use of 3 upstreams, GitHub, GitLab and Lit.


Finally, feel free to give me your feedback, review the code and test it, and contribute to the implementation. I hope this is up to the task of making Lit even better.

SinisterRectus commented 2 years ago

Hi, thank you for this PR and for your patience. I am going to go through it one step at a time so that I understand it.

SinisterRectus commented 2 years ago

Overall I think it looks okay. But this is honestly outside of my understanding. I would legitimately have to rewrite all of lit to figure out how lit and git work.

The single file package problem is unfortunate.

Can you open issues or PRs for the potential bugs that you found? I think they are worth addressing.

Bilal2453 commented 2 years ago

The single file package problem is unfortunate.

It indeed is, the implementation does its best to solve it but, that is as far as it is possible with Git.

But this is honestly outside of my understanding.

So it is for almost anyone frankly, no one fully understands how Lit (except @creationix perhaps) works, so goes to Git, nor I expect anyone to spend two days reading Lit's source code and Git docs – like I have. Therefore I was thinking of reviewing this PR using its behaviour, heavily testing it to ensure it is behaving, and deciding upon that how the PR should be proceeded.

Although that of course won't tell us anything beyond "Is it working or not", which is sad.

Can you open issues or PRs for the potential bugs that you found?

Surely, I will be glad to fix those and continue working on this PR. Hopefully that will be on the next Sunday.

SinisterRectus commented 2 years ago

Maybe you can write a test file for this? We have limited testing on lit. Always room for more.

creationix commented 2 years ago

Nice work! I hope to be able to review this later.

creationix commented 2 years ago

A way of specifying a commit, or a branch, or even a tag is mandatory. I although have been very conflicted over this and over how I will be implementing it.

The convention used by npm is url#sha1 or url#tag. We can use the same here.

On single-file packages, the git data model supports this only via annotated tags. When you publish to lit, the object is literally a git tag object. Unlike commit objects that can only point to a tree, tags can point to anything (commits, trees, or blobs) You may be able to fetch the tag via the git CLI tool and then read the tag and find the blob using the local git files after fetching.

creationix commented 2 years ago

Internally implement Git protocol over HTTPS (smart HTTPS)

I've done this before in JavaScript, it's not too terrible. The hardest part is reading the packfile which I seem to remember we already have in the git implementation in lua.

Handle Git submodules (?).

This would be great, what use cases do you have in mind? Is the goal to treat submodules as symlinks basically so their contents are in-place? This could be a way to manage recursive dependencies using only git since luvit's require resolution algorithm looks for libs and deps locally first.

Bilal2453 commented 2 years ago

The convention used by npm is url#sha1 or url#tag. We can use the same here.

I will be using something similar here as well then.

On single-file packages, the git data model supports this only via annotated tags. When you publish to lit, the object is literally a git tag object. Unlike commit objects that can only point to a tree, tags can point to anything (commits, trees, or blobs) You may be able to fetch the tag via the git CLI tool and then read the tag and find the blob using the local git files after fetching.

Right, I knew that much. The problem here is the Git repo owner has to create a tag for each release manually and point it to the right blob, sounded to me quite inconvenient, will see what I can do about this later on.

I've done this before in JavaScript, it's not too terrible. The hardest part is reading the packfile which I seem to remember we already have in the git implementation in lua.

Nice. I think this should be my next task here, maybe before that make some tests to ensure default behavior isn't changed. I think it should be fairly straightforward.

This could be a way to manage recursive dependencies using only git since luvit's require resolution algorithm looks for libs and deps locally first.

Yep, that is what I had in mind. A way to basically do recursive dependencies. I can see that being quite useful if the said repo wasn't published to Lit or something along those lines, then you can just have it as a Git submodule. But at the same time I also have some serious concerns.

  1. It will definitely complicate everything. Having to handle how submodules themselves work, make sure we don't run into recursive loops but yet handle legitimate recursive deps, makes it much harder to track what is currently installed, etc.

  2. Dependency hell . This may very much create a dependency hell where some submodule depend on a ton of others that in turn depend on more. Some of which may already be locally installed, but we will have to install it yet again to satisfy the package expectations.

  3. Version conflicts. I can imagine a scenario where some package wants a specific commit of a Git submodule, when there is a locally installed version of that package with a different version. Or two packages that are a git submodule dependency each want their own version of some package.

I will be holding off on this for now. But even in the future, I am not sure how should this look like. Seems like a quite complicated feature that may very well end up un-used/over-used.

cc: @creationix