lcompilers / libasr

The ASR (Abstract Semantic Representation), backends, optimizations and other tooling
MIT License
12 stars 2 forks source link

ASR: Add synced Libasr #2

Closed Shaikh-Ubaid closed 2 years ago

Shaikh-Ubaid commented 2 years ago

This PR syncs the current Libasr of LFortran and LPython.

Corresponding MR at LFortran: https://gitlab.com/lfortran/lfortran/-/merge_requests/1816

Corresponding PR at LPython: https://github.com/lcompilers/lpython/pull/730

Notes:

Shaikh-Ubaid commented 2 years ago

Currently all the three repositories use the same libasr.

Corresponding comment at LFortran: https://gitlab.com/lfortran/lfortran/-/merge_requests/1816#note_1015228153

Corresponding comment at LPython: https://github.com/lcompilers/lpython/pull/730#issuecomment-1173968156

czgdp1807 commented 2 years ago

@certik Please merge this as well.

Shaikh-Ubaid commented 2 years ago

With ASR: Sync Libasr !1816(merged) at LFortran and ASR: Sync Libasr #730(merged) at LPython, both the (LFortran and LPython) Libasrs are in sync.

So, I guess, currently, it is a good opportunity to possibly move to this common and independent Libasr project.

@certik Please possibly look into this.

czgdp1807 commented 2 years ago

I think it is easier said than done. Points to be noted,

  1. Ever changing nature of libasr - It is ever changing and not stable, simply because ASR is generated by frontend and if it changes then libasr has to be updated as well. Making it a separate project would always require a branch of it to be used and not a particular release.
  2. End to end testing - Testing a compiler in CI requires changes in libasr to be used. So, every PR in LFortran/LPython would require a different branch of libasr to be used. Writing test scripts would be tricky. It may require the contributor to provide the libasr branch that they want to test with. Also if you want to use main branch of libasr always, then all the PRs of LFortran/LPython would depend on each other because all the PRs will be using the same libasr and all the contributors pushing to add their change to libasr first so that they can get their work done.
  3. Common libasr would intertwine all the LCompilers. A change in one compiler would require every other LCompiler to be updated. So, different teams of different compilers would always depend on each other. A big management issue.

Moving libasr to a separate project would make development of LPython/LFortran harder. Imagine you want to add a feature (say arrays of structs) to LFortran. So first you will add it till ASR in LFortran/LPython and then a separate PR to libasr project for adding the feature in LLVM/C or whatever backend. And then fetching branches in each of the compilers to test if things work end to end would be cumbersome in my opinion.

Personally, I don't have issues with any kind of development pattern. However, new contributors will face a lot of problems with this kind of structure where an important part of the project is somewhere in a different repository.

Bottomline - Syncing libasr from time to time is easier. Like spending a day on syncing libasr is better than struggling on a daily basis with development issues.

certik commented 2 years ago

As @czgdp1807 said, the alternative options make this work for all developers. The current option is only hard on people that need to synchronize from time to time, but if you only care about LPython or LFortran in isolation, there is no pain.

Shaikh-Ubaid commented 2 years ago

The points shared are really great. Thank you so much @czgdp1807. Also, Thank you for the deep thought. I completely agree with the points and now it seems that it would be really challenging to work across LFortran and LPython with a combined Libasr.

I got inspired towards a combined Libasr from the following:

Since, ASR is self-contained and/or complete and/or common, I thought it was also independent and therefore possibly we could develop it in an individual/designated repository (and thus making it truly common and/or independent from the frontend).

But as from your shared points, now, it seems it would be challenging to have ASR as an individual repository.

One of the ideas I had was to use git submodule, that is to have the libasr project as a submodule to lfortran and lpython projects. I tried playing with submodule at https://github.com/Shaikh-Ubaid/LCpp and https://github.com/Shaikh-Ubaid/Libasr_. It seems it could resolve the shared points to some extent but might create other concerns. I am describing the possible ways and concerns below.

Ever changing nature of libasr - It is ever changing and not stable, simply because ASR is generated by frontend and if it changes then libasr has to be updated as well. Making it a separate project would always require a branch of it to be used and not a particular release.

git submodules uses commits to track the version of the submodule (in our case Libasr) being used. So, if we use libasr as a submodule instead of a particular release, updating the libasr version used by the frontend would be like- first pulling in the changes in the libasr and then adding a commit (on the frontend) to use the updated libasr. But again, it seems this is very similar to that of using releases, since, we just have commits (which look like releases) which are being used for versioning.

Since, the version is based on commits, I guess we might not need to maintain different branches.

End to end testing - Testing a compiler in CI requires changes in libasr to be used. So, every PR in LFortran/LPython would require a different branch of libasr to be used. Writing test scripts would be tricky. It may require the contributor to provide the libasr branch that they want to test with.

Every PR in LFortran/LPython need not maintain different branch, they would just maintain different commits of libasr. While pushing the changes the contributor already/by-default provides the commit of the libasr that he/she is using so there would/should be no extra/explicit information to be provided to the test scripts.

Also if you want to use main branch of libasr always, then all the PRs of LFortran/LPython would depend on each other because all the PRs will be using the same libasr and all the contributors pushing to add their change to libasr first so that they can get their work done.

Since the PRs would be using their own specific libasr (differentiated by the commit id), so probably it could cause conflicts. I am not sure how to deal with it.

Common libasr would intertwine all the LCompilers. A change in one compiler would require every other LCompiler to be updated. So, different teams of different compilers would always depend on each other. A big management issue.

I guess, this happens currently as well whenever we sync the libasrs. For example, during handling https://gitlab.com/lfortran/lfortran/-/merge_requests/1816, we updated (maybe temporarily) both the frontends to support variable lower_bound for arrays.

Other concerns that I had:

The current way also works great. It seems there are many challenges involved in an independent libasr project. As @czgdp1807 mentioned, it seems it is better to stay with the current approach.

certik commented 2 years ago

@Shaikh-Ubaid if we use libasr as a separate git repository (whether via git submodules or completely as an external dependency), then we would indeed depend on the exact commit that we need. For example symengine.py depends on an exact commit (or tag) of symengine here: https://github.com/symengine/symengine.py/blob/5716f1127e907aad352868db5921fc55f6780fb7/symengine_version.txt, and this makes it possible for each PR in symengine.py to depend on a different version of symengine, and the CI downloads and builds the correct version and everything is robust and manageable. Git submodules provide a similar mechanism. However, this is not a problem.

The problem is that symengine.py does not modify symengine in the same PR. The workflow there is that we first update symengine separately, get it merged into master, and only then "bump" the version in symengine.py and update it to work.

Now contrast this with how we work in LPython: every change in ASR requires an intricate change in the frontend, and we have to review and test them together. We currently can't test ASR in isolation, since all our tests require the frontend to be there and are tested via it. So if you submit a change to just the libasr repository, there is no way in practice how to ensure the change is correct and works. So the only way is to first create a PR in libasr (but not merge), then depend on this PR/branch commit with a PR to LPython. Then if everything works, before merging, we first merge the PR into libasr, then we merge the PR to LPython. It's doable, but a lot more work, two sets of PRs to manage almost all the time.

Finally, the constant headaches with just testing a PR, which I do all the time locally: now I can't just checkout the branch, now I need to check out two branches in both LPython and libasr: we would write some cmake scripts to correctly find the correct commit in the libasr git repository, possibly even download it for you and manage it. But it would not (by default) pick up any local modifications that you might have done. Etc.

Or with git submodules (if they can be made to depend on a not yet merged PRs; don't know) you have to tackle the submodules, which after using git for 15 years I still struggle with. I know many of LPython users also would struggle with it, it's extra commands you must execute, and things will fail otherwise. I don't recommend to use git submodules.

Now, it has also happened that for a while LPython and LFortran diverged, it was not possible to use them with exactly the same libasr version. That is all right, no problem, I just manually skipped those changes when syncing. With a separate repository we would now have to maintain a special branch/commits.

All kinds of headaches, that we completely avoid. If you only develop LFortran, or only LPython, you don't need to care about any of this, it's a self-contained project.

certik commented 2 years ago

The main issue, which is a real issue, is that symengine.py is a thin wrapper on top of the symengine library, and symengine is well tested, standalone library. In our case, the relationship is inverted: LPython or LFortran is well tested, but libasr is not, and effectively depends on LPython/LFortran, even though technically the dependency is inverted.

This is a sign of something we have to eventually fix: libasr should eventually be a standalone library that stands on its own. We have plans to write ASR automatic tests that generate random ASR and test all the backends; and we can possibly write tests directly in ASR (from the lisp/Clojure like syntax), generated from LPython/LFortran (but then you need to change LPython locally first, to generate new ASR tests), or written manually (a lot of manual work: currently we do ./run_tests -u and update all tests at once). We will eventually get there, and after that we can indeed add all features to libasr first, add tests, make sure everything works, and merge into master. Then we simply update the dependency in LPython/LFortran. I do not recommend doing this now, since we have to focus on delivering LPython and LFortran. Once we deliver, we can focus on the infrastructure of this "internal" library more.

czgdp1807 commented 2 years ago

So, if we use libasr as a submodule instead of a particular release, updating the libasr version used by the frontend would be like- first pulling in the changes in the libasr and then adding a commit (on the frontend) to use the updated libasr

Well, end to end testing on CI will still be tricky. I have dealt with submodules in SciPy and they are not that easy to deal with. Even with very less frequent updates we still struggle. So imagine life with libasr which is changing everyday. And basically using a branch is same as using commit, no differences.

All in all, I am happy to struggle once in a month (or few weeks) to sync libasr instead of struggling daily with submodules. May be let's close this discussion for now. We can get back to it once LPython and LFortran are in a done state. There are several high priority things that need to be dealt with. This is extremely low priority for me, at least.

Shaikh-Ubaid commented 2 years ago

Got it. This is very helpful and enlightening. Thank you everyone for the discussion.

certik commented 2 years ago

Sorry I forgot to merge it. libasr is now updated and synced.

rebcabin commented 2 years ago

As @czgdp1807 said, the alternative options make this work for all developers. The current option is only hard on people that need to synchronize from time to time, but if you only care about LPython or LFortran in isolation, there is no pain.

This is exactly what we did with CLR at Microsoft. C# and VB9 (for example) were developed in parallel with their own mini-forks of CLR. Periodically (every 3 months or so) or before releases, every frontend team would feature-freeze and integrate with a merged CLR. That pain period was usually one or two weeks at most.

rebcabin commented 2 years ago

The main issue, which is a real issue, is that symengine.py is a thin wrapper on top of the symengine library, and symengine is well tested, standalone library. In our case, the relationship is inverted: LPython or LFortran is well tested, but libasr is not, and effectively depends on LPython/LFortran, even though technically the dependency is inverted.

This is a sign of something we have to eventually fix: libasr should eventually be a standalone library that stands on its own. We have plans to write ASR automatic tests that generate random ASR and test all the backends; and we can possibly write tests directly in ASR (from the lisp/Clojure like syntax), generated from LPython/LFortran (but then you need to change LPython locally first, to generate new ASR tests), or written manually (a lot of manual work: currently we do ./run_tests -u and update all tests at once). We will eventually get there, and after that we can indeed add all features to libasr first, add tests, make sure everything works, and merge into master. Then we simply update the dependency in LPython/LFortran. I do not recommend doing this now, since we have to focus on delivering LPython and LFortran. Once we deliver, we can focus on the infrastructure of this "internal" library more.

I've started some experiments on automated test generation from the ASDL spec. Will share soon.