jasp-stats / jasp-issues

This repository is solely meant for reporting of bugs, feature requests and other issues in JASP.

conda package #1353

Closed izahn closed 3 years ago

izahn commented 3 years ago

I would like to use JASP on a University HPC where flatpak is not available. Toward that end I am working on building a conda-forge package for JASP over in https://github.com/conda-forge/staged-recipes/pull/15529. If there is anyone here who would like to join me in this effort please let me know. I would be delighted to have a co-maintainer for the conda-forge package!

boutinb commented 3 years ago

Hi @izahn, it's really great that you want to do this. I have no experience with Conda, so I'm afraid I cannot help you much in writing the scripts. But I see, for example, that you have to replace the hard-coded '/usr/' folder with the '$PREFIX' environment variable. Such things could be done directly in the JASP repository, so if you need that, don't hesitate to ask. Also you need to list the R packages we use. Maybe there is a way to do this more dynamically. Maybe @vandenman has some ideas for this?
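The kind of change meant here could look something like the sketch below. It is written in Python purely for illustration (the JASP code base itself is not Python), and the function names `install_prefix` and `lib_dir` as well as the `/usr` fallback are assumptions. The only thing taken from the discussion is reading `$PREFIX` instead of hard-coding `/usr`.

```python
import os
from pathlib import Path


def install_prefix(default: str = "/usr") -> Path:
    """Return the install prefix from $PREFIX, falling back to a default.

    The fallback value is an assumption made for this sketch.
    """
    return Path(os.environ.get("PREFIX") or default)


def lib_dir() -> Path:
    """Build a library path relative to the prefix instead of hard-coding /usr/lib."""
    return install_prefix() / "lib"
```

With `PREFIX` unset the paths resolve under `/usr` as before; when a build tool sets `PREFIX`, everything follows it.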

vandenman commented 3 years ago

Also you need to list the R packages we use. Maybe there is a way to do this more dynamically. Maybe @vandenman has some ideas for this?

Perhaps you can borrow that from our flatpak setup. That way, you ensure that you always use the same packages as flatpak. The easy way to do this is to download the latest version of the flatpak archive from http://static.jasp-stats.org/flatpak_archive.tar.gz.

The slightly more complicated way is to generate this archive yourself. For that, you can use this file https://github.com/vandenman/jasp-desktop/blob/flatpak_changes/Tools/flatpak/setup-rpkgs/R/flatpakGeneratePkgsList.R. That generates a tar.gz archive with all required R packages from CRAN and GitHub. The script assumes that one dependency is already there (V8) but I can forward you the static lib if you'd like (it's a bit big ~120 MiB unzipped, although only 19.9 MiB as a tar.gz).

The flatpak_archive.tar.gz is designed around installation through renv. Since flatpak does not allow internet access while building, the flatpak_archive.tar.gz contains a local CRAN repository and a workaround for GitHub packages. There is some code in JASP to make this work on flatpak (e.g., a lot of exists(/app/lib/*){), but we could make this a bit more general so the flatpak_archive could be used outside of flatpak.

I have no experience with conda-forge packages, but I'm happy to help out wherever I can!

izahn commented 3 years ago

Thanks @boutinb and @vandenman, I appreciate your support and help! I've invited you to be listed as co-maintainers of the conda package -- I would do the heavy lifting, but it would be great to have you there for support.

Thanks for the invitation to work here to make packaging easier e.g. by replacing hard-coded /usr in the JASP code base. I'll take you up on that!

The R package issue is one that I'm working on now. Conda-forge includes a large number of R packages, and I've included the ones that already exist in conda-forge in the JASP package dependencies (see https://github.com/conda-forge/staged-recipes/pull/15529/files#diff-7fbc5cc79345aa7718e0015397ff1faad4327407e467a2558428ca3a6de0960aR92). There are additional R package dependencies that are not yet in conda-forge, and I have submitted most of these to conda-forge staged recipes already (see https://github.com/conda-forge/staged-recipes/pulls?q=is%3Aopen+is%3Apr+author%3Aizahn+created%3A2021-07-08). My goal is to have all R package dependencies in conda and use the conda machinery for resolving those dependencies. In terms of automating this I think we can convert RPackages.json to a yaml list and convert R package names to their conda-forge equivalents (lowercase and prepend r-).
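The name conversion described above could be sketched roughly as follows. This is only an illustration: the structure of RPackages.json is not shown in this thread, so the sketch starts from a plain list of R package names, and the helper names `to_conda_name` and `to_yaml_list` are made up for the example.

```python
def to_conda_name(r_package: str) -> str:
    """Map an R package name to its conda-forge name: lowercase, prefixed with r-."""
    return "r-" + r_package.lower()


def to_yaml_list(r_packages, indent: int = 4) -> str:
    """Render R package names as a sorted, de-duplicated YAML list of conda-forge names."""
    pad = " " * indent
    names = sorted({to_conda_name(p) for p in r_packages})
    return "\n".join(f"{pad}- {name}" for name in names)


if __name__ == "__main__":
    # Example input only; this is not the real JASP dependency list.
    print(to_yaml_list(["beeswarm", "ggplot2", "BayesFactor"]))
```

The resulting lines could then be pasted into the requirements section of a recipe, assuming every package already exists on conda-forge under the converted name.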

vandenman commented 3 years ago

My goal is to have all R package dependencies in conda and use the conda machinery for resolving those dependencies.

This might be possible and using those would probably speed up the installation process. However, I can think of a couple of things that may or may not be a problem:

  1. JASP internally uses renv to handle dependency management. From 0.15 onward, the different jasp modules (which are essentially R packages) all have their own library. I have no idea how this interacts with the folder structure and permissions for conda packages. If renv can copy the R packages provided by conda to its cache, there shouldn't be any problem though.
  2. JASP uses a couple of GitHub packages. How do those need to be supplied? These should be specific versions, so pulling main/master is probably not a good idea. Right now we specify the commit, but I couldn't find that in the conda recipe.
  3. The list of required R packages and their versions is not constant across releases. Would this imply that for each conda release we need to possibly update all R packages that JASP uses on conda? I am assuming btw that the list of R packages used by conda stays constant within a release (because if not that would be problematic).

Also, RPackages.json is no longer used for upcoming JASP versions. If you want to build JASP 0.14.1 (or 0.14.3 for Linux) you should use it, but for the upcoming release (planned after the summer holidays) it is no longer complete.

Edit: Oh and in principle, I'd be happy to be listed as a co-maintainer, but before I agree to that I'd like to understand the approach you take to build JASP a bit better. Hopefully, we can ensure that it won't be a lot of work to upgrade from 0.14.1 to 0.15 (because this wasn't so easy for flatpak, unfortunately, and took me quite some time).

izahn commented 3 years ago

  1. JASP internally uses renv to handle dependency management. From 0.15 onward, the different jasp modules (which are essentially R packages) all have their own library. I have no idea how this interacts with the folder structure and permissions for conda packages. If renv can copy the R packages provided by conda to its cache, there shouldn't be any problem though.

Hmm, I'll need to figure out how to handle this. conda and renv overlap almost entirely, the main difference being that conda is not R specific (e.g., you can't in principle install jasp using renv, but you can do so using conda). I don't think jasp modules being R packages should pose any problem, but figuring out how to reconcile renv and conda package management features will take some work I think.

  2. JASP uses a couple of GitHub packages. How do those need to be supplied? These should be specific versions, so pulling main/master is probably not a good idea. Right now we specify the commit, but I couldn't find that in the conda recipe.

I think we can package these for conda-forge so long as they have a github release. I have not made conda packages for these yet. It looks like the R package dependencies not available on CRAN are currently

  3. The list of required R packages and their versions is not constant across releases. Would this imply that for each conda release we need to possibly update all R packages that JASP uses on conda? I am assuming btw that the list of R packages used by conda stays constant within a release (because if not that would be problematic).

The conda-forge infrastructure largely automates package updates. For R packages built from CRAN, updates to CRAN packages trigger a conda-forge bot that submits a PR to the corresponding R package. The same thing happens for packages built from github releases. Generally we can assume that R packages in conda are up to date (and that older R package versions are also available). There are no conda-forge "releases", but packages can specify version dependencies.

I think the tricky part here will be hammering out how to handle package version dependencies in the conda-forge package. I gather that the flatpak build specifies specific R package versions, but I would prefer not to do that for the conda-forge package. Technically we can do it, but it definitely would increase the maintenance burden. How strictly to pin R package dependencies is an issue that will need to be hammered out.

Also, RPackages.json is no longer used for upcoming JASP versions. If you want to build JASP 0.14.1 (or 0.14.3 for Linux) you should use it, but for the upcoming release (planned after the summer holidays) it is no longer complete.

Good to know, thanks!

Edit: Oh and in principle, I'd be happy to be listed as a co-maintainer, but before I agree to that I'd like to understand the approach you take to build JASP a bit better. Hopefully, we can ensure that it won't be a lot of work to upgrade from 0.14.1 to 0.15 (because this wasn't so easy for flatpak, unfortunately, and took me quite some time).

Yeah, totally get that! I think there are some friction points between the conda approach to package management and the renv based approach that jasp uses internally. In truth reconciling these will take some work, and I totally understand if that's not something you want to take on, especially since it sounds like this might get even more difficult with the 0.15 release.

vandenman commented 3 years ago

Hmm, I'll need to figure out how to handle this. conda and renv overlap almost entirely, the main difference being that conda is not R specific

So as long as a few basic packages are available, JASP installs all missing dependencies while building using renv. At that point, the renv cache is also created so that packages can be efficiently reused (a future goal is that individual modules can be updated without updating JASP altogether). If conda allows internet access while building then that should sort itself out.

I think we can package these for conda-forge so long as they have a github release.

If by release you mean a tag, then no, most of these do not. We know all of the maintainers though, so I guess we could pressure them to make a release on GitHub (or CRAN).

I think the tricky part here will be hammering out how to handle package version dependencies in the conda-forge package. I gather that the flatpak build specifies specific R package versions

In principle, all R packages are on the latest CRAN version. We just can't do install.packages while building flatpak but instead need to provide the tar.gz source packages ourselves.

, but I would prefer not to do that for the conda-forge package. Technically we can do it, but it definitely would increase the maintenance burden. How strictly to pin R package dependencies is an issue that will need to be hammered out.

So I can send you a list of R package versions at any time (I create one anyway every time I make a flatpak build), but I agree that it increases the maintenance burden and it would be better if we could do without one.

izahn commented 3 years ago

So as long as a few basic packages are available, JASP installs all missing dependencies while building using renv. At that point, the renv cache is also created so that packages can be efficiently reused (a future goal is that individual modules can be updated without updating JASP altogether). If conda allows internet access while building then that should sort itself out.

Conda-forge policies discourage external dependencies, and from a practical standpoint I want to use conda-forge R packages and avoid installing a separate set of R packages just for jasp. It sounds like we should be OK if all R dependencies are installed via conda at build time. Does "missing dependency" here mean that a specific version is not installed? For example, I see that jasp 0.14.3 depends on beeswarm 0.2.3 but the latest conda-forge beeswarm is 0.4.0. If beeswarm 0.4.0 is installed at build time will jasp use renv to install version 0.2.3, or will it be happy with 0.4.0?

If by release you mean a tag, then no, most of these do not. We know all of the maintainers though, so I guess we could pressure them to make a release on GitHub (or CRAN).

Yes, exactly. I don't think this is a hard-and-fast rule, but conda-forge encourages building from tarballs.

In principle, all R packages are on the latest CRAN version. We just can't do install.packages while building flatpak but instead need to provide the tar.gz source packages ourselves.

So I can send you a list of R package versions at any time (I create one anyway every time I make a flatpak build), but I agree that it increases the maintenance burden and it would be better if we could do without one.

I think that wouldn't be too bad actually. When a new jasp version is released I could look up the package version dependencies and update the conda-forge package accordingly. This will be easier if those dependencies aren't too strict, e.g., beeswarm >= 0.2.0 instead of beeswarm == 0.2.3.
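The difference between loose and strict pinning can be illustrated with a small sketch. It is only an assumption about how one might generate and check requirement strings; the function names `conda_requirement` and `satisfies` and the `strict` flag are hypothetical, and the version comparison only handles plain dotted numeric versions.

```python
def conda_requirement(r_package: str, version: str, strict: bool = False) -> str:
    """Build a conda-style requirement string for an R package.

    strict=True pins the exact version; strict=False only sets a lower bound.
    """
    name = "r-" + r_package.lower()
    operator = "==" if strict else ">="
    return f"{name} {operator}{version}"


def satisfies(installed: str, required: str, strict: bool = False) -> bool:
    """Check an installed version against a pin, comparing dotted numeric versions."""
    have = tuple(int(part) for part in installed.split("."))
    want = tuple(int(part) for part in required.split("."))
    return have == want if strict else have >= want


if __name__ == "__main__":
    print(conda_requirement("beeswarm", "0.2.0"))               # lower bound
    print(conda_requirement("beeswarm", "0.2.3", strict=True))  # exact pin
    print(satisfies("0.4.0", "0.2.0"))
    print(satisfies("0.4.0", "0.2.3", strict=True))
```

In this sketch a lower bound accepts the beeswarm 0.4.0 currently on conda-forge, while an exact pin on 0.2.3 rejects it, which is why looser requirements would mean less recipe maintenance.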

izahn commented 3 years ago

I also notice that the renv folks have thought about conda integration over in https://github.com/rstudio/renv/issues/80. Maybe worth seeing if the renv devs have any thoughts about the conda/renv integration issues we are discussing here.

izahn commented 3 years ago

Based on the discussion here it seems like there will be some changes required when 0.15 is released. Nevertheless I plan to move forward with packaging 0.14 since all the work for that has already been done. I now have all the R package dependencies packaged in conda-forge, and the JASP conda-forge recipe itself has been updated and re-submitted in https://github.com/conda-forge/staged-recipes/pull/15529. Please comment there if you would like to be listed as a conda-forge package maintainer.

izahn commented 3 years ago

JASP 0.14.3 is now available via conda-forge. The repository is at https://github.com/conda-forge/jasp-feedstock. If you are interested in assisting with maintaining the conda-forge package please open an issue or submit a pull request there.

boutinb commented 3 years ago

Thank you very much for your work!