ropensci / rix

Reproducible Data Science environments for R with Nix
https://docs.ropensci.org/rix/
GNU General Public License v3.0

renv2nix issues #368

Open mihem opened 1 day ago

mihem commented 1 day ago

@b-rodrigues Thanks for renv2nix. As I said, I think this is super useful, so I was excited to test this new feature.

This is more a collection of problems/ideas than a formal bug report.

I think there should be a renv2nix vignette that gives concrete examples. The function help is extensive but lacks examples, and vignettes are often an easier starting point.

Is the workflow to open R in the renv environment and call renv2nix(project_path = "."), or should you be outside of the renv environment, or is this irrelevant?

When I tried it on a toy project it worked fine, although I got the warning message:

### .Rprofile file already exists. You may want to call rix_init(rprofile_action = 'append') manually to ensure correct functioning of your Nix environment. ###

Since renv projects always have a .Rprofile, I found this confusing. Shouldn't renv2nix just do that itself? What is the problem if this is not done?

When I tried renv2nix on my real project, I got the following warning message (I use Visual Studio Code):

## .Rprofile file already exists. You may want to call rix_init(rprofile_action = 'append') manually to ensure correct functioning of your Nix environment. ###

cannot open the connectionwarning messages from top-level task callback 'vsc.workspace'
Warning message:
In file(con, "w") :
  cannot open file '/tmp/RtmppLc1a1/vscode-R/workspace.json': No such file or directory

What does this vsc.workspace connection problem mean?

However, the main problem was that nix-build failed with:

error:
       … while calling the 'derivationStrict' builtin
         at <nix/derivation-internal.nix>:34:12:
           33|
           34|   strict = derivationStrict drvAttrs;
             |            ^
           35|

       … while evaluating derivation 'nix-shell'
         whose name attribute is located at /nix/store/4wr1ahk4xsis4vf2yy0gcygivc5l6l7x-source/pkgs/stdenv/generic/make-derivation.nix:300:7

       … while evaluating attribute 'buildInputs' of derivation 'nix-shell'
         at /nix/store/4wr1ahk4xsis4vf2yy0gcygivc5l6l7x-source/pkgs/stdenv/generic/make-derivation.nix:347:7:
          346|       depsHostHost                = lib.elemAt (lib.elemAt dependencies 1) 0;
          347|       buildInputs                 = lib.elemAt (lib.elemAt dependencies 1) 1;
             |       ^
          348|       depsTargetTarget            = lib.elemAt (lib.elemAt dependencies 2) 0;

       (stack trace truncated; use '--show-trace' to show the full, detailed trace)

       error: attribute 'datathin' missing
       at /home/mischko/Documents/beruf/forschung/ml_izkf/default.nix:720:11:
          719|           abind
          720|           datathin
             |           ^
          721|           mclust

The corresponding lines in default.nix:

  git_archive_pkgs = [
    (pkgs.rPackages.buildRPackage {
      name = "CSFAtlasTools";
      src = pkgs.fetchgit {
        url = "https://github.com/mihem/CSFAtlasTools";
        rev = "02d485896d383e2a876f0f3bbae7265c017e7e92";
        sha256 = "sha256-q9qBYrGrn96lG5I9xUuWCLw0CSnh7BA5Qs9AAcRtz0E=";
      };
      propagatedBuildInputs = builtins.attrValues {
        inherit (pkgs.rPackages) 
          dplyr
          glue
          readr
          ggplot2
          tidyr
          RColorBrewer
          bestNormalize
          pheatmap
          recipes
          tibble
          viridis
          broom
          WRS2
          Seurat
          abind
          datathin
          mclust
          tune
          yardstick
          ggsignif;
      };
    })

    (pkgs.rPackages.buildRPackage {
      name = "datathin";
      src = pkgs.fetchgit {
        url = "https://github.com/anna-neufeld/datathin";
        rev = "58eb154609365fa7301ea0fa397fbf04dd8c28ed";
        sha256 = "sha256-rtRpwFI+JggX8SwnfH4SPDaMPK2yLhJFTgzvWT+Zll4=";
      };
      propagatedBuildInputs = builtins.attrValues {
        inherit (pkgs.rPackages) 
          VGAM
          knitr
          extraDistr
          mvtnorm;
      };
    })
   ];

  system_packages = builtins.attrValues {
    inherit (pkgs) 
      glibcLocales
      nix
      R;
  };

in

pkgs.mkShell {
  LOCALE_ARCHIVE = if pkgs.system == "x86_64-linux" then "${pkgs.glibcLocales}/lib/locale/locale-archive" else "";
  LANG = "en_US.UTF-8";
   LC_ALL = "en_US.UTF-8";
   LC_TIME = "en_US.UTF-8";
   LC_MONETARY = "en_US.UTF-8";
   LC_PAPER = "en_US.UTF-8";
   LC_MEASUREMENT = "en_US.UTF-8";

  buildInputs = [ git_archive_pkgs rpkgs  system_packages   ];

}

So datathin is not found, but it is required by CSFAtlasTools (a personal package). However, datathin is actually there, with the correct GitHub repo URL and rev (why is the commit ID that renv.lock calls "RemoteSha" called "rev" here?). So is the problem just the order?

Thanks!

b-rodrigues commented 1 day ago

Many thanks for testing, this is very valuable input!

I think there should be a renv2nix vignette that gives concrete examples. The function help is extensive but lacks examples, and vignettes are often an easier starting point.

Yes, absolutely. I just didn’t want to write one until the API has stabilized. We are still in a dev/exploratory phase.

Is the workflow to open R in the renv environment and call renv2nix(project_path = "."), or should you be outside of the renv environment, or is this irrelevant?

No, ideally, you should copy the renv.lock file into a new folder, and call renv2nix() there. This is to avoid the issues you had with the .Rprofile file. Both {renv} and {rix} rely on project-level .Rprofile files to correctly set up projects, so if you use the same folder for both, you will have issues.

When I tried it on a toy project it worked fine, although I got the warning message:

### .Rprofile file already exists. You may want to call rix_init(rprofile_action = 'append') manually to ensure correct functioning of your Nix environment. ###

Since renv projects always have a .Rprofile, I found this confusing. Shouldn't renv2nix just do that itself? What is the problem if this is not done?

{renv} creates an .Rprofile file to bootstrap {renv} itself, and {rix} uses one to ensure that Nix projects are nicely hermetic and that no interactions with the user-level R package library happen. As said before, ideally you should create a separate folder. This will be documented in the vignette and also in the renv2nix() documentation.

When I tried renv2nix on my real project, I got the following warning message (I use Visual Studio Code):

## .Rprofile file already exists. You may want to call rix_init(rprofile_action = 'append') manually to ensure correct functioning of your Nix environment. ###

cannot open the connectionwarning messages from top-level task callback 'vsc.workspace'
Warning message:
In file(con, "w") :
  cannot open file '/tmp/RtmppLc1a1/vscode-R/workspace.json': No such file or directory

What does this vsc.workspace connection problem mean?

No idea, as I don’t use VS Code. But maybe try a separate folder: call renv2nix() there, run nix-build and nix-shell, and then start VS Code from that shell. See if this happens again.

However, the main problem was that nix-build failed with: [...] So datathin is not found, but it is required by CSFAtlasTools (a personal package). However, datathin is actually there, with the correct GitHub repo URL and rev

So what happens here is that {datathin} is a dependency of {CSFAtlasTools}, but it’s on GitHub. However, the way {rix} is set up, {datathin} will be listed as coming from CRAN:

propagatedBuildInputs = builtins.attrValues {
        inherit (pkgs.rPackages) # this package set mirrors CRAN
          datathin;
      };
    })

But renv2nix() then detects that {datathin} comes from GitHub and correctly includes the right code to install it from GitHub. Still, from the point of view of {CSFAtlasTools}, {datathin} is being included as a dependency that should come from CRAN. This issue is described here: https://docs.ropensci.org/rix/articles/z-advanced-topic-handling-packages-with-remote-dependencies.html

You should rewrite the expression a little (as described in the vignette) to make it work. For now, {rix} doesn’t handle remote dependencies automatically, and I don’t know if it ever will. It is not an easy issue to solve.
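A minimal sketch of what such a rewrite could look like for this project, assuming the approach is to give {datathin} its own let-binding and append it to the dependencies of {CSFAtlasTools} instead of inheriting it from pkgs.rPackages. The `import <nixpkgs> { }` line and the reduced mkShell are placeholders for illustration only: keep the nixpkgs pin, rpkgs, system_packages and locale settings that rix generated. This is not the content of any corrected file, and it does not address other build problems the environment may have.

```nix
let
  # Assumption: placeholder for the pinned nixpkgs that rix writes at the top of default.nix.
  pkgs = import <nixpkgs> { };

  # Define datathin as its own binding so other expressions can refer to it by name.
  datathin = pkgs.rPackages.buildRPackage {
    name = "datathin";
    src = pkgs.fetchgit {
      url = "https://github.com/anna-neufeld/datathin";
      rev = "58eb154609365fa7301ea0fa397fbf04dd8c28ed";
      sha256 = "sha256-rtRpwFI+JggX8SwnfH4SPDaMPK2yLhJFTgzvWT+Zll4=";
    };
    propagatedBuildInputs = builtins.attrValues {
      inherit (pkgs.rPackages) VGAM knitr extraDistr mvtnorm;
    };
  };

  CSFAtlasTools = pkgs.rPackages.buildRPackage {
    name = "CSFAtlasTools";
    src = pkgs.fetchgit {
      url = "https://github.com/mihem/CSFAtlasTools";
      rev = "02d485896d383e2a876f0f3bbae7265c017e7e92";
      sha256 = "sha256-q9qBYrGrn96lG5I9xUuWCLw0CSnh7BA5Qs9AAcRtz0E=";
    };
    # datathin is no longer inherited from pkgs.rPackages (the CRAN mirror);
    # the locally defined derivation is appended to the list instead.
    propagatedBuildInputs = builtins.attrValues {
      inherit (pkgs.rPackages)
        dplyr glue readr ggplot2 tidyr RColorBrewer bestNormalize
        pheatmap recipes tibble viridis broom WRS2 Seurat abind
        mclust tune yardstick ggsignif;
    } ++ [ datathin ];
  };

  git_archive_pkgs = [ CSFAtlasTools datathin ];

in

pkgs.mkShell {
  # Reduced for the sketch; the generated file also lists rpkgs and system_packages here.
  buildInputs = [ git_archive_pkgs pkgs.R ];
}
```

The order of the bindings inside `let` does not matter to Nix; what matters is that `datathin` inside `CSFAtlasTools` now refers to the GitHub derivation and not to an attribute of pkgs.rPackages.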

(why is the commit ID called "RemoteSha" in renv.lock called "rev" here?).

That’s just internal jargon.

So is the problem just the order?

No, it’s that {rix} doesn’t handle packages with remote dependencies automatically. Some manual rewriting is needed.

mihem commented 19 hours ago

@b-rodrigues Thanks for your quick response.

No, ideally, you should copy the renv.lock file into a new folder, and call renv2nix() there. This is to avoid the issues you had with the .Rprofile file. Both {renv} and {rix} rely on project-level .Rprofile files to correctly set up projects, so if you use the same folder for both, you will have issues.

This makes sense and solves the warning.

But renv2nix() then detects that {datathin} comes from GitHub and correctly includes the right code to install it from GitHub. Still, from the point of view of {CSFAtlasTools}, {datathin} is being included as a dependency that should come from CRAN. This issue is described here: https://docs.ropensci.org/rix/articles/z-advanced-topic-handling-packages-with-remote-dependencies.html You should rewrite the expression a little (as described in the vignette) to make it work. For now, {rix} doesn’t handle remote dependencies automatically, and I don’t know if it ever will. It is not an easy issue to solve.

Okay. Although the vignette explains it at length, I still find it hard to understand what to change. The git part of my default.nix looks like this:

  git_archive_pkgs = [
    (pkgs.rPackages.buildRPackage {
      name = "CSFAtlasTools";
      src = pkgs.fetchgit {
        url = "https://github.com/mihem/CSFAtlasTools";
        rev = "02d485896d383e2a876f0f3bbae7265c017e7e92";
        sha256 = "sha256-q9qBYrGrn96lG5I9xUuWCLw0CSnh7BA5Qs9AAcRtz0E=";
      };
      propagatedBuildInputs = builtins.attrValues {
        inherit (pkgs.rPackages) 
          dplyr
          glue
          readr
          ggplot2
          tidyr
          RColorBrewer
          bestNormalize
          pheatmap
          recipes
          tibble
          viridis
          broom
          WRS2
          Seurat
          abind
          datathin
          mclust
          tune
          yardstick
          ggsignif;
      };
    })

    (pkgs.rPackages.buildRPackage {
      name = "datathin";
      src = pkgs.fetchgit {
        url = "https://github.com/anna-neufeld/datathin";
        rev = "58eb154609365fa7301ea0fa397fbf04dd8c28ed";
        sha256 = "sha256-rtRpwFI+JggX8SwnfH4SPDaMPK2yLhJFTgzvWT+Zll4=";
      };
      propagatedBuildInputs = builtins.attrValues {
        inherit (pkgs.rPackages) 
          VGAM
          knitr
          extraDistr
          mvtnorm;
      };
    })
   ];

  system_packages = builtins.attrValues {
    inherit (pkgs) 
      glibcLocales
      nix
      R;
  };

in

pkgs.mkShell {
  LOCALE_ARCHIVE = if pkgs.system == "x86_64-linux" then "${pkgs.glibcLocales}/lib/locale/locale-archive" else "";
  LANG = "en_US.UTF-8";
   LC_ALL = "en_US.UTF-8";
   LC_TIME = "en_US.UTF-8";
   LC_MONETARY = "en_US.UTF-8";
   LC_PAPER = "en_US.UTF-8";
   LC_MEASUREMENT = "en_US.UTF-8";

  buildInputs = [ git_archive_pkgs rpkgs  system_packages   ];

}

Can you help me change this to make it work?

I agree with your conclusion that this is tedious, so I think this is something that should be handled by rix. I don't agree with the three reasons why rix should not handle this:

1-2: It may be a common situation if you use mature packages. But if you use experimental projects, this may happen. This is especially true if you create a package for the functions of your analysis, which may rely on packages that are not on CRAN. That is actually quite common in my field, and many of those packages unfortunately never make it to CRAN, e.g. datathin or https://github.com/bnprks/BPCells.

3: If there is a renv.lock, the user has already decided which commit should be used.

b-rodrigues commented 19 hours ago

Could you post the default.nix in a gist and post the link here? I'll help you with it.

I will rethink this handling of remote dependencies.

mihem commented 19 hours ago

thanks

https://gist.github.com/mihem/d23dacdd3cd0015a9e0ae4e971c8cc6d

b-rodrigues commented 18 hours ago

Here is the correct default.nix: https://gist.github.com/mihem/d23dacdd3cd0015a9e0ae4e971c8cc6d?permalink_comment_id=5294035#gistcomment-5294035

Search for all instances of datathin to see how it is used. Also, your example was really great, because it included a relatively old version of R but much more recent packages, which led to an issue with {unigd}. I’ll try to add some sort of warning if renv2nix() detects packages that are more recent than the version of R included in the renv.lock file.