lcolladotor / biocthis

Automate package and project setup for Bioconductor packages
https://lcolladotor.github.io/biocthis/
49 stars 16 forks source link

[BUG] Add non standard directories and files to a .gitignore #21

Open lshep opened 3 years ago

lshep commented 3 years ago

We try to have the minimum basic package structure. It seems like in creation at least a dev folder with various templates/functions gets created. This should not be in a cloned copy of the repo and users would be asked to remove / .gitignore when submitting to Bioconductor. When these directories get created can it automatically be added to a .gitignore for a user. I'm not sure if there are other directories and files but noticed it in a recent submission.

lcolladotor commented 3 years ago

Hi Lori,

Thanks for the issue!

I brought this up at https://community-bioc.slack.com/archives/C6MVC96AZ/p1604600559030500 and while there was a discussion, I don't think that we reached a conclusion.

I'll ping that Slack channel again to resume the discussion. The answer might be different for say:

.github/workflows/something.yml
dev/*.R
something.Rproj

although well, Aaron Lun made the point that if it's under .Rbuildignore, it's likely ok to have the (small) files there.

Best, Leo

LiNk-NY commented 3 years ago

FWIW, here is an example where this was done quite nicely: https://stat.ethz.ch/pipermail/bioc-devel/2021-April/017937.html -> https://github.com/jianhong/HMMtBroadPeak/tree/staging

One benefit is a cleaner git history where commits that only change the GHA file are on a different branch.

For folders like dev, those seem to be for developer use and could go in a separate branch or as local files since these are generated from code.

lshep commented 3 years ago

Again from a Bioconductor review point of view too we don't know what files are related to what without looking. So whether or not they are harmless small files, I still came across a non standard directory that I had to take the time to investigate and ask to remove since they are developers files only. I am more stringent still on configuration files as well because as there is more and more discussion about investigating other options for the build system, we don't know how additional files or configuration files could potentially interfere with anything we are investigating. Having a branch that users have all this on, but a clean branch with no extras and just the minimum for Bioconductor still doesn't seem necessarily unreasonable.

lcolladotor commented 3 years ago

GHA on another branch

Thanks Marcel! I didn't know you could have the .github/workflow/something.yml file in another branch. That's good to know =) Although well, I don't think that it really affects Bioconductor if the git history includes commits for these configuration files. That is, the files are tiny and don't make the repos bigger, plus it doesn't trigger new builds in the BBS (unlike the SPB used by the submission system).

Now, if Bioconductor is thinking about switching to using GitHub Actions (GHA) and dropping the BBS, then that'd be a different story I guess. At that point, then using a GHA workflow in a package would only make sense before it gets accepted by Bioconductor. Anyway, I don't think that this is where you were going in your earlier comment.

"pure" git main branches?

Lori, I think that you are basically saying that Bioconductor would like clean/pure main branches in the repos without a .Rbuildignore file. That is, main branches where only the package code is present and nothing has to be ignored by R CMD build or Bioconductor Core / Reviewers.

.Rbuildignore: how about code for deleting the ignored files?

Yet, I think that this "pure" main branch situation is not ideal. I understand that Bioconductor doesn't want files that can be interfering with the build system, and well, maybe another solution is to go through the .Rbuildignore and delete files listed there as part of the Bioconductor builds. That is, clone the git repo on the main branch, find files in .Rbuildignore, delete them, then run R CMD build, etc. This is a similar suggestion to what Aaron Lun @LTLA and Kasper D Hansen @kasperdanielhansen said about .Rbuildignore at https://community-bioc.slack.com/archives/C6MVC96AZ/p1604610165052800, although instead of providing the package tarball we would keep the repos (aka, we keep the advantages Vince Carey mentioned at https://community-bioc.slack.com/archives/C6MVC96AZ/p1604662660055200).

The code for deleting files that are part of .Rbuildignore could be a little R helper function in say BiocCheck or some other main BioC package. The scenario that Martin Morgan mentioned at https://community-bioc.slack.com/archives/C6MVC96AZ/p1604606893040200 where he wants to clone a repo and do things to it (or simply check it out), would involve this extra step of using this helper function to delete .Rbuildignore files.

The "pure" git main branch

If Bioconductor goes down the path of a "pure" git main branch on repos, so only R package files, then we'll have to drop many things or argue why X, or Y exception is ok. Here's just a list of some files I know of and/or use myself:

I do agree with Bioconductor that any files that are part of .Rbuildignore should be small, to avoid burdening the BioC system with large files.

Closing comments

I'm aware that we have different positions and I understand that Bioconductor does a lot of work already. I understand that Bioconductor reviewers and the core don't want to deal with any extra files, which well, I hope that .Rbuildignore enables some solutions. From my point of view, I'd like to use all the new things that others are building that can help make my experience building and/or maintaining my packages smoother and/or more reproducible.

I'm curious to see what the BioC core and/or the TAB will decide on this as Vince @vjcitn mentioned https://community-bioc.slack.com/archives/C6MVC96AZ/p1621476360003400. If you want me to, I'd be happy to present my point of view at such a meeting.

Anyways, I just want to thank you Lori, Marcel and the BioC core and reviewer teams for all that you do. Thanks!!

lwaldron commented 3 years ago

IMO there should be a clear cost:benefit rationale (accounting for costs to contributors, including those without advanced git skills), along with a clearly defined recommended workflow, to justify asking contributors to move convenience tools such as @lcolladotor outlines from the main branch. Otherwise it's hard to understand why things not noted by R CMD CHECK (because they're in .Rbuildignore) should be of concern to Bioconductor, if they have no impact on users of the software. If the cost:benefit analysis does favor elimination of convenience configurations and decorations, perhaps it would make more sense for Bioconductor to use a dedicated devel branch. That would come with its own challenges and might reduce the frequency with which contributors merge to devel and push upstream as opposed to relying on continuous integration on GitHub, but I think it would feel less controlling and easier to understand. I'd have to see the proposed workflows in each scenario to give an opinion on which I think would be the least burden for the most contributors.

lcolladotor commented 1 year ago

It's been nearly 2 years since this conversation started. Maybe we can talk about it at BioC2023? Or maybe the BioC core has an updated policy on this?

vjcitn commented 1 year ago

We should definitely continue this discussion. I have been trying to understand a least-resistance path to having a minimal package image in the main (soon to be named 'devel') branch at git.bioconductor.org, and an "extras" branch in github that has pkgdown and github actions support. It isn't totally straightforward for me because I am still somewhat in the woods with respect to remotes and branches and merges. .Rbuildignore plays a role, and it might be sufficient to just set that file up properly. I have long been under the impression that you can put whatever you like in /inst and thus I would anticipate the /dev and /analysis folders mentioned by @lcolladotor migrating to /inst.

I will try to articulate my approach to 'extras' in an example in the very near future. It has no official status and I would hope that the cloud working group (with its attention to github actions) and the core devs can come to a consensus, let's say during work on 3.18. If it can happen more rapidly, so much the better.

A document articulating the goals of expanding the content permitted or endorsed for Bioconductor packages, reasons for restrictions, and paths towards relaxing these restrictions (which may involve evolution of the operation of the BBS) would be most welcome, and a request for comments period can be opened to help us get input from the developer community. Here is a start on such a document. It is world-commentable, if you want to edit it just make a request.

vjcitn commented 1 year ago

So two submissions that use the 'extras' branch for actions and pkgdown are https://vjcitn.github.io/BiocHail/ and https://vjcitn.github.io/scviR/ ... it is not clear that scviR will build on windows owing to python jax infrastructure. BiocHail action is still running for the first time.

vjcitn commented 1 year ago

An example noted above (BiocHail) seems to give the lie to my attempt. I do not understand how to manage divergent branches. extras should have docs/ and .github, main should not.