Deployment of R binaries and check logs

krlmlr commented 10 years ago

Offered out of the box:

Azure
Amazon S3

sckott commented 10 years ago

Looking at the Appveyor site, not sure there is anyway to deploy to Github, like you mentioned on the README. Do you know of a way?

krlmlr commented 10 years ago

Well, deploying to Git(Hub) would require some manual work. We'd need to encode a GitHub access key so that AppVeyor can read and use it. Perhaps this could be part of r-travis, too.

Deploying to Azure or Amazon S3 would be much easier, I think.

Perhaps more important is the following question: Where to deploy? Same repo, different branch -- perhaps gh-pages? Different repo? One big repo that collects all packages deployed by r-travis? ...

sckott commented 10 years ago

Assuming S3 is best option: A single place for all binaries is a good idea, but would a single S3 account/bucket be able to collect binaries from different users? Or could one collect binaries from many users S3 buckets

Perhaps logs could be stored in a github repo as they are just text.

krlmlr commented 10 years ago

@sckott: I'm still more inclined to deploy to GitHub + GH Pages, similar to http://cran.github.io.

@gaborcsardi: Do you think it's possible to reuse parts of metacran for building a deployment sink at GitHub?

gaborcsardi commented 10 years ago

I am actually in the process of rewriting it completely in R. Yes, I think it is definitely possible, and I can certainly help you with it, but it will take 1-2 weeks for me to get there. The website is actually not even updated any more, just the github repos. The old version was very much tied to CRAN, the new one not so much.

metacran has:

automatic updates for all pacakges from CRAN to github
building and updating a (couchdb) database, with all package metadata
mirroring this in elasticsearch for nice and easy searching
the website that is automatically generated and updated
some non-robust support for CRAN snapshots, tied to R releases. You certainly don't need this.
a lot of bugs, mostly because of the idiosyncrasies of CRAN :(

krlmlr commented 10 years ago

Thanks, this is great. We might need some extra functionality "the other way round":

Hosting binary Windows and OS X packages in a CRAN-like repository -- perhaps this is doable with GitHub Pages
Hosting build and check logs, and perhaps linking to the corresponding Travis/AppVeyor builds
...

I'd also like to have documentation (built by staticdocs) and vignettes (HTML for viewing, and/or PDF for downloading). This could also be used by metacran.

I'm not too sure how to organize this yet. Perhaps the following could work:

There is a "packages" repo that hosts a text file with one package per line, adding to this text file (via pull request) will create the deployment repo on GitHub and give the user write access to it
The deployment repo is also monitored for changes, commits to this repo will start registering the artifacts and building the documentation
Everything is tied together by a repo similar to http://cran.github.io, which is updated automatically

gaborcsardi commented 10 years ago

Hosting binary Windows and OS X packages in a CRAN-like repository -- perhaps this is doable with GitHub Pages

Hosting build and check logs, and perhaps linking to the corresponding Travis/AppVeyor builds

Definitely doable, CRAN is just a static website, so is GH pages. The logic can be in Travis/AppVeyor.

I'd also like to have documentation (built by staticdocs) and vignettes (HTML for viewing, and/or PDF for downloading). This could also be used by metacran.

Also doable in theory, will be tricky in practice, because staticdocs has some rough edges. But at least we will smooth them somewhat. :)

There is a "packages" repo that hosts a text file with one package per line, adding to this text file (via pull request) will create the deployment repo on GitHub and give the user write access to it

The deployment repo is also monitored for changes, commits to this repo will start registering the artifacts and building the documentation

Everything is tied together by a repo similar to https://cran.github.io, which is updated automatically

Hmmm, what is the goal here? Just to provide windows and OSX binaries to people? I would not give people write access, I think it is possible to do this without write access, similarly to travis and appveyor.

krlmlr commented 10 years ago

Hmmm, what is the goal here? Just to provide windows and OSX binaries to people? I would not give people write access, I think it is possible to do this without write access, similarly to travis and appveyor.

The goal is to host the binaries and logs. I imagine the deployment process as follows:

Deployer clones target repo (=deployment sinks)
Applies his changes
Pushes back

If the deployment sink is one big repo, cloning time will be proportional to the total size. So, we need individual repositories (one per package) for scalability -- just as with CRAN@github. In this case, we might as well give the users write access to their deployment sinks. (Of course, we could implement the same with pull requests which are auto-merged by some other Travis run, but why bother?) In the end, this also allows users to customize their deployment sinks.

gaborcsardi commented 10 years ago

@krlmlr I see.

Hmmm. Isn't it better not to duplicate the git trees of the packages? You could do something like this:

User can register a package with a .foobar.yml file in their repo and an API call (this can also be a pull request).
The API call puts a hook on the repo, so that every new push to the repo triggers a build and check (very much like travis and appveyor). (Or we just explain how to put in a hook by hand, and then we don't need credentials to their github repo.)
When the builder is triggered, it builds, and then uploads the check logs, source package and the binaries to the foobar.github.io web page, including updating the CRAN-like repository.

I think this is possible, with some challenges, e.g. the build script need to authenticate to github to upload the stuff, and I am not sure where to put the token. Maybe travis/appveyor has support for this. It is also challenging to do this without actually setting up a server for the logic. It would be nice to have all logic in travis and appveyor.

Might even be possible to just pick up the existing .travis.yml and .appveyor.yml files, so that there is no need for .foobar.yml.

krlmlr commented 10 years ago

I think there should be two repos for each package: One development repo, and one deployment repo. Rationale: If development repo is also used for deployment, a push to the (deployment) repo will also trigger a re-build for the (development) repo. (One "big" repo for deployment doesn't scale well.)

It is a matter of taste if we duplicate the code in the deployment repos -- there are arguments for and against it.

Advantages of duplicating code:

Clear association of artifacts and with their source code
Deployment happens in a branch off the built state, parallel deployment possible and guaranteed to be conflict-free

Disadvantages of duplicating code:

Slightly larger size of the repo
No sequential relationships between consecutive builds
Files .travis.yml and/or appveyor.yml probably already exist, so we cannot easily implement cascaded invocations of Travis-CI and AppVeyor

I thought about active deployment, configured in the corresponding .travis.yml and appveyor.yml files and using a (yet to be implemented) command in travis-tool.sh. No new hooks, just a one-time sign-on needed as I described earlier. This new travis-tool.sh command would also handle registration of the artifacts at foobar.github.io -- this also could be done via CI hook in the deployment repo, but tricky if the "code duplication" option is chosen.

Data can be encoded in both .travis.yml and appveyor.yml in a way that only those services can read it. It should be possible to encode GitHub API and/or SSH keys in this fashion.

gaborcsardi commented 10 years ago

Your plans are much more ambitious than what I would consider as a minimal (but useful) R binary package builder/tester. :)

I would be perfectly happy with Travis/Appveyor support as it is now, with the addition that the built binaries are uploaded somewhere, maybe in a CRAN-like repository. (Check logs you can see on the travis/appveyor website I guess.)

krlmlr commented 10 years ago

Artifacts are hosted at AppVeyor, see e.g. https://ci.appveyor.com/project/krlmlr/r-portable/build/1.0.8/artifacts . So, first step will be to properly define the artifacts so that they are accessible via web site.

gaborcsardi commented 10 years ago

On a second thought, I think this is different enough from metacran, and it deserves it's own code. I/you can certainly use (=copy) some of the code to start, e.g. for generating the website and so on.

Do you want to create a repo for the website?

krlmlr commented 10 years ago

I agree that such a repo should be separate from metacran, but code sharing should be possible -- perhaps in dedicated packages.

TL;DR For this project, I will show how to deploy logs and binary packages to AppVeyor. Everything else should take place in other projects.

I have gained some new insights in the last few days:

Currently, and for the foreseeable future, AppVeyor will host artifacts indefinitely, or at least long enough to matter. (Coming from Travis-CI, I thought it's necessary to push the stuff elsewhere, but AppVeyor really will host what its builds produce.) In addition, it's possible to crontab the builds, so that "fresh" artifacts are always available. For some reason, artifacts that are defined by code currently don't seem to work correctly, but they work reliably when defined in the .yml file.
CI runs can be skipped for both Travis-CI and AppVeyor (and apparently in many other systems) by saying [ci skip] or [skip ci] anywhere in the commit message. This makes it possible for a build bot to push back to the repo without starting a vicious circle.
I've played with Git deployment in the krlmlr/r-portable repo. (Note the README.md file, and its history.) This needs a bit care, but is definitely doable.

Perhaps the main consequence is that we don't seem to need a second repo -- a second branch in the same repo is probably enough. This could allow for a much simpler design that the one that I originally had in mind. Also, I think such a centralized deployment sink is much easier to implement using a small web service (with a GitHub web hook) rather than using GitHub + CI tools:

Caching (don't need to re-fetch everything)
Synchronization and atomicity issues (e.g., how to update the PACKAGES file atomically?)
Control (I can ssh into that machine and don't need to wait for success/failure of a CI build)
Fair use of free tools

Are you aware of crandalf?

I think, the CI's job is to build binaries, store them for later retrieval, and notify the above web service. For AppVeyor, this already seems to be solved by its own artifact hosting -- except for the web service notification which is trivial to implement. Travis-CI offers S3 deployment (requiring one to sign up with Amazon, but such is life) -- deployment to a GitHub branch could be implemented as part of r-travis.

sckott commented 10 years ago

@krlmlr That's awesome that Appveyor will host the artifacts (i.e., source or binary pkgs for our purposes)!

And I just noticed that you can schedule builds to restart at any cron job schedule, 1 point for appveyor over travis - They said its GUI only now, will add to yml later.

gaborcsardi commented 10 years ago

@krlmlr Thanks for the writeup. Hmmm. I do know about crandalf, although I only found it by chance long after I started metacran. I would say that metacran or CRAN@github or whatever I want to call it, will always focus on CRAN, and it just wants to (ultimately) give you a better interface to CRAN.

However, what I had in mind initially, was a system centered around a new modern package manager (call it rpkg), that is backwards compatible, i.e. it can install packages from new-style rpkg repositories and (fall back to) CRAN, maybe through CRAN@github.

So I would say, that there is not a lot of overlap betwen your plan and CRAN@github, but there is a very big overlap with my original long-term plans. :)

But the thing is, this github issue is probably not the best place to discuss plans for world domination. I am wondering what is. :)

FWIW, I reserved the 'rpkg' organization on github, and also the r-pkg.org domain (rpkg.org was taken by Simon Urbanek it seems). I can add you (and anyone interested) to this organization, so that we can draft some vague plan. If you think this is the way to go.

So, getting back to the point, by the question above, I meant that if you want to create a repo for the web page, then I can clone that and add a first draft. If you want help with this. :)

Btw. I think that way we are working on this right now is kind of good, i.e. focusing small, but useful services, that maybe can be put together at some point.

krlmlr commented 10 years ago

@sckott: Yes, that's nice indeed. I just wonder for how long...

@gaborcsardi: Good idea, could you please add me to that organization? Perhaps all those small useful services should be under this organization, too. (Perhaps metacran as well?)

We really need to split the work into small manageable chunks. About the website -- thanks for the offer, I'd get back to you when there is at least a rough draft for the service so that it needs a website in the first place.

krlmlr commented 10 years ago

Upload works by .yml configuration:

artifacts:
  - path: '*.Rcheck\**\*.log'
    name: Logs

  - path: '*.Rcheck\**\*.out'
    name: Logs

  - path: '*.Rcheck\**\*.fail'
    name: Logs

  - path: '*.Rcheck\**\*.Rout'
    name: Logs

  - path: '\*_*.tar.gz'
    name: Bits

  - path: '\*_*.zip'
    name: Bits

Still need to call R CMD INSTALL --build for the .zip.

krlmlr / r-appveyor

Deployment of R binaries and check logs #5