r-spatial / discuss

a discussion repository: raise issues, or contribute!

Mac OSX binaries and Windows binaries that use GDAL/GEOS/PROJ #40

Open edzer opened 4 years ago

edzer commented 4 years ago

I'm trying to bring together several discussions here:

I agree that syncing the binary builds on OSX and Windows in terms of versions and drivers would be marvelous. I have experience with both statically building OSX binaries of sf (the way @s-u does for CRAN) as well as using homebrew, and bad experiences with trying the former while having homebrew libraries installed. I'm hesitant to advise dynamic linking to OSX users who have no clue what "compilation" or "dynamic linking" means, because they will need to recompile/reinstall the R packages when GDAL etc. libraries get updated, but probably won't understand why things don't work in the first place when this happens. For those users, statically linked binary R packages seem the best advice.
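To make the concern about mixing static builds with installed homebrew libraries concrete, here is a minimal Python sketch. It scans the text printed by `otool -L` (the macOS tool that lists the shared libraries a binary links to) and flags entries under `/usr/local`, where homebrew installs by default on Intel Macs. The sample output, the `sf.so` file name and the helper names are made up for illustration; a statically linked CRAN-style binary would be expected to show no such entries.

```python
# Sketch only: SAMPLE is hand-written to mimic `otool -L` output;
# the file name and library versions in it are hypothetical.
LOCAL_PREFIX = "/usr/local/"  # default homebrew location on Intel Macs


def linked_libraries(otool_output):
    """Return the library paths listed in `otool -L` output."""
    libs = []
    # The first line names the inspected file, so skip it.
    for line in otool_output.splitlines()[1:]:
        line = line.strip()
        if line:
            # Entries look like "<path> (compatibility version X, current version Y)".
            libs.append(line.split(" (compatibility", 1)[0])
    return libs


def local_links(otool_output):
    """Return only the libraries that resolve under /usr/local."""
    return [lib for lib in linked_libraries(otool_output)
            if lib.startswith(LOCAL_PREFIX)]


SAMPLE = (
    "sf.so:\n"
    "\t/usr/local/opt/gdal/lib/libgdal.dylib "
    "(compatibility version 1.0.0, current version 1.0.0)\n"
    "\t/usr/lib/libSystem.B.dylib "
    "(compatibility version 1.0.0, current version 1.0.0)\n"
)

if __name__ == "__main__":
    for lib in local_links(SAMPLE):
        print("dynamically linked against:", lib)
```

Any path reported here is a library that has to stay compatible on the user's machine, which is exactly the situation the paragraph above warns about.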

@jeroen's rwinlib has made it easy for pretty much any user to install dev versions: install rtools and there you go, regardless of whether you understand what is going on or not. (It may have led to the large number of CRAN packages linking to GDAL etc.) It would be perfect if a static build system with similar simplicity existed for OSX (I don't know if it does). Jeroen pointed me to recipes but I'm not sure what to do there. Can I use that to locally build static builds with library versions I want?

I am confused about:

s-u commented 4 years ago

@edzer Thanks for starting this. As for macOS, recipes is how a) you can replicate exactly what the macOS CRAN setup is and b) by issuing PRs you can update any libraries that your package needs, which will be reflected on the macOS build server. Alternatively you can just let me know what your package needs and I can make those changes myself. My main problem here is that I don't know which features your packages need and which dependencies are required, as you can build GDAL with many different options - I can't tell if something is useful or not.

Recipes are not intended for end-users as users will simply download the package from CRAN which uses the libraries from the recipes. If developers want to re-build a package, they can download the library binaries from https://mac.r-project.org/libs-4/ instead of building them themselves.

The macOS recipes setup pre-dates the Windows one by many years. The Windows system was put in place only very recently for R 4.0.0 and I agree that it would be nice to have some consistency, but those are very different operating systems so it may sound easier than it is.

I was playing with a Jenkins setup, but the R support in Jenkins is terrible, so it pretty much has to be done from scratch. I am still hopeful that I can have a script that can replicate the CRAN setup in Jenkins. As for GitHub actions, those are, unfortunately, hopeless as they don't support the necessary macOS versions, so Jenkins is the only viable path I'm aware of.

s-u commented 4 years ago

I spent most of today trying to work-around all those bugs in the spatial libraries that have to do with static linking. All of them are unrelated to macOS, and I'm documenting my progress on the recipes wiki on known issues in libraries. Especially for GDAL the list is pretty long. That said, the recipes now include GDAL 3.1.1, GEOS 3.8.1, PROJ 6.3.1, NetCDF 4.7.4, HDF5 1.12.0 and HDF4 4.2.15.

I only tested rgdal and MODIS which pass checks and sf which doesn't pass its tests, but compiles (this may be related to other packages). I have not rolled the update out to the production CRAN machine yet, I want to run more package checks, but you can download the library binaries from https://mac.R-project.org/libs-4/ as usual.
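As a small illustration of how a developer might compare a local setup against the versions listed above, here is a hedged Python sketch. The version numbers in `RECIPE_VERSIONS` are the ones quoted in this comment; the function names and the `installed` example are hypothetical, and the sketch assumes plain dotted numeric version strings.

```python
# Versions quoted in the comment above; everything else is illustrative.
RECIPE_VERSIONS = {
    "GDAL": "3.1.1",
    "GEOS": "3.8.1",
    "PROJ": "6.3.1",
    "NetCDF": "4.7.4",
    "HDF5": "1.12.0",
    "HDF4": "4.2.15",
}


def parse_version(text):
    """Turn '1.12.0' into (1, 12, 0) so that 1.12 sorts after 1.8."""
    return tuple(int(part) for part in text.split("."))


def behind_recipes(installed):
    """Return {name: (installed, recipe)} for libraries older than the recipes."""
    return {
        name: (version, RECIPE_VERSIONS[name])
        for name, version in installed.items()
        if name in RECIPE_VERSIONS
        and parse_version(version) < parse_version(RECIPE_VERSIONS[name])
    }


if __name__ == "__main__":
    # Hypothetical local setup, for demonstration only.
    installed = {"GDAL": "2.4.4", "GEOS": "3.8.1", "HDF5": "1.8.21"}
    print(behind_recipes(installed))
```

Comparing integer tuples rather than raw strings matters here: as strings, "1.8.21" would sort after "1.12.0".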

mdsumner commented 4 years ago

wow, thank you!

@s-u I'm a little reluctant to ask as I'm out of my depth, but - is the workflow in the first post here (to unpack the 10.13 SDK) not viable to emulate at least High Sierra on CRAN?

https://github.com/r-lib/actions/issues/69

s-u commented 4 years ago

@mdsumner no, it is not viable. We tried it before. It's not as good as it sounds, because SDKs don't work for configure tests as you'll still get run-time behavior of the system, so that's why you really want VMs and not just SDKs. That's why binaries compiled on more recent macOS still don't work and why I said Actions are not an option for this, unfortunately. I don't understand why GH killed them, there is no technical reason, they just simply don't care about macOS, so Jenkins is all that's left.

mdsumner commented 4 years ago

ok, thanks very much

s-u commented 4 years ago

BTW: to clarify, the reason the SDK use was added is sort of the inverse - to avoid using the most recent SDKs which are partially broken. So it keeps recent Xcode from breaking, but the resulting binaries are not guaranteed to work on High Sierra (Xcode has always been funky - Apple likes to supply newer SDKs than the OS which is really odd). So it helps in some cases, but doesn't solve the task of replicating the CRAN setup - for that you need a 10.13 VM.

mdsumner commented 4 years ago

but, fwiw it's not completely pointless right? I feel pretty good that I can get those CRAN binaries and have a full test pass on 10.15:

https://github.com/hypertidy/vapour/runs/825320263?check_suite_focus=true (it's the first macos entry, the second uses brew)

at least it's teaching me a lot, I wasn't sure about all your configure notes but - all the patches and configure details are now built-in to those binaries, is that accurate?

I have no idea about Jenkins yet, but it's not impossible we have access to these VMs at my work so I will ask around.

s-u commented 4 years ago

@mdsumner no, that's great! I like that the workflow allows fetching the latest library binaries so, yes, for all intents and purposes it's as close as you can get with Actions. Don't get me wrong, I think it's perfect for testing. What I was referring to was replicating the CRAN setup exactly to trace any issues that may come from the checks, which is possible only if the CI system allows the use of 10.13 VMs.

mdsumner commented 4 years ago

Nice, understood!

tim-salabim commented 4 years ago

@s-u thanks for all this effort! This is highly appreciated by the Rspatial community. I wonder if there could be a more carved out and aligned process for pooling the expertise of the CRAN build maintainers and package developers. As much as CRAN build maintainers cannot be expected to know the ins and outs of all the system libs and which would be a suitable choice for which package, I assume that most package developers will have next to no idea about the build process and what it entails. I think the recipes repo is a good start, but if I understand correctly, this is only for MacOS? I'd really like to make an argument for trying to align Mac and Windows builds as closely as possible (@jeroen) because it will be hard to explain to users why they can do something in one OS but not the other. This will likely also lift (or better shift) workload from package maintainers (there have been sooo many issues regarding gdal and friends in the Rspatial world - though understandably as they've undergone major breaking changes).

Anyway, this is just a few thoughts from a package maintainer that does not really have experience with any of the steps involved in getting things set up in a usable way (I leave all this up to @edzer via sf basically).

I really appreciate all the work that is being done to make this as painless as possible for all parties involved

mdsumner commented 4 years ago

Just one comment to add to @tim-salabim - totally agree about alignment of library versions, but I'm personally less concerned about alignment of available GDAL drivers on Windows and MacOS - I think it's more important to have as many drivers as possible on each, even if one is missing some the other has. I know that might be controversial, because it implies different capabilities and testing requirements.

In my experience over the years the ability to access a new format (for me) was always instructive and helpful, more important than system consistency - but I can see that might be a topic for discussion ;)

rsbivand commented 4 years ago

@tim-salabim Pain is essential to find pinch-points, like: https://github.com/OSGeo/PROJ/issues/2084 and https://github.com/OSGeo/gdal/issues/2672. We also feed back up the component tree when necessary. For both GDAL and PRØJ, @rouault has been very helpful and responded as quickly as possible, so subsequent releases of PRØJ and GDAL will have changes suggested by CRAN.

tim-salabim commented 4 years ago

@rsbivand of course. I am more thinking about catching these pain points early and having a system to avoid them later, hence my "shift" in brackets. Admittedly, I don't really have an idea about the whole process that is necessary to ensure stable useful builds on CRAN. Just thought my amateur views might be helpful (different angle) and wanted to express my gratitude to all people involved

@mdsumner I agree that having a gdal suite that is as complete as possible would be great. In the end that's where the power of gdal lies, right?

tim-salabim commented 4 years ago

Oh, and as a sidenote and from a very selfish standpoint, I'd love to see GDAL 3.1.x in both MacOS and Windows binaries, as it will make the mapview experience significantly nicer :-)

pat-s commented 4 years ago

I would like to draw attention to the following points which IMO usually do not get discussed:

  1. Replicating the CRAN 10.13 setup which builds the binary and uses static versions of system libs
  2. Ensuring a robust build process against the most recent stable macOS version with rolling updates of system libs (homebrew)

Most of this thread talks about 1). This is fine as long as people use binaries. However, installation issues from source will not be solved by referring to the CRAN 10.13 setup as nobody will (be able to) replicate this setup. Installing snapshots of system libs is a pain and one cannot simply go back to macOS 10.13.

What I am missing is a robust check system against the most recent macOS version. Most people run this macOS version (or one major version lower) and also have homebrew installed to manage the system libs (which is perfectly fine as I outlined here already).

The default to check against macOS is either locally or via CI, using the latest stable macOS version. It is unfortunate that one is pointed to the CRAN setup if something does not work and CRAN itself does no checks against the most recent stable version. This way, package maintainers stumble across new issues themselves or users have to find out the hard way by opening issues because something does not work. Currently, CRAN essentially forces you to stay two major versions behind on macOS (in theory) because everything else "is not tested". This is not even possible if you buy a new machine because you cannot go back in time.

When giving R courses/consulting on R, many people have Macs and I always have to explain to them why R packages are so unstable when it comes to packages that use more than just plain R code. And no, binaries are not the universal solution for every package. This is more of a general issue than one specific to the r-spatial build process though.

Building universal binaries across major versions of macOS is hard. This is why homebrew builds binaries for different macOS versions. In a dream scenario, this is also what CRAN would do - using the latest stable versions of homebrew formulas. This would be an easy and robust build process but would of course also require more hardware.

CRAN could also make use of GitHub Actions and ensure a proper build process against the most recent macOS system and the latest homebrew libraries. This is how most people use R on macOS and it's a pity that only the lower supported limit (right now 10.13) is tested. It would be better to test against both limits: the lower OS version limit with static library versions for building binaries and a dynamic upper limit against the most recent stable OS and system libs versions.
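The "test against both limits" idea can be written down as a tiny check matrix. The following Python sketch is only an illustration of the proposal, not a description of any existing CRAN or CI configuration: the `CheckTarget` type and its field names are hypothetical, 10.13 is the lower limit named above, and 10.15 stands in for "most recent stable" only because it is the version mentioned earlier in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckTarget:
    macos: str    # macOS version the check runs on
    libs: str     # where GDAL/GEOS/PROJ etc. come from
    linking: str  # "static" or "dynamic"


def two_limit_matrix(oldest_supported, latest_stable):
    """Return one check target for each end of the supported range."""
    return [
        # Lower limit: pinned library versions, used for building binaries.
        CheckTarget(oldest_supported, "pinned static builds", "static"),
        # Upper limit: rolling system libs, as most source installs use them.
        CheckTarget(latest_stable, "latest homebrew", "dynamic"),
    ]


if __name__ == "__main__":
    for target in two_limit_matrix("10.13", "10.15"):
        print(target)
```

Keeping the two targets separate reflects the distinction drawn in this comment: the first produces release binaries, the second only reports breakage early.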

Usually the r-lib/tidyverse guys or other ambitious package maintainers carry all the support for the most recent macOS versions because they usually have a CI test system set up. However, they are rarely heard when it comes to upstream CRAN adjustments and often enough, custom workarounds against pain points in the CRAN setup need to be made.

s-u commented 4 years ago

@pat-s I agree. To make it very clear, those are also two very different tasks. The role of CRAN is to provide a "just works" setup for users. The macOS version requirement is there simply to make sure we cover a large enough fraction of useRs and is really just a minor detail in a sense. It doesn't mean the libraries have to stay behind, in fact the static builds allow us to move ahead at any time, so if package developers care, they can push the boundary where they think it's best for the users, all they have to do is to talk to us. That part is about making your package available to the users (=release), it is not about development. The whole point of this setup is that it is very stable.

The other question is about CI for package development. I am more than happy to provide my expertise if needed, but they are independent of CRAN operations and just good practice. It is no replacement for CRAN incoming checks, you would still need to pass those on the CRAN setup to submit your release, but it would allow you as a developer to find out what's needed and what you have to communicate to us for your release (if needed). I think there is certainly hope for that setup given the work so far. The main issue with those things is just that it's hard to find an active maintainer. As we have seen with Jenkins, you can't take a "set and forget" approach as things will break upstream and someone has to deal with them. For CRAN this is well defined, but outside of the R Foundation involvement it's not.

pat-s commented 4 years ago

> It doesn't mean the libraries have to stay behind, in fact the static builds allow us to move ahead at any time, so if package developers care, they can push the boundary where they think it's best for the users, all they have to do is to talk to us

Yes, that is how it would happen in an ideal world. However, I do not see this happening in practice. Maintainers are happy if "things just work" and usually nobody knows who is responsible for certain formulas or where to file this request. One solution could be a dictionary which would list people having maintenance access to update those static libraries. If the whole process would be hosted on GH (or similar) a PR could make this process quite transparent.

There needs to be more manpower to maintain a whole CI system that aims to cover most CRAN packages. I am not even sure who besides you (and B. Ripley?) is maintaining the macOS CRAN part. I'd wish everything would be more open and "invitation" friendly with a group of selected people having "merge permissions". R for macOS is a good start. But for it to take off, I think there need to be active invitations to people who use and care about CRAN infrastructure on macOS. I'd guess many would like to help out.

But I guess we are already going off-topic here and losing the r-spatial connection (even though all of the above would also apply to r-spatial problems).

> but they are independent of CRAN operations and just good practice. It is no replacement for CRAN incoming checks, you would still need to pass those on the CRAN setup to submit your release, but it would allow you as a developer to find out what's needed and what you have to communicate to us for your release (if needed).

Sure, but "good practice" would help everyone here, users and devs. All the effort going into "bleeding edge" CI tests is to detect issues early. So far this burden is on the developer side and everyone is acting on their own. Some do it, some don't and I have the feeling that we have been running around in circles for years without a robust approach to solving this problem. A joint approach of known devs in the R area which would set up CI guidelines that could be followed by others would be a tremendous contribution for the whole community. And in the end, this would also help CRAN to more safely update the static versions for building the binaries since they were already successfully checked against in this system.


Maybe we should outsource this discussion to the linked GitHub org and discuss there. I do not really like shifting to the mailing list since discussions become quite messy there quickly.


Regarding the initial topic of this thread: I still stay with the opinion to link and test against homebrew-core on macOS as the default since this is what probably 90% of all macOS users will do every day (if they cannot/won't use binaries).

s-u commented 4 years ago

> It doesn't mean the libraries have to stay behind, in fact the static builds allow us to move ahead at any time, so if package developers care, they can push the boundary where they think it's best for the users, all they have to do is to talk to us

> Yes, that is how it would happen in an ideal world. However, I do not see this happening in practice. Maintainers are happy if "things just work" and usually nobody knows who is responsible for certain formulas or where to file this request. One solution could be a dictionary which would list people having maintenance access to update those static libraries. If the whole process would be hosted on GH (or similar) a PR could make this process quite transparent.

Well, but that's exactly how it is set up - see https://github.com/R-macos/recipes

And anyone using R should be familiar with the format as it's the same as R packages.

> There needs to be more manpower to maintain a whole CI system that aims to cover most CRAN packages. I am not even sure who besides you (and B. Ripley?) is maintaining the macOS CRAN part. I'd wish everything would be more open and "invitation" friendly with a group of selected people having "merge permissions". R for macOS is a good start. But for it to take off, I think there need to be active invitations to people who use and care about CRAN infrastructure on macOS. I'd guess many would like to help out.

It is completely open, there is nothing hidden and fortunately there are people that are contributing.

There are also community efforts such as the support for Jenkins and GH Actions.

> [...] A joint approach of known devs in the R area which would set up CI guidelines that could be followed by others would be a tremendous contribution for the whole community. And in the end, this would also help CRAN to more safely update the static versions for building the binaries since they were already successfully checked against in this system.

But, again, this has been ongoing for quite some time - just look on GitHub, there are many R packages that have used common CI platforms for quite a while, those are de-facto standards.

pat-s commented 4 years ago

> But, again, this has been ongoing for quite some time - just look on GitHub, there are many R packages that have used common CI platforms for quite a while, those are de-facto standards.

I am aware of these since I am developing/maintaining a platform-agnostic CI DSL for R within ropensci. What I do not understand is: You write here that this is the de-facto standard (which is true) and these efforts all rely on homebrew for system lib installations. On the other hand you question the use of homebrew for R. Also CRAN has apparently no motivation to come up with their own CI check system using homebrew and the latest OS releases. I mean Jenkins is great, no question, but it is not simple to copy CRAN's setup or use it in general (since you need your own hardware).

What prevents CRAN from having at least one runner in the build matrix that checks against the latest R version on the latest OS version using homebrew? Creating runners that use historic OS versions is much more complicated than the task just outlined. Having such a runner could serve as a nice guideline for all the CI approaches out there to really mirror the CRAN check system (rather than just copying parts but relying on common CI solutions which always diverge a bit from the CRAN standard).


> Well, but that's exactly how it is set up - see R-macos/recipes

I am aware of it and it is a good start, thanks for this. What I am missing there is (and which answers could maybe be stored in a Wiki/another appropriate CRAN resource):

s-u commented 4 years ago

> What I do not understand is: You write here that this is the de-facto standard (which is true) and these efforts all rely on homebrew for system lib installations.

Last time I checked it was the opposite - the first step was to remove Homebrew specifically to not mess up /usr/local for the checks.

But we're running in circles here - I think you're still confusing CRAN and CI. CRAN is providing binaries for R users on macOS, so that is our goal. The main worry of developers was to replicate the setup so that they can trouble-shoot cases where a binary was not available due to a failure. I think we got a solution for that now. CRAN is not a CI service nor a service for Homebrew.

If Homebrew wants to provide a CI service, that's great, but that has nothing to do with CRAN nor R and package macOS releases on CRAN. If someone wants to maintain it, it would be perfectly fine to have R and packages in Homebrew - they already have R there so as I was saying all the time, it's perfectly fine if you want to install R from Homebrew, all the dependent libraries and packages - you could easily build a CI on that if that's what you want.

> • How are changes checked? I see no CI builds checking the validity of the changes.
> • Are changes directly affecting CRAN build processes or is this more like a mirror of how CRAN does it behind the scenes?

The CRAN build is maintained from that repo, so I test the PRs on a development CRAN setup and if it passes it is merged and installed on the production VMs.

> • Should maintainers of popular R packages that depend heavily on such libraries be actively informed about the existence of this repo (and similar non-macOS projects) so that more contributions come in and you are not left alone with this?

This setup has been around for 5 years and it has been announced for a long time, so maintainers that care about macOS (which is not a large fraction as I can say from experience) should know about this.

> • How is Prof. Ripley involved in the maintenance of system libs? I don't know much about what he is doing in detail but I often hear things about debatable discussions/fixes/updates from other people/devs. This is actually a very gray area that is discussed quite often and it might be worth adding some transparency about this and other responsibilities in CRAN to appropriate places.

He typically provides help to package authors by sharing his extensive expertise in fixing packages, often providing patches. He also maintains separate check setups that perform tests on additional platforms and extra tools/instrumentation.

mdsumner commented 4 years ago

@s-u there's been some indication that Prof Ripley has artefacts in another repo, but it seems to me that we should treat https://mac.r-project.org/libs-4/ as the only place for macOS binary builds, both for current and development.

He mentioned PROJ 6.3.2 in particular for a specific libsqlite issue to me, but it seems that I can use the 6.3.1 from libs-4/ (or later if/when it becomes available) as the proper current CRAN dev target for static builds.

Is that accurate?