r-spatial / discuss

a discussion repository: raise issues, or contribute!

Moving spatial related R packages to r-spatial organization? #11

Closed pat-s closed 7 years ago

pat-s commented 7 years ago

As @tim-salabim said yesterday, he wants to move mapview to r-spatial.

To promote this organization and to have spatial-related R packages in one place, we could think about moving more spatial-related R packages here.

(feel free to suggest more, this was just a brief start of packages which came to my mind)

Here are some points explaining the advantages of an organization. Famous R related organisations that I know of are rstudio and ropensci.

bhaskarvk commented 7 years ago

My packages (most of which are not on CRAN, but I am hoping to find some free time soon)

I would be more than happy to move these over to r-spatial if you guys feel like they belong in r-spatial.

gisma commented 7 years ago

I appreciate the idea of having a starting point or home base for spatially focused R work. It would be convenient to find all up-to-date releases in one place. Nevertheless, a question arises for me: what deserves to be called "spatial"?

In my opinion this is not answered by a more or less loose collection of "spatial" packages. It is more a specific way of thinking about and solving problems.

So in addition I would be very interested in developing and providing a kind of open course or training system that links these spatial packages and concepts while addressing real-world questions.

There are a lot of good reasons to use R as a scripting language; the main one for me is the low entry barrier for students. Nevertheless, and probably because my roots are in the remote sensing and modeling community, I often need to deal with data that seems to be far beyond the typical R scope. Therefore I am also interested in developing packages that simplify the use of all the excellent and mature GIS and CLI tools outside of R.

For both goals I highly appreciate the idea to have a kind of common home. I think r-spatial would be a great place to start.

From my side I would be happy to move the link2GI package, which supports easy integration and partial wrapping of rgrass7, SAGA GIS, OTB and some other big ones.

pat-s commented 7 years ago

Which packages should be integrated?

Sure, this is a major point which needs to be discussed. Too many "small" packages would make the organization messy, so I would suggest including only "major" ones for a start.
What are major r-spatial packages? Imho these are the big ones like raster, rgdal, sf + more, which set the base for a lot of "extension" packages. I do not know if we should include all "download filetype XY" packages, again referring to the "messy" point mentioned before.

Everybody active here has a rough feeling for what the major r-spatial packages are. If a new one comes up with a large user base, we could invite the owner to move to r-spatial. I do not think there needs to be a hard threshold using monthly download numbers or something similar.

https://cran.r-project.org/web/views/Spatial.html gives a good starting base for spatial related R packages.

Who should make these decisions?

I would say everybody can suggest invitations to r-spatial, but only a few selected guys should then decide whether it's okay or not. A vote among all members would take way too long. In particular I am thinking of @rsbivand, @edzer + one or two more. Of course these guys need to be active here so that we do not face issues related to late responses or similar. They would then be the "admins", or whatever it is called.

pat-s commented 7 years ago

@gisma So your package seems to be somewhat similar to RQGIS? :)

@bhaskarvk This is exactly what needs to be discussed - which packages to integrate and which not. See my previous comment :) I'm not the guy to decide this :yum:

gisma commented 7 years ago

@pat-s not at all ;-) I try to provide an interfacing package that makes it more robust to link to the well-known wrapper packages like rgrass7 or RQGIS. If you run the QGIS, GRASS, SAGA or whatever wrapper packages, you will often have a cumbersome and hard time adapting all your system and environment settings to get what you want. The same happens again on the next OS or the next machine when running the scripts or packages there. The most cumbersome situations are business or lab PCs, or 20 different installations on individual student laptops in one of the courses... :-(

If you take for instance GRASS7x, each installer and each Windows version will require slightly different paths, settings and so on. Roger's rgrass7 is great but does not really cover this. The user has to organize this manually, which is only possible with a lot of system knowledge and admin rights. It is even worse with SAGA GIS and RSAGA. The developers of SAGA are constantly changing their API calls; as a result RSAGA only supports versions 2.04-2.27... Endless story.

To make it short, the package provides something like linkGRASS7(x), which basically sets up a full GRASS environment automatically according to your installation and the spatial information provided by x. It is meant as an easy-to-use interface to the existing big GIS ones as well as to very specific command line tools like cdo or nck or the Orfeo Toolbox.

I do not think that it has to be hosted at r-spatial, because I am sure that it is of interest to only a small number of people messing around with a lot of API and CLI calls. Nevertheless I will update it soon on CRAN, and once it is a bit more mature it could be of interest for the r-spatial community.

gisma commented 7 years ago

@pat-s just some additional clarification notes. To be honest, in the beginning I did not expect this cool success story of RQGIS - now I think the chosen approach and the current usability of the RQGIS wrapper is simply great!

However, keeping in mind that the QGIS concept of integrating tons of external tools already provides a great deal of GIS/RS/modeling functionality, (1) it is still a subset of the capabilities of the contributing software packages and (2) the linking problems addressed are still the same: if you do not have a well-configured QGIS including the Processing toolbox and all the providers' binaries, nothing will work, even with the nice-to-use set_env() ...

I think it is error-prone, a bit insular and highly inefficient to forgo the full capabilities of all this well-designed and mature spatial software.

To make it available to a wider R community we should make some effort, and I think r-spatial would be a good place for doing this.

edzer commented 7 years ago

To organize ourselves, I think we need to do more substantial work than moving repositories.

Although I have no objections against doing so, I've mentioned earlier that I don't know good reasons to move repositories here, except for the hope that someone will find them here. But once we have more than 30 repos, will people take the effort of going through the whole list? Nothing is easier than moving a repo, but will it be found more often? @pat-s : your link points out the difference between an orga and a user account, but please tell me what, for us, the real reason is to move all things here.

I think that, as long as we don't have anything better, the primary point for finding spatial packages will be the CRAN spatial task view. From there, you find CRAN packages. I see CRAN as the place for packages of which the developer thinks they're useful and mature enough to not only be used but also to be relied upon by others, for their work. From the CRAN package, if the pkg is on GH, you'll find the GH link.

Finding packages that are potentially of use, but not on CRAN, such as several of @bhaskarvk or @mdsumner packages, would be helped by having an index (similar to task view) for these packages here. Writing such an index is valuable work, and much more effective for the purpose of packages being found (and then used) than moving repos here.

Moving a repo here means that you'll have to trust the orga admins, currently @tim-salabim and me, being an admin over it. It separates the package somewhat from the primary author, which may not be what the author wants.

The comparison to the rstudio and ropensci organisations is not about fame: both reflect legal bodies with substantial resources that are being used for package maintenance and development, and that may collect copyrights. We are not such an organisation.

Finally, the maintainers of rgeos, rgdal, spdep, maptools, raster and geosphere (and many other packages) don't use github for their code development.

rsbivand commented 7 years ago

From my point of view, this is not a sensible use of time. Packages hosted on R-Forge under SVN may be hard to follow for those with insufficient experience, but the substantial effort of rebasing them on github (which I sincerely dislike) will not add any functionality. Note that at the point at which some government blocks github, people in places we care about will lose access to source code etc. Not choosing github is not a definition of un-coolness; it may be simply history, and github's day will also pass - at least we need to be aware that it may.

As you note, the task view is there, has been there since forever (also the Spatio-temporal task view); it would be much more helpful to join/assist efforts to help the whole task view infrastructure to scale. So is the mailing list, which is actually where most real interaction with users occurs.

It is important not to gate-keep, and not to curate (certainly not heavy-handedly). Users' needs differ enormously, also over time, and in an ecology the packages which fit purposes (that may often not be known to the authors) get used. There are lots of opportunities for boosting around; many of them actually lead users to suboptimal choices, and without direct contact it is very hard to guess what may be helpful. Curating assumes intelligent design, which will lead to bitrot with very high probability.

gisma commented 7 years ago

I fully agree with both of you @edzer @rsbivand, we don't need just another software repository and for sure we don't need senseless additional work.

It is true that you will find everything at the CRAN spatial task view. But it is also true that it is cumbersome to do so, especially for somebody who is not used to this specific R world. To me it seems like surfing along a lot of similar packages and descriptions, and it is often hard to understand why to use this or that or even something else, and almost impossible to tell which approach and package is appropriate...

@edzer I am not quite sure that providing a list of useful packages will help a lot. Maybe it would help more to review such efforts and give brief and clear hands-on examples of how to use them. Perhaps even in a blogroll or something similar.

Somehow I am still convinced that it would be very helpful for us and a lot of the users to find a structure that bundles the available knowledge about spatial R stuff in a more effective and transparent way than the CRAN spatial task view, lists or the daily stackoverflow searches do.

edzer commented 7 years ago

rspatial.org receives a lot of hits, and publishes guest entries, after review and editing.

pat-s commented 7 years ago

@gisma Okay I see! That sounds nice, and if it's a generic framework it could be very valuable for the R community. You should also contact the authors of the packages for which this wrapper applies, so that they mention your package on their repo, preferably with a use case scenario to make things easier.

@edzer

> I think we need to do more substantial work than moving repositories

Definitely - the integration of selected packages in r-spatial was just a starting point. Follow-up work would involve a lot of organizational/structural work.

> I don't know good reasons to move repositories here, except for the hope that someone will find them here

This is for sure one of the main reasons: to have one central place (repo) for the most searched packages. This also simplifies install_github() calls, since one does not need to remember 10 different users but can just call install_github("r-spatial/xxx").
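A minimal sketch of the install_github() point, assuming the packages in question really do live under the r-spatial organisation. The helper `org_repo()` is hypothetical (it is not part of remotes or devtools), and the package names are only illustrative; the actual install call is left commented out because it needs network access and the remotes package.

```r
# Build "owner/repo" specs for packages assumed to live under one organisation.
# `org_repo()` is a hypothetical helper, not part of remotes or devtools.
org_repo <- function(pkg, org = "r-spatial") {
  paste(org, pkg, sep = "/")
}

# Package names are illustrative; check that each repository exists first.
specs <- unname(vapply(c("mapview", "sf"), org_repo, character(1)))
print(specs)

# Installing needs network access and the remotes package, so it is left
# commented out here:
# for (s in specs) remotes::install_github(s)
```

With a single organisation, only the package name changes between calls; with packages spread over personal accounts, each owner has to be looked up separately.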

> Nothing is easier than moving a repo, but will it be found more often?

I would say in the long run, yes. If r-spatial has been well acknowledged as the place for r-spatial packages, users would profit from it.

> @pat-s : your link points out difference between an orga and a user account, but please tell me what, for us, the real reason is to move all things here.

The link was just to provide a quick starting point for orga discussions. The main points which come to my mind would be

> Moving a repo here means that you'll have to trust the orga admins, currently @tim-salabim and me, being an admin over it. It separates the package somewhat from the primary author, which may not be what the author wants.

Sure, this depends on the authors. I mean there is no obligation, it is just an offer. I guess there is no doubt that you both are totally trustworthy and nobody would question that. In fact I would say that without the participation/support of @edzer and @rsbivand (as maybe the two best-known r-spatial persons) this repo is not really a serious project. In the long run there is not only a need for 2-3 admins but also a need to distribute work among all (since everybody is busy at work with other things than programming packages).

> The comparison to the rstudio and ropensci organisations is not about fame: both reflect legal bodies with substantial resources that are being used for package maintenance and development, and that may collect copyrights. We are not such an organisation.

I do not want to set up a legal body with r-spatial. rstudio and ropensci have different aims. R-spatial could just be a central place for r-spatial packages, as a starting point for searches & further development (discuss repo, R packages themselves, hosting of r-spatial.org).

(breaking this comment here)

pat-s commented 7 years ago

@rsbivand

> github (which I sincerely dislike)

Could you elaborate on this? I am curious to hear your points on this topic.

> Note that at the point at which some government blocks github, people in places we care about will lose access to source code etc. Not choosing github is not a definition of un-coolness; it may be simply history, and github's day will also pass - at least we need to be aware that it may.

I do not fully understand this point. Even if GitHub's time passes and governments block it, or another site becomes #1 for package development, the code still exists locally (as it does for packages not hosted on GitHub)?

> As you note, the task view is there, has been there since forever (also the Spatio-temporal task view); it would be much more helpful to join/assist efforts to help the whole task view infrastructure to scale. So is the mailing list, which is actually where most real interaction with users occurs.

The task view will never become obsolete. There will always be just a limited number of r-spatial related packages in this orga (speaking hypothetically now), and it will never cover relations between the packages or be subdivided into sections the way the task view is.

Asking the other way round: what is the benefit of not hosting an R package on GitHub (or Bitbucket etc.), not addressing orgas in particular but hosting in general? Hosting there opens the development process to the public and provides a place to open issues and install development versions of packages (if desired). None of this is possible if one keeps development private.

@gisma

> Somehow I am still convinced that it would be very helpful for us and a lot of the users to find a structure that bundles the available knowledge about spatial R stuff in a more effective and transparent way than the CRAN spatial task view, lists or the daily stackoverflow searches do.

I support this point. However, I want to stress that the task view and SO searches have different aims than a GitHub repo/orga. The task view gives logical links between packages and a wide overview of the sub-fields of "spatial".
SO is a platform to (hopefully) get a quick answer from thousands of programmers to your coding problem, preferably a rather generic problem rather than one related to a specific function in a package.
GitHub is useful for contacting the package author (including the patience to wait for an answer) and/or contributing by providing bug reports or useful hints about what the user might like to see in future versions. Sometimes the problem can also be solved by just installing the dev version after browsing the NEWS file (of which way too few exist in the package world out there imo).

edzer commented 7 years ago

There is a government that blocks github.

Github is in private hands; the page you are looking at now is rendered with server-side closed source software.

paleolimbot commented 7 years ago

I don't actually do much GIS within R, although I'm happy that you've included ggspatial on your list. I think a single organization makes sense if a group of packages are designed to work with each other (something like the tidyverse family), and I think there is still much work to do on this front. In my work, sp, raster, and rgdal form the basis for anything spatial I try to do in R, and having the source for these available on GitHub in a predictable place (like "rspatial/sp") would make me more likely to build a new package that is consistent with existing usage (and perhaps duplicates less functionality). If accessibility in other countries is an issue, I am sure it could be easily mirrored somewhere else (perhaps even on a commit-by-commit basis using commit hooks).

In terms of selecting packages, I would argue that packages that simply provide functionality (like ggspatial) shouldn't be included (or put in a different category), but providing a predictable source-code location for ones that serve as a building block for other packages (sp, raster, gdal, and many of the others listed above) would save me time when trying to write packages.

Robinlovelace commented 7 years ago

I think it would be good to have an equivalent of the tidyverse for easy install and use of packages that use sf and stars. Check out what I have to say on the matter here: https://github.com/Robinlovelace/geocompr/commit/c31ad20e8d027983d603da9f0702f7fc29d9ca4a

Please people (especially @edzer and @tim-salabim ) let @nowosad and I know if any of this is wrong or becomes out-of-date (trying to write about r-spatial in a future-proof way is not easy since sf - but it's much more fun!).

edzer commented 7 years ago

When writing "Applied Spatial Data Analysis with R", we had good experiences waiting for the software to mature before we wrote a book. I share your enthusiasm, but have little appetite for doing things in reverse order.

rsbivand commented 7 years ago

In OSGeo, there is a much more structured approach to incubation and candidate projects. Nobody there starts with the other end, though. Arguably, things here are much more fine-grained than in OSGeo. For example, in teaching in Poznan using sf, we found that geom_sf was hard to use, but tmap suited the students very well, and rendered faster (subjective impression). I think this is because tmap uses grid directly and 'thinks' cartographically - direct support for classInt - rather than ggplot2's assumptions about aesthetics and having to go through ggplot to get to grid. So what should one recommend now, without running usage studies ... I think that usage RCTs might help to sharpen focus on things like this (we didn't do it, but only had base/lattice to consider).

Robinlovelace commented 7 years ago

Just to clarify: RCT refers to randomized controlled trials I believe. Expanding the acronym as it's the first time it's been used in this thread.

I hope that encouraging people to use sf will be beneficial for its development. I will add an issue to add a section on 'contributing to the community' now.

In writing the book I hope we can have a positive impact on open source software development and will continue to log these here: https://github.com/Robinlovelace/geocompr/blob/master/our_impact.md

Increasing the PR:bug ratio in there would also be good, but to me that document shows that there can be benefits associated with communicating about things as they develop.

The deadline for the book is November 2018, so there is some time for things to change/settle. It is great to have an idea of what may or may not be in the pipeline based on open discussions like this, so many thanks for that.

tim-salabim commented 7 years ago

I have created an extra issue re: tidyverse https://github.com/r-spatial/discuss/issues/13