r-spatial / sf

Simple Features for R
https://r-spatial.github.io/sf/

GSoC 2018 #368

Closed by etiennebr 7 years ago

etiennebr commented 7 years ago

I feel like sf could be a good candidate for GSoC 2018. @edzer, do you have any experience with GSoC?

@tdhock kindly provided a link to a previous call for spatial tasks (similar to what sf achieves). I believe it didn't attract any candidates, as the description reads more like a wish list than a set of specific tasks. But now that the core of sf is built and we want to move away from sp, translating functions from sp to sf and eliminating dependencies could make for a well-scoped task. Also, as Toby mentioned, it is possible to cater for different levels of student autonomy by designing tests appropriately. What do you think?

etiennebr commented 7 years ago

The 2017 page is more informative: https://github.com/rstats-gsoc/gsoc2017/wiki

tdhock commented 7 years ago

Also, you may want to get in touch with the mentors listed on that page: Bob Rudis (@hrbrmstr), Bhaskar Karambelkar (@bhaskar_vk) [Totally not a domain expert but glad to help], and Scott Chamberlain (@sckott).

tim-salabim commented 7 years ago

@edzer and I submitted this in 2016 but never got any interested students. So maybe this can be taken as a bad example...

etiennebr commented 7 years ago

So 2016 was a bad year for spatial... We'll need to look at good examples! Any analysis of what makes a great proposal @tdhock?

edzer commented 7 years ago

R package rgeos came out of a GSOC project, done by @rundel, supervised by @rsbivand . @etiennebr , what would you like a student to do with sf?

tdhock commented 7 years ago

For good examples, you can look at any of the 35 projects that were accepted this year: https://github.com/rstats-gsoc/gsoc2017/wiki/table%20of%20proposed%20coding%20projects -- keep in mind that the sooner you post a project idea to our wiki, the more likely you are to find a student before the GSOC application deadline (which will be in/around March 2018).

etiennebr commented 7 years ago

Good point about rgeos! I don't have a very defined idea yet. I was thinking about removing dependencies (namely on sp) by moving functions to sf; I remember reading somewhere that it's a stated goal of sf. Also, extending utility functions, maybe by looking at the operations available in e.g. PostGIS and SQLite? This is more a direction than a defined task for now, and compared to rgeos it seems like a (too?) narrow project.

Another point to consider is that, at the speed you improve sf, @edzer, these might all become irrelevant tasks by 2018!

edzer commented 7 years ago

I am about to mostly drop sf activity, and focus on stars, and stars/sf interactions.

I'd be interested to supervise scalable sf stuff, similar to how dplyr moves operations to the DB, but then with postGIS (or other spatial databases).

edzer commented 7 years ago

A list of ideas:

Anyone else?

tim-salabim commented 7 years ago

I think one of the most pressing issues for 'converted' users (from sp to sf - like myself) is raster compatibility (e.g. raster::extract()). Though only partially a concern of sf, I think this would be worth thinking about (though this may already be a plan for stars).

edzer commented 7 years ago

Yes, the plan is that during development, stars will depend on sf, and deal with raster and integration stuff.

Right now, stars obviously also depends on gdal; somehow, in the long run, all the gdal-related stuff should go into a single package (meaning maybe stars and sf need to be merged, once reasonably mature).

mdsumner commented 7 years ago

@tim-salabim Currently there is extract for sf - and cellnumbers() - in https://github.com/r-gris/tabularaster

It's not exactly finished, but I'm happy to help, and you might be interested if the need is pressing enough. :)

(I needed a tidy version of 'extract(x, y, cellnumbers = TRUE)' and a generalization of 'cellFromThing' for a specific project but it's not going to get much attention for a while again.)

mdsumner commented 7 years ago

@edzer very glad to hear you plan to put gdal in its own package, that will be very valuable also for other projects that aren't simple features based. I am very happy to help on that front - it's probably something that I could actually manage.

mdsumner commented 7 years ago

On the GSOC front I'm hoping to get to a Radian-based linkage for sf. That might attract interest from some circles. I've done the basics for sf and Manifold 8.0 here, but Radian is immensely more powerful and will be used for version 9.0 in the not too distant future.

https://github.com/r-gris/manifoldr

Radian: http://manifold.net/info/radian.shtml

edzer commented 7 years ago

@mdsumner just to manage expectations: I'm not planning to factor out pure-R sf things from sf things that need gdal/geos/proj.4, or at least I'd have to hear a very good reason before I start thinking about that. My idea is that in the end there will be one package covering sf and stars (raster, vector, library interfaces).

mdsumner commented 7 years ago

I don't understand this properly: "I'm not planning to factor out pure-R sf things from sf things that need gdal/geos/proj.4". Can you re-express, please? I think a gdal package that both stars and sf could import/link would be perfect, which is what I thought you said above. If it was on your wish-list I think it would be easy to get support to get it done.

edzer commented 7 years ago

Perfect for what exactly? And who would support getting it done?

mdsumner commented 7 years ago

For projects requiring GDAL that aren't based on simple features, or that don't rely on sf structures. I've spoken with many people who would support it. I haven't pursued it outside of my own tinkerings because I was pretty sure you didn't see the need, but I would rally to support if you were agreeable.

Nowosad commented 7 years ago

@edzer what's the reason to have it all in one package? I would have thought that the best approach (in terms of package improvement and maintenance) is to have a few packages, each with a specific goal, such as, let's say, sf, stars, and gdal.

edzer commented 7 years ago

@mdsumner I meant support in terms of who would write that software.

For an alternative approach, you can look at rgdal2, which seems inactive; @thk686 mentioned in August last year that he was looking at SWIG bindings to gdal2, but I haven't seen anything that came out of that idea.

sf contains 3400 lines of C++ code that contain just logic to communicate between R and GDAL/GEOS/PROJ. I guess the reason why this is much less than the 7200 lines of C++ ogr code in rgdal and the C code in rgeos is that I chose to use WKB serialization for IO everywhere, and use Rcpp. What I want to say is this: you can interface GDAL, but you need to have an idea what to do with this interface on the R side: R can't do anything with external pointers to C++ structures. If you're impatient for the raster side of GDAL, help get stars going.
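As a rough illustration of the WKB round trip mentioned above, seen from the R side rather than inside sf's C++ code: the sketch below uses sf's exported `st_as_binary()` and `st_as_sfc()`. The point coordinates and the CRS (EPSG:4326) are arbitrary choices for the example, and this is not meant as a description of how sf talks to GDAL or GEOS internally.

```r
library(sf)

# Two example points with a known CRS (values chosen only for illustration).
geoms <- st_sfc(st_point(c(1, 2)), st_point(c(3, 4)), crs = 4326)

# Serialize to WKB: a list of raw vectors, one per geometry.
wkb <- st_as_binary(geoms)

# Plain WKB carries no CRS, so it is supplied again when reading back.
back <- st_as_sfc(wkb, crs = 4326)

# Compare the geometries before and after the round trip as text.
print(st_as_text(geoms))
print(st_as_text(back))
```

The raw vectors are ordinary R objects, which is what makes this kind of serialization convenient for passing geometries across an interface.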

Robert thinks that a major bottleneck of raster is that IO, on the C++ level in raster, has to go through R -> rgdal -> C++ -> R -> raster. The LinkingTo interface of R packages is at the SEXP level; anyone who has tasted Rcpp doesn't want to go back down that alley.

@Nowosad again: what would the specific goal of a gdal R package be?

mdsumner commented 7 years ago

I understand, it's too deep in sf now, I wasn't expecting you to author it. Thanks for the confirmation.

edzer commented 7 years ago

I am inclined to believe that there is no other way to do this that is useful, but would be happy to find out I'm wrong, in particular by a working proof-of-concept.

mdsumner commented 7 years ago

It's got two sides, one is the hope that sf would have a minimal core, useable by other sources of simple features other than GDAL, a "core-sf" that was minimal and could be used just for that formal contract with the standard. The other is a generic gdal API - from which other sf-analogues could be built. I actually thought rgdal2 was probably enough, but I only figured out the raw lists being returned at around the time sf was born. I can go under the hood in rgdal as well, and get closer to what I want now, but only in some limited forms.

The too-short summary is that sf is not modular, it's an all-in super power in one cohesive but inflexible form. You can't pick it apart and recompose with just the pieces you need into other forms. It also is a lot more than the simple features standard, it's really a vision of how a GIS should and should not behave. That's great, but it's also just one view of how the core pieces can be expressed and put together.

I understand the C++ in sf, and I've studied it enough to see how it could be extracted, but it was a bit too tied to the sf classes already and that was before you went WKB, so I do understand how tight the links are. It's very slow for me to work in C++ and I just get swamped by other concerns, especially when the goal seems futile, only possible by going independent. (But, it's late, and this is a challenging topic, and I haven't been pursuing it lately. I apologize for harping on about this again without having a solid case - there continue to be new sides to this that do change how I think).

Also, sf is wonderful for learning how to compose these smaller pieces - there are no problems there, but I wish that we could, collectively, find ways to have smaller modules.

faridcher commented 7 years ago

Among other suggestions by @edzer, maybe:

I also agree with @edzer's idea of a unified package for general-purpose vector AND raster processing. While raster is a powerful package, (I think) the speed of development is slow. As mentioned here, right now, to use raster::extract I have to use as(sf,'Spatial') and then st_as_sf(), which is not very efficient! Asking Robert to add sf support is one option, but it depends on how much he is inclined to do so.

edzer commented 7 years ago

@faridcher yes this is also the gist of #76.

Regarding raster, I think that it is a common misconception that the speed of development is an indicator of the quality of the package. (With the exception of sf of course, the quality of which, measured this way, must go down when it's ready.)

faridcher commented 7 years ago

sf is mature enough to be supported by raster and other pkgs. Again, raster is a high-quality package with very few dependencies on other packages, but it should also catch up with recent developments (i.e. sf). For example, I emailed Robert asking why he doesn't migrate the raster source from r-forge to github, and I haven't received an answer yet! This is why I am emphasizing the term 'speed'.

edzer commented 7 years ago

R core also hasn't moved from svn to git.

faridcher commented 7 years ago

As a new user of R, which one would you recommend: sp or sf? (Same story with r-forge vs github and svn vs git.)

On Fri, Jun 2, 2017 at 11:33 AM, Edzer Pebesma notifications@github.com wrote:

R core also hasn't moved from svn to git.

edzer commented 7 years ago

We're drifting off, let's keep the discussion to GSoC 2018 suggestions here.

harryprince commented 6 years ago

Great idea: running sf on PostGIS or Hive, just like dplyr and sparklyr. @edzer

mdsumner commented 6 years ago

FYI, dbplyr already supports this: issue a query to a DB and get a table with a geom column, then use sf to convert it to sfc. Nothing else is needed.

mdsumner commented 6 years ago

Uh, of course getting the CRS for the geom will be very different depending on the db ... Only Manifold and PostGIS (EWKB) have geoms that include this metadata in the blob, afaik, and it has never stuck as a way to store things. However, it's so easy to work around compared to all the other stuff to worry about. Feel free to get in touch if you want more details with a sqlite/gpkg example.
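A minimal sketch of the query-then-convert workflow described in the two comments above, assuming the DBI, RSQLite and sf packages are installed. The in-memory SQLite database, the table name `pts`, the column names and the CRS (EPSG:4326) are all made up for the illustration; plain DBI is used here instead of dbplyr to keep the example short. Because the blobs are plain WKB with no CRS metadata, the CRS is set by hand after the query.

```r
library(DBI)
library(sf)

# Hypothetical in-memory database standing in for a real spatial DB.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Store two example points as plain WKB blobs (no CRS inside the blob).
pts <- st_sfc(st_point(c(1, 2)), st_point(c(3, 4)))
df <- data.frame(id = 1:2)
df$geom <- unclass(st_as_binary(pts))  # list of raw vectors -> BLOB column
dbWriteTable(con, "pts", df)

# Query the table: the geometry comes back as a list-like column of raw vectors.
res <- dbGetQuery(con, "SELECT id, geom FROM pts")

# Convert the blobs to an sfc column, supplying the CRS by hand.
wkb <- structure(as.list(res$geom), class = "WKB")
res$geom <- st_as_sfc(wkb, crs = 4326)
out <- st_as_sf(res)

dbDisconnect(con)
print(out)
```

For a database that stores EWKB, `st_as_sfc()` has an `EWKB = TRUE` argument, in which case the CRS would not need to be supplied separately.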

etiennebr commented 6 years ago

I'm very interested in this! I've done some work on the sf interface for databases and it now works pretty well with dbplyr.

I have only just started working with Spark and Hive, though, and it seems like Magellan's spatial predicates aren't fully supported and aren't OGC compliant. Then there is this open-source framework by ESRI: https://github.com/Esri/spatial-framework-for-hadoop that seems OGC compliant. I'm still new to all this, though.

@mdsumner, any experience with sf, spark and hive?

Let's move this discussion to its own thread.