Closed etiennebr closed 7 years ago
The 2017 page is more informative: https://github.com/rstats-gsoc/gsoc2017/wiki
Also you may want to get in touch with the mentors listed on that page: Bob Rudis (@hrbrmstr), Bhaskar Karambelkar (@bhaskar_vk) [Totally not a domain expert but glad to help], and Scott Chamberlain (@sckott).
@edzer and I submitted this in 2016 but never got any interested students. So maybe this can be taken as a bad example...
So 2016 was a bad year for spatial... We'll need to look at good examples! Any analysis of what makes a great proposal @tdhock?
R package rgeos came out of a GSOC project, done by @rundel, supervised by @rsbivand. @etiennebr, what would you like a student to do with sf?
for good examples you can look at any of the 35 projects that were accepted this year https://github.com/rstats-gsoc/gsoc2017/wiki/table%20of%20proposed%20coding%20projects -- keep in mind that the sooner you post a project idea to our wiki, the more likely you will be to find a student before the GSOC application deadline (which will be in/around March 2018)
Good point about rgeos! I don't have a very defined idea yet. I was thinking about removing dependencies (namely on sp) by moving functions to sf; I remember somewhere it's a stated goal of sf. Also, extending utility functions, maybe by looking at e.g. the operations available in postgis and sqlite? This is more a direction than a defined task yet, and compared to rgeos it seems like a (too?) narrow project.
Another point to consider is that at the speed you, @edzer, improve sf, these might all become irrelevant tasks by 2018!
I am about to mostly drop sf activity, and focus on stars, and stars/sf interactions.
I'd be interested to supervise scalable sf stuff, similar to how dplyr moves operations to the DB, but then with postGIS (or other spatial databases).
A list of ideas:
- sf to CGAL, or SFCGAL, to get more computational geometry algorithms (such as the constrained triangulations @mdsumner)
- sf (see #360)
- sf for users who want a GIS experience (whatever that means; see also the spatula 2016 GSoC proposal)

Anyone else?
I think one of the most pressing issues for 'converted' users (from sp to sf - like myself) is raster compatibility (e.g. raster::extract()). Though only partially a concern of sf, I think this would be worth thinking about (though this may already be a plan for stars).
Yes, the plan is that during development, stars will depend on sf, and deal with raster and integration stuff.
Right now, stars obviously also depends on gdal; some way, in the long run, all the gdal-related stuff should go into a single package (meaning, maybe stars and sf need to be merged - when reasonably mature).
@tim-salabim Currently there is extract for sf - and cellnumbers() - in https://github.com/r-gris/tabularaster
It's not exactly finished but happy to help , and you might be interested if the need is pressing enough. :)
(I needed a tidy version of 'extract(x, y, cellnumbers = TRUE)' and a generalization of 'cellFromThing' for a specific project but it's not going to get much attention for a while again.)
@edzer very glad to hear you plan to put gdal in its own package, that will be very valuable also for other projects that aren't simple features based. I am very happy to help on that front - it's probably something that I could actually manage.
On the GSOC front I'm hoping to get to a Radian-based linkage for sf. That might attract interest from some circles. I've done the basics for sf and Manifold 8.0 here, but Radian is immensely more powerful and will be used for version 9.0 in the not too distant future.
@mdsumner just to manage expectations: I'm not planning to factor out pure-R sf things from sf things that need gdal/geos/proj.4, or at least I'd have to hear a very good reason before I start thinking about that. My idea is that in the end there will be one package covering sf and stars (raster, vector, library interfaces).
I don't understand this properly: " I'm not planning to factor out pure-R sf things from sf things that needs gdal/geos/proj.4,". Can you re-express, please? I think a gdal package that both stars and sf could import/link would be perfect, which is what I thought you said above. If it was on your wish-list I think it would be easy to get support to get it done.
Perfect for what exactly? And who would support getting it done?
For projects requiring GDAL that aren't based on simple features, or that don't rely on sf structures. I've spoken with many people who would support it. I haven't pursued it outside of my own tinkerings because I was pretty sure you didn't see the need, but I would rally to support if you were agreeable.
@edzer what's the reason to have it all in the one package? I would have thought that the best approach (in terms of package improvement and maintenance) is to have a few packages, each with a specific goal, such as, let's say, sf, stars, and gdal.
@mdsumner I meant support in terms of who would write that software.
For an alternative approach, you can look at rgdal2, which seems inactive; @thk686 mentioned in August last year he was looking at SWIG bindings to gdal2, but I haven't seen anything that came out of that idea.
sf contains 3400 lines of C++ code that contain just logic to communicate between R and GDAL/GEOS/PROJ. I guess the reason why this is much less than the 7200 lines of C++ ogr code in rgdal and the C code in rgeos is that I chose to use WKB serialization for IO everywhere, and use Rcpp. What I want to say is this: you can interface GDAL, but you need to have an idea what to do with this interface on the R side: R can't do anything with external pointers to C++ structures. If you're impatient for the raster side of GDAL, help get stars going.
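For readers unfamiliar with WKB, here is a minimal sketch of what serializing a single 2D point to well-known binary and back looks like. It is written in Python with only the standard library, purely for illustration; it is not sf's C++ code, and the function names `point_to_wkb` and `wkb_to_point` are hypothetical.

```python
import struct

WKB_POINT = 1  # geometry type code for a 2D Point in WKB


def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB (21 bytes)."""
    # "<" = little-endian without padding; B = byte-order flag (1 means
    # little-endian), I = uint32 geometry type, d d = x and y as doubles.
    return struct.pack("<BIdd", 1, WKB_POINT, x, y)


def wkb_to_point(blob: bytes) -> tuple:
    """Decode a 2D WKB point, honouring the byte-order flag."""
    endian = "<" if blob[0] == 1 else ">"
    geom_type, x, y = struct.unpack(endian + "Idd", blob[1:21])
    if geom_type != WKB_POINT:
        raise ValueError(f"not a 2D point: type code {geom_type}")
    return (x, y)


blob = point_to_wkb(7.0, 52.0)
print(blob.hex())          # the raw bytes that would cross the R/C++ boundary
print(wkb_to_point(blob))  # round trip back to coordinates
```

The point of the sketch is that a geometry becomes a flat byte string that either side can parse without sharing in-memory structures.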
Robert thinks that a major bottleneck of raster is that IO, on the C++ level in raster, has to go through R -> rgdal -> C++ -> R -> raster. The LinkingTo interface of R packages is at the SEXP level; anyone who tasted Rcpp doesn't want to go back down that alley.
@Nowosad again: what would the specific goal of a gdal R package be?
I understand, it's too deep in sf now, I wasn't expecting you to author it. Thanks for the confirmation.
I am inclined to believe that there is no other way to do this that is useful, but would be happy to find out I'm wrong, in particular by a working proof-of-concept.
It's got two sides, one is the hope that sf would have a minimal core, useable by other sources of simple features other than GDAL, a "core-sf" that was minimal and could be used just for that formal contract with the standard. The other is a generic gdal API - from which other sf-analogues could be built. I actually thought rgdal2 was probably enough, but I only figured out the raw lists being returned at around the time sf was born. I can go under the hood in rgdal as well, and get closer to what I want now, but only in some limited forms.
The too-short summary is that sf is not modular, it's an all-in super power in one cohesive but inflexible form. You can't pick it apart and recompose with just the pieces you need into other forms. It also is a lot more than the simple features standard, it's really a vision of how a GIS should and should not behave. That's great, but it's also just one view of how the core pieces can be expressed and put together.
I understand the C++ in sf, and I've studied it enough to see how it could be extracted, but it was a bit too tied to the sf classes already and that was before you went WKB, so I do understand how tight the links are. It's very slow for me to work in C++ and I just get swamped by other concerns, especially when the goal seems futile, only possible by going independent. (But, it's late, and this is a challenging topic, and I haven't been pursuing it lately. I apologize for harping on about this again without having a solid case - there continue to be new sides to this that do change how I think).
Also, sf is wonderful for learning how to compose these smaller pieces - there's no problems there, but I wish, collectively, that we could find ways to have smaller modules.
Among other suggestions by @edzer, maybe:
- sf (GEOS) spatial predicates for large data frames.
- sf (something like SearchTree, which uses a quad-tree to do knn queries)
- sf support for manipulation of large sf objects by reference (i.e. not by duplication)

I also agree with @edzer's idea of a unified package to do the general purpose vector AND raster processing. While the raster package is a powerful package, (I think) the speed of development is slow. As mentioned here, right now, to use raster::extract I have to use as(sf,'Spatial') and then st_as_sf(), which is not very efficient! Asking Robert to add the sf support is one option but it depends how much he is inclined to do so.
@faridcher yes this is also the gist of #76.
Regarding raster, I think that it is a common misconception that the speed of development is an indicator of the quality of the package. (With the exception of sf of course, the quality of which, measured this way, must go down when it's ready.)
sf is mature enough to be supported by raster and other pkgs. Again, raster is a high quality package with few dependencies on other packages, but it should also catch up with recent developments (i.e. sf). For example, I emailed Robert asking why he doesn't migrate the raster source from r-forge to github, and I haven't received an answer yet! This is why I am emphasizing the term 'speed'.
R core also hasn't moved from svn to git.
as a new user of R, which one do you recommend to use? sp or sf? (same story with r-forge vs github and svn vs git).
We're drifting off, let's keep the discussion to GSoC 2018 suggestions here.
great idea, playing sf on postgis or hive just like dplyr and sparklyr. @edzer
FYI, dbplyr already supports this: issue a query to a DB and get a table with geom, then use sf to convert to sfc. Nothing else is needed.
Uh, of course getting the CRS for the geom will be very different depending on the db ... Only Manifold and PostGIS (EWKB) have geoms that include this metadata in the blob, afaik, and it's never stuck as a way to store things. However, it's so easy to work around compared to all the other stuff to worry about. Feel free to get in touch if you want more details with a sqlite/gpkg example.
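To illustrate the EWKB point: PostGIS's extended WKB sets a flag bit (0x20000000) in the geometry type word and stores the SRID as a 4-byte integer right after it, so the CRS identifier travels inside the blob. A minimal sketch in Python (standard library only, little-endian 2D points only; the helper names `ewkb_point` and `ewkb_srid` are hypothetical and not part of sf or PostGIS):

```python
import struct

EWKB_SRID_FLAG = 0x20000000  # set in the type word when an SRID follows


def ewkb_point(x: float, y: float, srid: int) -> bytes:
    """Encode a 2D point as little-endian EWKB carrying an SRID."""
    # Layout: byte-order flag, type word (Point = 1, plus the SRID flag),
    # SRID as uint32, then x and y as doubles.
    return struct.pack("<BIIdd", 1, 1 | EWKB_SRID_FLAG, srid, x, y)


def ewkb_srid(blob: bytes):
    """Return the SRID embedded in an EWKB blob, or None for plain WKB."""
    endian = "<" if blob[0] == 1 else ">"
    (type_word,) = struct.unpack(endian + "I", blob[1:5])
    if not type_word & EWKB_SRID_FLAG:
        return None  # plain WKB: the CRS must come from elsewhere in the db
    (srid,) = struct.unpack(endian + "I", blob[5:9])
    return srid


print(ewkb_srid(ewkb_point(1.0, 2.0, 4326)))
```

For formats that store plain WKB, `ewkb_srid` returns `None`, which corresponds to the case above where the CRS has to be looked up separately in the database.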
I'm very interested in this! I've done some work on the sf interface for databases and it now works pretty well with dbplyr.
However I just started working on spark and hive, but it seems like Magellan's spatial predicates aren't fully supported and not OGC compliant. And then there is this OS framework by ESRI: https://github.com/Esri/spatial-framework-for-hadoop that seems OGC compliant. I'm still new to all this, though.
@mdsumner, any experience with sf, spark and hive?
Let's move this discussion to its own thread.
I feel like sf could be a good candidate for GSoC 2018. @edzer, do you have any experience with GSoC?

@tdhock kindly provided a link to a previous call for spatial tasks (similar to what sf achieves). I believe it didn't attract any candidates as the description is more like a wish list than specific tasks. But now that the core of sf is built and as we want to move away from sp, this could make for a well scoped task to translate functions from sp to sf and eliminate dependencies. Also as Toby mentioned, it is possible to find different levels of student autonomy by designing tests appropriately. What do you think?