tedsteiner / OpenStreetMap.jl

Julia OpenStreetMap Package

Roadmap #62

Open yeesian opened 9 years ago

yeesian commented 9 years ago

I've seen the milestones, and I think it's worth laying out what is left, for present and future contributors, before we release (the ever elusive) 1.0. Feedback welcome!

  1. OSM Elements: I think we can take our cue from the OSM Concepts adopted by imposm-parser
    • [ ] Provide Support for Relations
    • [ ] Query API (for filtering of osm-elements/tags/etc - previous discussion here)
    • [ ] Data Structures -- my preference is towards reasonable and concrete data types, and to add abstractions only when necessary (some discussion here and here)
  2. Interfaces for interoperability with other Packages
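As a rough illustration of what a tag-based query API could look like, here is a minimal sketch. It is written in Python only for illustration (the package itself is Julia), and `Element` and `filter_elements` are hypothetical names, not part of OpenStreetMap.jl or imposm-parser.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical stand-in for an OSM node, way, or relation."""
    id: int
    kind: str  # "node", "way", or "relation"
    tags: dict = field(default_factory=dict)


def filter_elements(elements, kind=None, **required_tags):
    """Return the elements that match the kind and every required tag.

    A required value of None means "the key must be present, any value".
    """
    result = []
    for el in elements:
        if kind is not None and el.kind != kind:
            continue
        if all(
            key in el.tags and (value is None or el.tags[key] == value)
            for key, value in required_tags.items()
        ):
            result.append(el)
    return result


elements = [
    Element(1, "way", {"highway": "residential", "maxspeed": "30"}),
    Element(2, "way", {"building": "yes"}),
    Element(3, "node", {"highway": "traffic_signals"}),
]

# All ways that carry a "highway" tag, whatever its value.
highway_ways = filter_elements(elements, kind="way", highway=None)
# Any element tagged highway=residential.
residential = filter_elements(elements, highway="residential")
```

Keeping the filter a plain function over concrete element types is in line with the preference stated above for concrete data types first, abstractions later.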

Beyond 1.0, since this has always been an experimental kind of repository, I think it's worth having a look at our Python neighbours for inspiration:

I think it makes sense for Buildings/Features/Highways to return DataFrames, with columns corresponding to tags/etc, rather than the existing Dicts that we use.
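To make the Dict-versus-table idea concrete, here is a small sketch of turning per-feature tag dictionaries into a column-oriented table, with one column per tag. It uses plain Python dictionaries rather than any DataFrame library, and `tags_to_table` is a hypothetical helper, not an existing function in the package.

```python
def tags_to_table(features):
    """Turn {feature_id: {tag: value}} into a column-oriented table.

    Every tag key seen on any feature becomes a column; features that
    lack a given tag get None in that column.
    """
    ids = sorted(features)
    keys = sorted({key for tags in features.values() for key in tags})
    table = {"id": ids}
    for key in keys:
        table[key] = [features[i].get(key) for i in ids]
    return table


# Hypothetical input: building id -> tag dictionary.
buildings = {
    101: {"building": "yes", "name": "Library"},
    102: {"building": "residential"},
}

table = tags_to_table(buildings)
```

One consequence this makes visible: tags are sparse, so a tabular layout needs an explicit missing-value marker for every feature that lacks a given tag.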

@kjordahl @jwass @fscottfoti

garborg commented 9 years ago

This looks great.

PBF as another possible format (it would be especially nice right now, with strings being costly, but that should be improved at least in 0.4)?

Speed limits.

tedsteiner commented 9 years ago

I agree these look like good features to add. A couple comments on the features:

In general, I would prefer for OpenStreetMap.jl to be an easy way to parse OSM data and perform basic tasks with it, to serve as a starting point for anyone wanting to work with OSM data and a central hub for OSM-related packages by defining common data representations. Then fancier routing could be performed with an additional package, or detailed map tile rendering with a separate package. But OpenStreetMap.jl will help you get your data loaded, converted into the coordinate system you prefer, cropped to a region and filtered, and combined with other data (e.g., speed limits and elevations). It will also let you perform basic tasks like finding drive-time regions or a driving route, and give you a quick visualization of the results.
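As an example of the kind of basic task described here, the "cropped to a region" step can be sketched as a bounding-box filter over nodes. This is a Python illustration with a hypothetical `crop_to_bounds` function. It only drops nodes outside the box and does not clip ways at the boundary, which a real implementation would also need to handle.

```python
def crop_to_bounds(nodes, min_lat, max_lat, min_lon, max_lon):
    """Keep only the nodes whose (lat, lon) lies inside the bounding box.

    `nodes` maps a node id to a (lat, lon) tuple in degrees.
    Points exactly on the boundary are kept.
    """
    return {
        node_id: (lat, lon)
        for node_id, (lat, lon) in nodes.items()
        if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon
    }


# Hypothetical node coordinates.
nodes = {
    1: (42.36, -71.09),
    2: (42.40, -71.12),
    3: (42.30, -71.00),
}

cropped = crop_to_bounds(nodes, 42.35, 42.45, -71.15, -71.05)
```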

I think I have a slightly different view on versioning than others, but in my view, many of the features you mentioned could be released as versions 1.1, 1.2, etc, leading up to version 2.0. The only thing that I really think is worth waiting for on version 1.0 is making sure we have settled on our data formats and function interfaces for the core tasks. I'm not convinced we're there yet, but we're close. Also, I'm not sure whether it's considered to be in poor form to have version 1.0 when Julia hasn't even hit 1.0 yet. But I think that as soon as we hit version 1.0, we will be making a declaration about stability, and more people will be willing to use the package. Basically, I don't want Version 1.0 to be ever-elusive. I also think that once we get the core functionality, the package can probably be mostly "done," and most additional features will probably fit better in add-on packages.

garborg commented 9 years ago

I have to say, I'm not as convinced we're close to 1.0. I'm not saying the interfaces are bad, or that we have to wait until Julia and all the packages we rely on are 1.0 to go there ourselves. Just that:

My hope would be for the roadmap to be the goal and not the number. I don't think forcing the package to have multiple development branches, with new or split out packages having to keep in sync with the more obscure branches rather than release (and be hard to release themselves as a result), would be a good situation, especially with only a few developers.

The conversation around functionality and scope of packages seems great to me, and I think we should flesh out priorities and timelines, just perhaps without 1.0 presumed to fall so early in it. (I'm also not saying robustness and stability and full-featured-ness and improved APIs should be pushed off.)

yeesian commented 9 years ago

Rather than editing the original comment, I think I'll just reply here, so that your replies still make sense to subsequent viewers --

Yeah, I agree with the both of you: let's frame the discussion as goals for subsequent "minor versions" (increment by 0.1), rather than 1.0, then? I think feature releases (that don't break/deprecate anything) should actually be released as patches (see the example by JuMP), which we haven't been following so far. Our previous milestones should really have been patches instead?


Regarding features:


All the stuff on "interoperability with other Packages" is not a priority (to me) at this point, so they might seem like over-engineering at this point. I'll try to explain the motivation for them, using plotting/visualizations as a running example:

I like the default styles we have, but it's not clear to me how to conditionally plot objects (buildings/highways) in a modular/composable way, apart from

I had thought of re-writing it in Compose/Gadfly (as an exercise), and there are immediate benefits, like the ability to export to a lot more formats:

"PNG, Postscript, PDF, SVG. The SVG backend uses embedded javascript, powered by Snap.svg to add interactivity like panning, zooming, and toggling"

and would be helpful for web mappers. Which made me wonder about maintaining support for Winston, and the possibility of allowing for other plotting engines (if we want to support both Gadfly and Winston, might as well get it right the first time?)

We've gone through the same experience with XML parsing, which is why I think it's worth bringing it up again. The immediate way forward would just be to be a lot more careful about reducing the internal coupling of the functions we write with the libraries that we use (which wasn't the case with XML parsing, and isn't the case with plotting). That way, if we move towards a model of allowing extensions to be built upon OSM, it'll become easier to define abstract interfaces for the various extensions to implement.
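The decoupling described here can be sketched as a small abstract drawing interface that map-level code depends on, with each plotting engine supplying its own implementation. This is a Python illustration only. `PlotBackend`, `RecordingBackend`, and `plot_map` are hypothetical names, and the toy backend records calls rather than wrapping Winston or Gadfly.

```python
from abc import ABC, abstractmethod


class PlotBackend(ABC):
    """Hypothetical minimal interface a plotting engine would implement."""

    @abstractmethod
    def draw_line(self, points, style):
        ...

    @abstractmethod
    def draw_polygon(self, points, style):
        ...


class RecordingBackend(PlotBackend):
    """Toy backend that records draw calls instead of rendering them."""

    def __init__(self):
        self.calls = []

    def draw_line(self, points, style):
        self.calls.append(("line", tuple(points), style))

    def draw_polygon(self, points, style):
        self.calls.append(("polygon", tuple(points), style))


def plot_map(backend, highways, buildings):
    """Map-level logic: knows about highways and buildings,
    but nothing about any particular plotting library."""
    for points in highways:
        backend.draw_line(points, "highway")
    for points in buildings:
        backend.draw_polygon(points, "building")


backend = RecordingBackend()
plot_map(
    backend,
    highways=[[(0, 0), (1, 1)]],
    buildings=[[(0, 0), (0, 1), (1, 1)]],
)
```

With this split, supporting a second engine means writing one more `PlotBackend` subclass, and `plot_map` stays untouched.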

garborg commented 9 years ago

@yeesian I agree with what you say about decoupling, and about what we should think hard about, but I think that rather than making interoperability less of a priority, it makes it a bigger deal.

Being able to factor things out into packages at will gets more important, as does making sure experimental packages can be based off our master branch.

For example (I think we're on the same page about this), the last PR would have had to wait until 2.0 if we were at 1.0 right now, sitting in a branch that creates double work to keep up with bugfixes and features on master. Other functionality that developers wanted to base on the current/future state of the ecosystem, rather than the past, would have to interoperate with an unpublished, untagged branch of our package. It also would have been a pain if we weren't willing to deprecate APIs quickly in the name of coherent building blocks.

More relevant, various functionality for updating, subsetting, analyzing, and representing geospatial data will likely move across package boundaries as we experiment with the roadmap items you and Ted mentioned, so updating the interfaces at function and package boundaries as new use cases come up seems critical.

Anyway, that doesn't counter anything you said -- I just wanted to bring it up in case it spurs any discussion about what belongs in and out of this package, and how to get there with the least friction given we probably don't know yet. Specifically because Ted mentioned not everything belongs inside this package, and you didn't seem to be thinking about complementary packages, though your interfaces approach seems right for it.

Maybe there will be a lot of things too tightly coupled to OSM's specific data format to move them out, and what we'll be moving out are more of the building blocks, like how to draw generic points and ways and features to any backend in a composable way, with an OSM wrapper staying in the package.

P.S. Thanks for linking so heavily in recent issues -- I'm learning about a lot of new projects, and remembering discussions I had forgotten about, thanks to you.

garborg commented 9 years ago

Oh, agreed on releasing more patchlevel versions.

Speed limits exist for a significant minority of roads. I think Ted knows more about elevation, but sounds like it's a very small minority, and people integrate other data sources when they need it?

tedsteiner commented 9 years ago

Versioning

First, I want to say I agree with Sean on versioning. And I'm uncomfortable with going to version 1.0 before Julia does. However, just because we don't have 1000 commits doesn't mean, to me, we are versioning too fast. It just means that the functionality we're providing probably isn't as complex. And I think the interfaces need time to settle.

I think that the versioning so far has been accurate, and I disagree that the releases so far should have been patches. I've been going off the Semantic Versioning Guidelines, which was suggested in my original Metadata pull request. It looks like JuMP is following this, as well, but maybe we should also have a news page at some point. I agree we should frame the milestones in terms of minor versions. I think we should push out patches when necessary, but never as a milestone. I would love to nail down the core API and release version 1.0, but I think that's a way off. But in my mind, the only thing required of version 1.0 is stabilizing the API, not any additional features.
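For reference, the Semantic Versioning rules mentioned here can be sketched as a small version-bump function. This is a Python illustration with a hypothetical `bump` helper. The pre-1.0 branch is an assumption: the specification leaves 0.y.z open ("anything may change"), and bumping the minor version for breaking changes is only a common convention.

```python
def bump(version, change):
    """Return the next version string for a given kind of change.

    `change` is one of "breaking", "feature", or "fix".
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "breaking":
        if major == 0:
            # Assumed convention, not mandated by the spec: before 1.0,
            # breaking changes bump the minor version.
            return f"0.{minor + 1}.0"
        return f"{major + 1}.0.0"
    if change == "feature":
        # Backwards-compatible functionality: minor bump, patch resets.
        return f"{major}.{minor + 1}.0"
    if change == "fix":
        # Backwards-compatible bug fix: patch bump.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")


next_after_feature = bump("0.3.1", "feature")
next_after_fix = bump("0.3.1", "fix")
```

Under this reading, a release that adds functionality is a minor version and a patch is reserved for bug fixes, which is the distinction being argued in this comment.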

@garborg I see you posted as I was writing. But I definitely get what you mean about the multiple branches, and I don't want to have to keep up with that. All I really meant was that if we know for certain we can lock down the API, then I think we're ready for version 1.0. But we're a ways away from knowing that yet. But if Julia, LibExpat, Winston, and Graphs all bumped to version 1.0 tomorrow, I'd suggest we focus more on API stabilization than we are currently and work towards version 1.0.

Additional Features

I like that Julia packages tend to be focused, and the awesome repository system makes it easy to have dependencies. I think we could basically have "modules" that exist as separate packages for specific tasks, and link to them all from the OpenStreetMap.jl main page, or Geodesy.jl. I think that anything requiring additional source dependencies should be modularized, so the average user doesn't have to worry about compilation issues, etc.

Map Plotting

Other

To put my point of view in a little more context, I'm very busy right now trying to hopefully graduate sometime this year, and hopefully sooner rather than later. Right now this package does everything that I personally need it to do for my work, so while I focus on writing a thesis I probably won't be adding any additional "extraneous" features (from the point of view of my own work). I obviously am quite attached to the package and will keep working on it and trying to make it the best that it can be, especially after graduation, but I also think that there will be a limit to how many features need to be added by us to make this package worthwhile to the larger community, and I'm not all that interested in surpassing that limit.

I'm absolutely thrilled that you guys have been continuing to add features, and you shouldn't worry about breaking compatibility for me, etc., since I can always just pin a specific version. I don't say it enough, but thanks for all your help. All your code speed ups and improvements have really sped up my work for me and helped me learn Julia much better than I otherwise would have, and have also turned this package into something that's useful for a much wider audience.

garborg commented 9 years ago

That all sounds good.

Versioning: Pre 1.0, semantic versioning leaves the meaning of minor and patch levels up to the developer. I see some Julia packages tagging patchlevel versions for minor increases in functionality, bug fixes, maybe minor breaks in compatibility, etc., and saving the minor versions for when enough of the major items have been ticked off, but it's ad hoc and certainly not a rule, and we don't necessarily have to tag that often because our user group is up to date with current development on master.

We could probably tag more patches after bugfixes to be a little friendlier to outsiders, or to give us versions to pin that are without previous bugs and without later compatibility breaks? For the latter, manual git checkouts work, too, and there's DeclarativePackages.jl.

You're welcome for the help -- thanks, first for releasing the package and for being so open to contributions! It has been a great intro to geospatial work for me, and contributing has been helping me become a better programmer, too.

tedsteiner commented 9 years ago

@yeesian @garborg

I haven't made any changes to this package in a while, but we have two big changes that still haven't made it into a release: XML streaming and moving the coordinate systems into Geodesy.jl. I've been quiet lately because I've been working on my thesis (and I also just don't have anything else I needed to add), but I think it would be good to get those changes into a release for others to use (if anyone else is using the package).

For my thesis, I'd like to give the release number that I used to generate my results. Does anyone have any objections to me pushing a new release or any changes that are about to be committed? I had wanted to wait until we figured out the Travis testing issues, but those don't seem to have worked themselves out in the last couple of months like I had hoped.

yeesian commented 9 years ago

I'm okay with that! I have friends from my office who might be using this package for their own work as well, so it'll be great to have a release number.

garborg commented 9 years ago

@tedsteiner I am, too. If you're not in a hurry, I can put the package through the motions on 0.4 tonight, and try to clear up any compatibility issues (or give an ETA), but don't let that hold you up if you want to get something out the door.

tedsteiner commented 9 years ago

Nope, I'm not in a big hurry, I'd just like to push out a release sometime in the next week or so. Thanks, guys!

garborg commented 9 years ago

No problem. I ran into an issue using the package on 0.4 (amitmurthy/LibExpat.jl#30), but it just requires a naming decision, so it shouldn't take long to resolve.

yeesian commented 9 years ago

Perhaps we should push for the release soon, in light of #70?

tedsteiner commented 9 years ago

Yes, I definitely agree. I should be able to get to it later this week (my thesis defense is this afternoon, so after today I should have more time).

@garborg Do you happen to know if LibExpat's naming decision has been resolved yet?

garborg commented 9 years ago

Nothing yet, just pinged Amit.