opentripplanner / OpenTripPlanner

An open source multi-modal trip planner
http://www.opentripplanner.org
Other
2.23k stars 1.04k forks source link

Evaluate memory and performance of Java graph search compared to C implementation #2

Closed novalis closed 13 years ago

novalis commented 13 years ago

One more thing to point out about using Java is you can use GeoTools to load data from PostGIS. Loading from PostGIS I believe is one of the requirements for this project. Of course in C you can use OGR just as well. But Java will make it possible to do nice integrations with GeoServer. GeoServer already provides a GUI framework to configure datastores. So with a bit of work you could have people configure their data, with Oracle or ArcSDE or PostGIS or whatever, in GeoServer, and then take just one additional step to designate which is roads, transit, bike routes. And we'd be able to distribute it as an easy plugin to GeoServer. And OpenGeo will be able to lend support to package it up for better GeoServer integration.

novalis commented 13 years ago

--nicholasbs

novalis commented 13 years ago

If you google "c java performance benchmarks" (but please don't), you will find a world of well-intentioned people trying to prove once and for all with a simple benchmark or two that C is faster that Java, or that Java is actually quite fast these days, or to simply find some truth in all the hype. But invariably, their well-intentioned goals quickly break down into a mish-mash of caveats, more questions, and, most commonly, shouting matches with the unwashed masses of the internet claiming their benchmarks are bogus. There certainly isn't much in the way of authoritative "ok, this is really the answer" material out there, but these Wikipedia articles:

http://en.wikipedia.org/wiki/Java_performance

and

http://en.wikipedia.org/wiki/Comparison_of_Java_and_C%2B%2B#Performance

are interesting places to start. Generally, their conclusions are "it depends".

So with all that in mind, let's talk about memory and performance implications of Java versus C for graph search algorithms. Being well-intentioned, I wrote a benchmark that implements Dijkstra's shortest-path algorithm on a simple graph structure and compares the run time for determining the shortest path between a fixed set of origin-destination queries. I attempted to implement the algorithms in C and Java in as close to the same way as possible so that it'd be a direct comparison of the language speed as opposed to a comparison of differences in algorithm performance.

You'll notice that this is not a comparison of the performance of the three trip planners from GraphServer, FivePoints, and OneBusAway. The algorithms used in each system are different enough that I believe it'd be hard to say whether and difference in performance is due to choice of language or choice of algorithm. Thus, my decision to implement the exact same algorithm in both languages.

You can check out the attached source code if you want to see the full details, but the general idea is that I generated a 1000x100 grid of nodes and randomly connected them to make a simulated street network. I then randomly picked 1000 pairs of nodes to serve as trip planner origin-destination queries. With this test data, I compared how long it took the algorithm in both Java and C, with various compiler optimizations applied, to compute all the routes between the full set of trip planner queries. Note that execution time does not include initial setup, such as time spent reading the graph from disk into memory.

The results are as follows. All times are in seconds. I did five runs of each benchmark to show the average execution time on my 2.4 GHz Intel Core 2 Duo !MacBook Pro with Sun Java 1.5.0_19 and gcc 4.0.1:

{{{ Java: No optimization: 40.892, 41.195, 40.886, 41.260, 41.291 JVM -server optimizations: 25.687, 27.876, 26.181, 26.188, 26.123

C: No optimizations: 33.857, 34.194, 34.101, 34.017, 33.933 Compiler -O3 optimizations: 26.365, 26.410, 26.189, 26.387, 26.292 }}} The result is that un-optimized Java is slower than un-optimized C, that each language is significantly faster with optimizations applied, and, most importantly, Java and C have very similar performance characteristics with optimizations applied.

Except ignore all that. Because when I ran the same benchmarks on one of my school workstation running two 3.00GHz Intel Pentium 4 with Fedora Core 7, gcc 4.3.0 and Sun Java 1.6.0_14, the optimized native C code ran 25% faster. Yet on another workstation on another workstation with four 3.40GHz Intel Xeons, the same OS, compiler, and JVM versions, Java came out ahead by 16%. I'm quite sure that if you ran the benchmarks yourself, you'd get different numbers too. Also consider that the implementations for hash tables and linked lists are different for the C benchmark and the Java benchmark. Could that be the difference? Are the two algorithms really exactly the same?

And now my good intentions have lead me to where all those other benchmarks invariably lead: an answer of "it depends".

My conclusion is that, at this point, both modern C compilers and Java JVMs are mature enough that both languages are quite fast and any performance penalties are likely due to factors beyond the language's control. That is to say, most of the major performance bottle-necks are at the file or network level, like waiting for a database query or a HTTP request to return, or just because the coder wrote an inefficient algorithm implementation. By comparison, the differences in performance between C and Java tend to get lost in the noise.

This tends to match one of the conversations I had with Brandon and David at the workshop (Brandon or David: feel free to correct me if this is not an accurate summary). The basic idea was that all three of our routing algorithms seemed plenty fast. The slow part seemed to be pulling data from the database to turn the machine-coded trip result into a narrative appropriate for display to the user. This is not an area where C vs Java is really going to matter.

And yet, with all that said, I would be a strong advocate of implementing the routing engine in Java. While the performance of C vs Java might be "it depends", there are some distinct advantages in choosing Java. Primarily, it's an argument about portability. As discussed at the workshop, a Java implementation is going to be a lot more portable and easier to run out of the box on a variety of different platforms than a native C solution. Adding new functionality (especially dynamic plugin functionality) is less of a pain in Java. In addition, I hope it's not too controversial to argue that it's just easier to program in Java. When compared with Java or Python or Ruby or C#, it's not often that projects stick with C or C++ because it's a joy to program with. They stick with it for performance reasons. I just don't think the performance advantage is strong enough here to make the case for C.

Now the big argument against. GraphServer is a great routing engine that already exists and it's written in C. I've tossed in more than my two cents on this issue, so I will leave it to the rest of the group to decide where to go next.

--bdferris

novalis commented 13 years ago

You'll notice I forgot completely to say anything about memory consumptions. I think the [http://en.wikipedia.org/wiki/Java_performance Wikipedia entry on Java performance] sums it up well:

Java memory usage is heavier than for C or C++, because:

The real kicker is the extra 12 bytes per object instance in Java. For a typical GTFS from a major metro transit system, there might O(1 million) stop time instances, which might mean an extra 12 MB of memory consumed for the trip planner graph if each stop time is modeled as an object. While there are ways of reducing the memory foot print, this is one area where C beats Java.

That said, memory is cheap and 12 MB out of the gigabytes typically installed on a machine these days is not much. Again, a point of discussion for the group at large.

--bdferris

novalis commented 13 years ago

Replying to [comment:2 bdferris]:

Whoa. Thanks for doing such a thorough and thoughtful set of experiments, Brian!

[snip]

My conclusion is that, at this point, both modern C compilers and Java JVMs are mature enough that both languages are quite fast and any performance penalties are likely due to factors beyond the language's control. That is to say, most of the major performance bottle-necks are at the file or network level, like waiting for a database query or a HTTP request to return, or just because the coder wrote an inefficient algorithm implementation. By comparison, the differences in performance between C and Java tend to get lost in the noise.

This tends to match one of the conversations I had with Brandon and David at the workshop (Brandon or David: feel free to correct me if this is not an accurate summary). The basic idea was that all three of our routing algorithms seemed plenty fast. The slow part seemed to be pulling data from the database to turn the machine-coded trip result into a narrative appropriate for display to the user. This is not an area where C vs Java is really going to matter.

This all sounds reasonable and right to me, and seems the most logical conclusion to draw from your tests.

And yet, with all that said, I would be a strong advocate of implementing the routing engine in Java. While the performance of C vs Java might be "it depends", there are some distinct advantages in choosing Java. Primarily, it's an argument about portability. As discussed at the workshop, a Java implementation is going to be a lot more portable and easier to run out of the box on a variety of different platforms than a native C solution. Adding new functionality (especially dynamic plugin functionality) is less of a pain in Java. In addition, I hope it's not too controversial to argue that it's just easier to program in Java. When compared with Java or Python or Ruby or C#, it's not often that projects stick with C or C++ because it's a joy to program with. They stick with it for performance reasons. I just don't think the performance advantage is strong enough here to make the case for C.

The greater portability (and with that, greater ease of deployment) is the strongest point in Java's favor, IMHO. (I also don't think it's particularly controversial to say that Java is easier to program in.)

One other consideration is the number of languages used throughout the project; all other things being equal (and despite my personal affinity for C), I think fewer languages is more desirable.

Now the big argument against. GraphServer is a great routing engine that already exists and it's written in C. I've tossed in more than my two cents on this issue, so I will leave it to the rest of the group to decide where to go next.

This is a big one. Graphserver not only exists and works quite well, it's got a budding community.

--nicholasbs

novalis commented 13 years ago

Replying to [comment:3 bdferris]:

You'll notice I forgot completely to say anything about memory consumptions. I think the [http://en.wikipedia.org/wiki/Java_performance Wikipedia entry on Java performance] sums it up well:

Java memory usage is heavier than for C or C++, because:

  • there is a 12-byte overhead for each object in Java. This means an object containing a single 4-byte integer requires 16 bytes. However, C++ also allocates a 4-byte pointer for every object that declares virtual functions [6].
  • parts of the Java Library must be loaded prior to the program execution (at least the classes that are used "under the hood" by the program) [7]
  • both the Java binary and native recompilations will typically be in memory at once, and
  • the virtual machine itself consumes memory.
  • in Java, a composite object (class A which uses instances of B and C) is created using references to allocated instances of B and C. In C++ the cost of the references can be avoided.

The real kicker is the extra 12 bytes per object instance in Java. For a typical GTFS from a major metro transit system, there might O(1 million) stop time instances, which might mean an extra 12 MB of memory consumed for the trip planner graph if each stop time is modeled as an object. While there are ways of reducing the memory foot print, this is one area where C beats Java.

That said, memory is cheap and 12 MB out of the gigabytes typically installed on a machine these days is not much. Again, a point of discussion for the group at large.

Assuming the graph is shared, this is of course nothing; if it's 12MB/request, then this is a real consideration.

The greater portability/deployment is still a strong selling point for me. Hypothetically, if we went the Java route, do you think it'd make more sense to effectively port Graphserver (while using it as an opportunity to clean up some of sore spots Brandon mentioned), factor out the routing engine from FP or OBA, or do something else?

I'd like to hear Brandon, David and anyone else's thoughts re: C vs. Java.

Also, thanks again Brian for doing so much legwork on this and kicking off the discussion (please excuse the multiple leg-related metaphors :D).

--nicholasbs

novalis commented 13 years ago

Maybe one of us should send a message to the mailing list, since I think only the two of us are currently getting notifications on this ticket? (Correct me if I'm wrong).

--bdferris

novalis commented 13 years ago

You're quite right. I'll send a message pointing folks towards this ticket in a minute.

--nicholasbs

novalis commented 13 years ago

Oops, forgot to add that I looked over the code earlier and agree that it's pretty close to apple-to-apples. I also ran the tests on my machine (2.4GHz Core 2 Duo, 4GB RAM, OS X v. 10.5) and got about 26s for both versions.

--nicholasbs

novalis commented 13 years ago

Good work, Brian. Interesting to see how close C and Java are speed wise in your test. My view is that platform portability and ease of deployment is highly important, if we're to expect transit agencies to consider using the OpenTripPlanner.

--fpurcell

novalis commented 13 years ago

I ported a large portion of the Graphserver core over to Java, and I'm beginning to run some benchmarks on a real-world street graph of the Seattle area.

Test 1 (85th street to beacon hill) Java: 646 ms C: 23 ms (28x faster)

Test 2 (ballard ave to lake city way) Java: 277 ms C: 9 ms (30x faster)

I'll get these benchmarks polished off and send out for peer-review soon, but it's not looking too good for Java. That's too bad, because abstract classes are exactly what Graphserver is all about. In the case of Graphserver, which aims to be a calculating engine for a mission-critical piece of infrastructure, I think speed is nearly the most important consideration. It may be less portable, but it also requires one thirtieth the computational resources. For a piece of software I fully expect to peg servers and require horizontal scaling, that literally means you'd need thirty times as many machines to keep up with the same load. I think that'd cost more than an extra half a day figuring out how to install it.

--bmander

novalis commented 13 years ago

I look forward to seeing the source, because something definitely seems fishy. Say what you will about C vs Java, but 30x speed difference doesn't seem right.

I just ran a quick benchmark on my own walk planner implementation (Java) and managed 44ms (85th street to beacon hill). Kind of an apples-to-oranges comparison, since it's different machines and likely different algorithms, but shows you can compute walk plans reasonably quickly in Java.

--bdferris

novalis commented 13 years ago

You're right; it seems fishy to me too. My fibheap implementation involves a lot of removing objects from Vectors which, on second thought, is probably wasteful.

--bmander

novalis commented 13 years ago

This is your java FibHeap implementation? You might want to check out java.util.PriorityQueue, a similar data structure that's been part of the standard Java library since Java 1.5. I think you'd be hard-pressed to beat its speed, plus it's one less piece of code you have maintain.

--bdferris

novalis commented 13 years ago

A FibHeap has amortized O(1) for insert and decrease_key, whereas a Heap, which I understand java.util.PriorityQueue is based on, has O(log(n)) for the same.

All pedantry aside, it probably doesn't matter - A FibHeap has basically O(log(n)) for those two operations for small average graph degrees (eg: road graphs, average a little under 4), and java.util.PriorityQueue is probably way more reliable.

--bmander

novalis commented 13 years ago

Truly you are right. Queue operations account for about 20% of my execution time at the moment. Maybe I can shave off a few milliseconds with a faster heap implementation.

--bdferris

novalis commented 13 years ago

A 30x speed difference is quite large and I agree with both of you that it seems a little odd. Have you done any profiling to see what code is taking up the brunt of the time? I'd definitely like to take a look at the code.

Thanks for working on this, Brandon.

--nicholasbs

novalis commented 13 years ago

The benchmark code and README is attached. It runs 100 pre-determined (but random) route queries both in a Java port of Graphserver and in Graphserver itself.

On Ubuntu Hardy, on a Centrino Duo; compiled in GCC with no optimizations, Java version java version "1.5.0"; gij (GNU libgcj) version 4.2.4

Java GS: 97611 ms C GS, through Python bindings: 3331 ms

--bmander

novalis commented 13 years ago

{{{ (graphserver)madison:graph_benchmark_bmander nicholasbs$ python main.py 1 53209539 60412683 44ms 2 53197499 53202704 3ms ... 99 53080844 53230475 26ms 100 53197520 53101531 61ms total: 3121

(graphserver)madison:graph_benchmark_bmander nicholasbs$ java Main 1 53209539 60412683 181ms 2 53197499 53202704 4ms ... 99 53080844 53230475 43ms 100 53197520 53101531 113ms total: 7200

(graphserver)madison:graph_benchmark_bmander nicholasbs$ java -version java version "1.5.0_19" Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_19-b02-304) Java HotSpot(TM) Client VM (build 1.5.0_19-137, mixed mode, sharing) }}}

This is on my 2.4GHz Core 2 Duo with 4GB of RAM running OS X 10.5. I changed nothing in the code and simply followed Brandon's README.

The Java version is again slower than the C version, but for me the difference is about 2x rather than 30x. Anyone know what gives?

I'm going to keep poking around and see if I can figure out what's up.

--nicholasbs

novalis commented 13 years ago

I just ran a couple more tests on a Gentoo VM (Xeon 2.66GHz) and got 1945ms and 2645ms for the C and Java versions, respectively.

C code compiled with gcc 4.1.2. Java version is 1.6.

--nicholasbs

novalis commented 13 years ago

My initial numbers match Nick's:

% python main.py total: 3054 % java Main: total: 6579 % java -server total: 5310

As a point of comparison, I went back recompiled graphserver with -O3, but didn't see any significant change in runtime (it's compiled with -O2 by default).

Brandon, maybe the speed difference is due to your use of GNU's gij vs Nick and I using Sun's implementation. I don't have a reference on this, but my gut feeling is that Sun's VM is way more optimized than the GNU implementation.

Bra

--bdferris

novalis commented 13 years ago

--bdferris

novalis commented 13 years ago

Also for what it's worth, I can get the Java code down to 2087 (faster than C) by switching from FibHeap to the built-in java.util.PriorityQueue.

--bdferris

novalis commented 13 years ago

I've done some experiments with different priority queues and they haven't shown the fibheap priority queue to be especially beneficial for graphs with small average degrees (where the amount of reshuffling of the nodes in the fringe queue is small, resulting in a relatively low number of decreasekey operations, which is where the fibheap excels). If I reimplemented the queue in C-graphserver as a more straightforward priority queue I suspect it should speed up some too.

In any case, it's pretty interesting that in some cases a Java implementaton of graphserver beats the C implementation.

--bmander

novalis commented 13 years ago

I think the trade-off in queue implementations lends support to the idea that, at this point, making the right choice of algorithm has a much bigger impact than the choice of C vs Java.

But just so my motives are clear, I'm 100% rooting for Java in this argument for a number of selfish reasons:

1) OneBusAway is written in Java, so it'd be easier for me to integrate. 2) I'd much rather be programming in Java (or Python or any modern language) than C.

Definitely selfish reasons, because (a) I could still integrate Graphserver into OBA if I put in the time and (b) I can still code C if I need too. So take all my arguments with that grain of salt.

However, the fact remains that I would be much more excited and willing to spend my free time hacking on Graphserver and the OTP routing core if it went the Java route.

Said another way, Brandon I will commit crazy hours to help the Java port of Graphserver if you decide to go that route.

My unselfish reasons still boil down to the portability of Java, the ease of extension, and basically all the bells and whistles that come with a modern, garbage-collected language.

--bdferris

novalis commented 13 years ago

Ah jeez more developers is a pretty good reason to port it over. Could we keep the Python wrappers? I really like the Python wrappers.

Oh! Also!

java Main: 5.5% system python main.py: 1.5% system

What are other people seeing?

--bmander

novalis commented 13 years ago

Ha, Brian, thanks for being upfront :D

To do you all the same favor: As far as my selfish concerns, as I've said I'm personally more of a fan of C and Python than Java, and my C (and definitely my Python) chops are better than my Java right now, having not done a lot of Java dev as recently (though I don't really consider this much of a factor as I imagine it'd be immaterial after a week or so).

I like the idea of keeping the Python wrappers, both selfishly (Python is my favorite language) and unselfishly (it's portable, easy to deploy, and a lot of developers prefer working on it, meaning more people would likely build off of the core if Python bindings were available).

Also, as I recall when I polled everyone at the workshop re: their experience with C, Java, and Python, the number of hands was (I think) Java, Python, and C, in that order (please correct me if I'm misremembering).

--nicholasbs

novalis commented 13 years ago

I'm all for Python bindings, but I admittedly don't know how nicely Java + Python can play well together. Recall that I'm Java > C > Python in terms of ability (nothing against Python... I just don't know it as well), so I'm not sure how a core written in Java would be wrapped with Python. I know I've read about Jython, but that seems to be not exactly what we are looking for.

Do we want to be able simply have results from the routing engine returned as Python objects? Do we want to be able to extend the capabilities of the routing engine with Python? Something in between or something else entirely?

--bdferris

novalis commented 13 years ago

I'd say minimally for Python bindings to be useful we'd want to be able to get results from the routing engine and pass objects back and forth.

In addition to Jython, there's also [http://jpype.sourceforge.net/ JPype]. I don't know a lot about either of these, so I'm going to look into both and speak to people who know more about them to find out what they offer.

--nicholasbs

novalis commented 13 years ago

Jython's a complete re-implementation of Python (i.e., CPython) in Java. The current version implements Python 2.5.

I tested it out and it works surprisingly well. I was able to import Java Graphserver and run the benchmarks Brandon wrote. It was pretty straightforward (the only hiccup was access restrictions; I threw Brandon's code in a com.graphserver package but I couldn't access anything initially since it was all package-private by default. I just had to make the main class public to make it accessible in the module). I got 7670 ms for the benchmarks on my laptop. This is slightly slower than the 7200 ms I get when I run the Java benchmarks on their own.

I also played around with constructing and traversing graphs via the Jython interpreter. Everything pretty much worked as I expected, e.g.

{{{

g = Graph() g.addVertex("NYC") NYC g.addVertex("Boston") Boston g.addEdge("NYC", "Boston", State(4.5)) }}}

Of course, as Brian pointed out, this isn't exactly what we want because it requires people to be running Jython. I don't think this is a deal breaker, but it's far less than ideal.

JPype also allows accessing Java libraries in Python but does so using the JNI. You do this from within Python by calling a method to start the JVM (jpype.startJVM()), making your library calls (e.g., java.lang.System.out.println()) and then shutting down the JVM. It's also less than ideal since it's not quite as easy as being able simply import a first-class Python module.

--nicholasbs

novalis commented 13 years ago

I'd say minimally for Python bindings to be useful we'd want to be able to get results from the routing engine and pass objects back and forth.

If that's a minimum goal, we might consider focusing less on Python (or any specific language) and define a good API + data exchange format (xml vs json vs protocol buffers vs thrift vs whatever) that make it easy to write a wrapper library in the language of choice (Java, Python, Ruby, whatever). This approach kind of bleeds over into ticket:4, but just thought I'd mention it.

Now, when it comes to directly extending the routing engine, that bring us back to the language discussion. Can we imagine a situation where we'd be extending the routing engine in a meaningful way in Python?

--bdferris

novalis commented 13 years ago

We can imagine a situation, yes. Nino Walker committed a feature which allows for edgetype functions written in Python to be called and used in the core routefinder. I understand he uses it all the time. It is, of course, an order of magnitude slower.

One of the big advantages of Java that I can see is that plugin edgetypes will not be an order of magnitude slower.

--bmander

novalis commented 13 years ago

I think using Java and concentrating on defining a solid API and data exchange format is the right way to go. This would allow for real Python wrappers, which from a dev usability standpoint seem much better to me (if I were writing a Python app that wanted to use the routing engine, I'd much prefer to just import a module than worry about installing Jython or using JPype). It wouldn't allow for extending the core in Python, but I think that's a reasonable tradeoff, given that extensions would be in Java (and thus hopefully easier to write than in C but faster than in Python) and going this route would also simplify the project some (fewer dependencies, etc).

Thoughts?

--nicholasbs

novalis commented 13 years ago

--cholmes

novalis commented 13 years ago

Good points, Chris.

Do we have consensus on this decision or do you guys think there's more to work out?

--nicholasbs

novalis commented 13 years ago

I'm not convinced. My reasons:

(1) Java is about half as fast. If the value of a trip planner is directly proportional to its speed, then a trip planner written in Java will always be half as useful as the same thing implemented in C.

(2) Java is confusing. There are JVMs and JREs and SDKs and JDKs. There is a sprawling multi-dimensional versioning scheme including Java SE and J2ME. Somehow Java 1.6 is more recent than Java 5. I have no idea how those Java's fit in with Java 2. Is there a Java 1? There is Sun's java and there are third party Java VMs. Not everyone's system hits the bullseye in this canvas of versions and technologies that your Java program requires, and when you say, "oh well it's in Java just get Java" this is the maze they have to contend with. They say Java eases deployment, but I've never had an easy time with it. On the contrary: I find it slightly more frustrating than setting up native applications.

(3) Java is not open source. There exist open source implementations, but they are woefully inadequate, as our performance testing has revealed. It would be frustrating to write a large open source suite that only runs properly on a proprietary platform.

(4) Java is not "respectable". The canonical example of any heavy-lifting application is typically written in C (or C++). Linux and MySQL PostgreSQL and the Apache Webserver; Python and Ruby and Java are all implemented in C. langpop.com lists C the #1 programming language by a wide variety of metrics, despite being developed in 1972. C is the bridge over the narrowest gap between the realm of transistors and electrons and the realm of human-level conceptual abstraction. If you want people to respect your software, you write it in C.

--==--

I'm tempted by Java. I like classes, abstract classes, class inheritance, interfaces, exceptions. I like the idea that I could plug a visualization engine like Processing directly into the backend with minimal effort. I like the possibility of getting some more developer attention. And I like the idea of portability, even though it's a little more rocky than we sometimes hope. I think I can be convinced, but I'm not yet convinced.

-B

--bmander

novalis commented 13 years ago

Into the breach!

(1) The value of a trip planner is not directly proportional to its speed. At some point, the difference between a 50 ms graph traversal and a 25 ms graph traversal is completely lost on the user. What's worse, CPUs keep getting faster and we humans are slow as ever. Yes, a trip planner that takes 5 seconds to return a result would suck, but we are definitely in the realm of sub-100 ms in our argument of Java vs C. What's more, we've not even addressed the things that I feel will dominate actual execution time: fleshing out the raw trip-planner result with human labels, geometry info, fare information, and all the other actual bits that make the result useful. If there is even one database hit in there, it will likely dominate the execution time by comparison. Most of the tools that will make these high level tasks tractable (caching, database support, etc) are a lot easier to use in high-level languages like Java and Python. It's easy to envision a scenario where a stand-alone box is just doing nothing but running graph traversals as fast as the CPU can churn them out, but it's even easier to image a server that's memory bound, database bound, network bound, etc from all the other crap that goes into serving a trip planning request.

(2) Can you tell me the difference between ANSI C (C89), C99, and work ongoing for C1X? How about dealing with cross-platform issues when supporting Linux, BSD derivatives, and maybe even Windows? These are equally complex, confusing issues, but we deal with them, because that's what we do when developing open source software. Your confusion about Java versioning names is noted, but I think this is more of a problem of letting the marketing department get too involved (see Windows 95,98,2k,ME,XP,Vista,7). Just like people deal with the differences between Python 2.5.x, 2.6.x, and 3.x, so too do developers deal with Java 1.4, 1.5, and 1.6 (and 1.7 is coming!). There are multiple VM implementations. This is not just an issue with Java. I know many people who think gcc is a joke in the performance department and you should really be Intel's Compiler, but for most of us gcc is good enough. For the same reason, most of us use Sun's VM, because it's good enough (in fact, it's great). I understand the openness of SUN's VM has been an issue preventing it from standard inclusion with most Linux distros, but:

(3) Sun is in the process of making their entire VM open source. I think there are a few sub-modules still encumbered by IP and patent issues that the community is working to replace, towards the goal of OpenJDK7 being completely open source. More details: http://www.sun.com/software/opensource/java/faq.jsp

(4) And there's Java in the number 2 spot, forever lost in C's shadow. I would argue C is #1 not despite having been around since 1972, but maybe because it's been around since 1972! What's more, any number of languages (PHP, Javascript, C#) come out ahead of Python at langpop.com, but I think we'd all agree that replacing the Python portions of Graphserver with one of these languages seems like 10 kind of bad ideas.

This is a religious debate, like all good debates about technology choice on the internet. I don't think there is a right answer here. But it's fun to argue.

--bdferris

novalis commented 13 years ago

(1) Whether or not a trip planner's value is directly proportional to its speed (and if so, whether or not a 2x difference in the routing engine will be meaningful given the time taken by other parts of the system), I think we can all agree this decision is ultimately about tradeoffs. It's conceivable -- though admittedly less and less likely as compilers have gotten so good at doing automatic optimizations -- that we could write an ''even faster'' trip planner if we only used assembly. But obviously the tradeoffs in doing so (having to write assembly instead of C, for instance) aren't worth it to us. Similarly, we're not going to write the entire thing in Python even though (IMHO) it'd likely be the most pleasant programming experience because the loss of speed would be unacceptable to us. I'm currently of the mind that the minor performance hit of using Java instead of C is more than offset by the benefits of using Java over C. Faster only = better if all other things are equal; otherwise, it's a subjective judgement about what we think is more important.

(2) This is a fair point. I was just talking with a friend about this, actually. I think Sun's done a pretty bad job with the marketing/branding of the various versions of Java. No matter what path we take we're going to need to put time into documentation, testing, and making the install process as straightforward as possible. I think Java still comes out ahead here, but you're right that the "write-once, run anywhere" mantra is a bit of a myth.

(3) Another good point. As Brian pointed out, this situation has improved quite a lot over the past couple years and is getting better every day. If I thought things weren't going to keep improving, I'd count this as a major point again Java; however, since there's an active community of free software developers (as well as Sun, which has made it pretty clear they're committed to making this happen) working on this, I'm confident that while it's a nuisance now, it will ultimately be a non-issue.

(4) No doubt a lot of great projects use C, but I don't think this means Java's not respectable. Regardless, let's stick to discussing C and Java's merits as they apply to our needs for this project.

Two more points: First, multiple agencies at the Portland workshop mentioned that they'd prefer Java (because they think it'd be easier to deploy and thus convince people to try, because they have in-house Java expertise, etc), and second, a greater number of existing projects and developers are already using Java in this area. Given that we're ultimately aiming to make transportation extremely easy (#1) by building a thriving OSS community and helping to get free software into markets it hasn't so far penetrated, I think these points are pretty compelling.

With all that said, we should really be using Lisp. We'd probably already be finished, and the entire codebase would be under 1,000 lines, most of which would be comprised solely of parentheses :)

--nicholasbs

novalis commented 13 years ago

Hrm? You don't think my hotheaded religious reasoning is persuasive?

Ultimately I think building a trip planner in Java would be a fun and educational experience - I'll add a directory to the Graphserver source tree where work can proceed.

-B

--bmander

novalis commented 13 years ago

Hrm? You don't think my hotheaded religious reasoning is persuasive?

It's certainly just as persuasive as my own hotheaded religious reasoning ;)

As promised, I'm happy to put in cycles on this. Feel free to let me know if there is anything I can do.

--bdferris

novalis commented 13 years ago

I think it's fair to say we've all got a little religion in our reasoning. It keeps things interesting and shows we all care deeply about this stuff (plus it's fun :D).

Now, what editor should we use to write this thing, emacs or vim?

:::ducks:::

--nicholasbs

novalis commented 13 years ago

Ultimately I think building a trip planner in Java would be a fun and educational experience - I'll add a directory to the Graphserver source tree where work can proceed.

To make code sharing easier, should I add the [http://trac-hacks.org/wiki/GitPlugin GitPlugin] to the trac instance here so we use git for OTP?

(Also, one of you two want to do the honors of finally closing this ticket? :D)

--nicholasbs

novalis commented 13 years ago

Yeah, go ahead and add the GitPlugin on.

I just checked in the Java port of Graphserver to the Graphserver project. it's here:

http://github.com/bmander/graphserver/tree/84dbd92a8af4c45eb5da514129880019e34edcc5/jags

If I may be so presumptuous, I'll close the ticket now.

-B

--bmander

novalis commented 13 years ago

--nicholasbs

novalis commented 13 years ago

{{{

!html

}}} This ticket has been referenced in ticket #141:

''...each transit leg. For example, a transit leg from the 14th St to 42nd st on the #2 subway currently looks like this:

{{{ <leg agencyId="MTA NYCT" headsign="To...''

--nicholasbs