osmcode / libosmium

Fast and flexible C++ library for working with OpenStreetMap data.
https://osmcode.org/libosmium/
Boost Software License 1.0
467 stars 112 forks source link

Buffers: Prevent bug-prone iteration, speed up accidentally-quadratic iteration #369

Closed BenWiederhake closed 11 months ago

BenWiederhake commented 11 months ago

This PR:

joto commented 11 months ago

Where did you encounter those nested buffers? The Reader class should only ever return unnested buffers from read(), so the user should never see them?

BenWiederhake commented 11 months ago

Nested buffers occur internally, so this affects the running time even if the user's code doesn't have direct access to them.

Checking for errors makes even more sense in this case. The user shouldn't be able to trigger this issue even if they wanted to, and the checks make sure that no data is lost internally (e.g. by new code).

I personally encountered these because osmium::io::detail::PBFDataBlobDecoder::operator() returns deeply nested buffers, and that's what the random access code uses / will use.

joto commented 11 months ago

There are two issues here which we should not mix up:

  1. Some functions can now throw which couldn't before. And it is somewhat unexpected that a simple function like begin() can throw. So I am not too happy about that. And because this is internal to libosmium anyway, an assert could be a better solution. But this is something I have to look into.
  2. The performance issue. I can not remember why I implemented it the way I did. Probably because it was the simplest way of doing things. As this code is only run on I/O which does a lot more stuff which will swamp any efficiencies gained here. The 1-2% improvement you measured doesn't seem to me to be a big deal. On the other hand the changs are not huge. I am wondering myself now why I coulnd't implement the linked list the other way around so that the next buffer is always at the beginning of the list and not at the end.

In any case these changes are orthogonal to all the other work you are doing, right?

BenWiederhake commented 11 months ago
  1. I take that as a request to change the exceptions into asserts. Sure, I'll do that.
  2. Yes, these changes are orthogonal. I expected the performance impact to be significantly higher when I started. The TEST_CASE("Can quickly handle deeply nested buffer") in test_buffer_nested.cpp proves that this change does speed things up, so I believe it's worth to improve the code in this way.
BenWiederhake commented 11 months ago

Changes since last push:

BenWiederhake commented 11 months ago

Changes since last push summary:

BenWiederhake commented 11 months ago

CI failure seems to be a flake: It fails long before the code of this PR becomes relevant, specifically during package installation.

BenWiederhake commented 11 months ago

Closing because you don't seem to accept any PRs at this time.