openstreetmap / operations

OSMF Operations Working Group issue tracking
https://operations.osmfoundation.org/
98 stars 13 forks source link

CGImap testing on dev instance #236

Closed mmd-osm closed 5 years ago

mmd-osm commented 6 years ago

This is a bit of a follow up for https://github.com/zerebubuth/openstreetmap-cgimap/pull/155, and maybe later https://github.com/zerebubuth/openstreetmap-cgimap/pull/152 as well.

It would be very helpful to provide an option to deploy CGImap testing versions on the dev instance without risking to break production. This would also help to factor in early feedback from testers before reaching production.

tomhughes commented 6 years ago

The problem is that there's not easy way to do this with packaged cgimap - it would have been much easier when we were building from source.

The problem is that you can only have one version installed at a time, but dev apis are individually configurable so if we're going to use cgimap for them we should be able to configure which version of cgimap to use just as we can configure which version of rails to use.

mmd-osm commented 6 years ago

From https://github.com/zerebubuth/openstreetmap-cgimap/issues/140#issuecomment-394180952, I understood that cgimap would be enabled for all dev apis, but I wasn't aware of how multiple versions fit into this scheme. The question is, if we really want to run multiple cgimap versions on the dev system at all?

Basically we could deploy the test PPA, and in addition build some other cgimap version from source. Special care is needed to build those binaries without shared libary and including static support, otherwise we might be overwriting some of cgimap's shared libraries, and introduce a bit of a mess.

tomhughes commented 6 years ago

Well the whole point of the dev api system is to allow us to setup test instances which should really include cgimap I agree but equally the whole point is that each instance is configurable as to what versions it runs so that we can have multiple tests in progress.

tomhughes commented 6 years ago

That was really the point of my previous comment by the way, that I can easily enable it for the APIs, but that I don't think that is really "correct" but thinking about it more later made me realise that it was even harder than I thought to do it "properly".

Does dpkg support relocation at all? Either by default or optionally when building?

mmd-osm commented 6 years ago

I think the easiest way to support this scenario would be to disregard any PPA, and only use statically linked binaries. Manual step to create such a binary would be along those lines. Unfortunately, I'm not at all familiar with Chef and don't know how this would translate into something Chef understands.

-> The last step will install a statically linked cgimap binary as /srv/cgimap/test1/bin/openstreetmap-cgimap

Another cgimap version could then be compiled in a similar way, by changing the checkout command, as well as choosing another --prefix= installation directory.

Once the openstreetmap-cgimap binaries are available, they would have to be managed by some systemd script, similar to production, where each binary gets a different FastCGI port.

Then it's up to the Apache redirect config to decide which API gets directed to which FastCGI port.

tomhughes commented 6 years ago

There's no need for it to be statically linked - just building from source is good enough.

That's how we used to deploy cgimap in fact.

I'm just reluctant to deploy it in a completely different way on dev, as it means I can't share the deployment logic :-(

mmd-osm commented 6 years ago

There's no need for it to be statically linked - just building from source is good enough.

Right, a quick check also confirmed that the prefix'ed binary refers to their local shared libs:

libcgimap_fcgi.so.0 => /srv/cgimap/test1/lib/libcgimap_fcgi.so.0 (0x00007f248d0dd000)
libcgimap_apidb.so.0 => /srv/cgimap/test1/lib/libcgimap_apidb.so.0 (0x00007f248ce70000)
libcgimap_pgsnapshot.so.0 => /srv/cgimap/test1/lib/libcgimap_pgsnapshot.so.0 (0x00007f248cc52000)
libcgimap_staticxml.so.0 => /srv/cgimap/test1/lib/libcgimap_staticxml.so.0 (0x00007f248ca25000)
libcgimap_core.so.0 => /srv/cgimap/test1/lib/libcgimap_core.so.0 (0x00007f248c70f000)

I could imagine that we use different ways depending on the maturity of the software:

I think this might be a reasonable compromise, as building new PPAs is really a pain, and this should only be done, if there's chances the code is ready for production (there's some way to create some dev snapshot PPAs as well, but I haven't really figured out how they work).

I don't see any way to run multiple PPA versions in parallel, except for some Docker or maybe snap based approaches. They might be a bit of an overkill here.

mmd-osm commented 6 years ago

Current candidates for compile from source are:

Topic Link to Branch PRs / Issues Comments
JSON output JSON Branch 153, 155 ./configure --enable-yajl to enable JSON libs - Released as 0.6.2
Perf. Changeset Upload Bulk upload branch 152, 140 Apache rewrite rules need to include POST request for "^/api/0\.6/changeset/[[:digit:]]+/upload.*$"

For the changeset upload, you can specify a dedicated database(host), username, password, similar to OAuth. The prefix in this case is update-, e.g.

       "CGIMAP_UPDATE_USERNAME" => "api",
       "CGIMAP_UPDATE_PASSWORD" => "password"

For security reasons, I'd recommend to create a dedicated api write user with very limited authorizations:

GRANT UPDATE ON SEQUENCE public.current_nodes_id_seq TO demo;
GRANT UPDATE ON SEQUENCE public.current_ways_id_seq TO demo;
GRANT UPDATE ON SEQUENCE public.current_relations_id_seq TO demo;

GRANT SELECT,UPDATE ON TABLE public.changesets TO demo;

GRANT SELECT,INSERT,UPDATE ON TABLE public.current_nodes TO demo;
GRANT SELECT,INSERT,DELETE ON TABLE public.current_node_tags TO demo;

GRANT SELECT,INSERT,UPDATE ON TABLE public.current_ways TO demo;
GRANT SELECT,INSERT,DELETE ON TABLE public.current_way_nodes TO demo;
GRANT SELECT,INSERT,DELETE ON TABLE public.current_way_tags TO demo;

GRANT SELECT,INSERT,UPDATE ON TABLE public.current_relations TO demo;
GRANT SELECT,INSERT,DELETE ON TABLE public.current_relation_tags TO demo;
GRANT SELECT,INSERT,DELETE ON TABLE public.current_relation_members TO demo;

GRANT SELECT,INSERT ON TABLE public.nodes TO demo;
GRANT SELECT,INSERT ON TABLE public.node_tags TO demo;

GRANT SELECT,INSERT ON TABLE public.ways TO demo;
GRANT SELECT,INSERT ON TABLE public.way_nodes TO demo;
GRANT SELECT,INSERT ON TABLE public.way_tags TO demo;

GRANT SELECT,INSERT ON TABLE public.relations TO demo;
GRANT SELECT,INSERT ON TABLE public.relation_tags TO demo;
GRANT SELECT,INSERT ON TABLE public.relation_members TO demo;
mmd-osm commented 6 years ago

For the bulk upload branch, I'm proposing something along the following bits for the compilation and installation part (tested using kitchen). Commit revision refers to https://github.com/zerebubuth/openstreetmap-cgimap/tree/3137a5f03f7c54d35cb36e32cbeab08459379fe4

What other parts would be missing now? https://github.com/openstreetmap/chef/blob/master/cookbooks/web/recipes/cgimap.rb has some code to start up cgimap instances - somehow this needs to be combined?

include_recipe "git"

package %w[
        gcc
        make
        autoconf
        automake
        libtool
        libxml2-dev
        libpqxx-dev
        libfcgi-dev
        libboost-dev
        libboost-regex-dev
        libboost-program-options-dev
        libboost-date-time-dev
        libboost-filesystem-dev
        libboost-system-dev
        libmemcached-dev
        libboost-locale-dev
        libcrypto++-dev
        libyajl-dev
        zlib1g-dev
       ]

buildbase = "/opt/cgimap_build"
installbase = "/opt/cgimap"

directory "#{buildbase}/" do
  owner "root"
  group "root"
  mode 0o755
end

cgimap = {
      :bulk => {
        :branch => "bulk",
        :repository => "git://github.com/zerebubuth/openstreetmap-cgimap.git",
        :revision => "3137a5f03f7c54d35cb36e32cbeab08459379fe4",
        :configure_options => ""
      },
      :json => {
        :branch => "json",
        :repository => "git://github.com/zerebubuth/openstreetmap-cgimap.git",
        :revision => "v0.6.2",
        :configure_options => "--enable-yajl"
      }
    }

cgimap.each do |name, details|

  branch = details[:branch]
  repo = details[:repository]
  revision = details[:revision]
  configure_options = details[:configure_options]

  cgimap_directory = "#{buildbase}/#{branch}"

  directory cgimap_directory do
    owner "root"
    group "root"
    mode 0o755
  end

  execute "cgimap-build-#{branch}" do
    action :nothing
    command "make install"
    cwd cgimap_directory
    user "root"
    group "root"
  end

  execute "cgimap-configure-#{branch}" do
    action :nothing
    command "./configure #{configure_options} --program-suffix=-#{branch} --prefix=#{installbase}/#{branch}"
    cwd cgimap_directory
    user "root"
    group "root"
    notifies :run, "execute[cgimap-build-#{branch}]", :immediate
  end

  execute "cgimap-autogen-#{branch}" do
    action :nothing
    command "./autogen.sh"
    cwd cgimap_directory
    user "root"
    group "root"
    notifies :run, "execute[cgimap-configure-#{branch}]", :immediate
  end

  git cgimap_directory do
    action :sync
    repository "#{repo}"
    revision "#{revision}"
    user "root"
    group "root"
    notifies :run, "execute[cgimap-autogen-#{branch}]", :immediate
  end

end
tomhughes commented 6 years ago

There's no need to write new code - jut dig out the old code from the chef repository.

Building it is the easy bit anyway! The hard bit is all the tooling to allow configuration of which cgimap to use for each dev site and to start the cgimaps, having somehow chosen a unique port for each onem and then to configure the proxying from apache.

mmd-osm commented 6 years ago

There's no need to write new code - jut dig out the old code from the chef repository.

The latest version I found before "building from source was dropped" is in fact: https://github.com/openstreetmap/chef/blob/a68415b8f2bf106b6ea5948b0605c897b516ef4f/cookbooks/web/recipes/cgimap.rb

However, this version lacked a few dependencies (e.g. OAuth support was added later on, which requires crypto, JSON was missing). Then, there's a different git repo with a specific commit hash to be used for building, and of course support for multiple cgimap builds wasn’t there yet.

The hard bit is all the tooling to allow configuration of which cgimap to use for each dev site and to start the cgimaps

To define such a setup, https://github.com/openstreetmap/chef/blob/master/roles/dev.rb#L101-L125 could be extended to include

  1. CGImap branch definition for compilation

This section defines all different cgimap versions to be tested. Instructions are meant to download and checkout the respective commit from Git, compile it, and install it to a suitable location. There could be one entry for JSON output, another one for Bulk Upload, and maybe a third one for GDPR related changes.

See previous post for updated code example

  1. CGImap service runtime definition

This section is meant to define a new CGImap service. Based on this information, an Apache host can be automatically created, which acts as a reverse proxy both to the CGImap instance and the Rails port. Other settings are used to define the relevant CGImap binary, as well as the database instance to connect to from CGImap.

So based on the master branch we define:

I have to say that my understanding of how chef works is quite basic, so please bear with me ...

mmd-osm commented 6 years ago

Many thanks, this looks great!

I'd say we're almost ready for testing. One thing I noticed is the Apache rewrite rule in https://github.com/openstreetmap/chef/blob/45c5fe3c51b684fbca8cdab530879c4439438bbe/cookbooks/dev/templates/default/apache.rails.erb#L41-L47 for POST changeset uploads. Once that's in place, the new code will be active.

(from lighttpd.conf:)

$HTTP["request-method"] == "POST" {
  url.rewrite-once = (
    "^/api/0\.6/changeset/[[:digit:]]+/upload.*$" => "/dispatch.map",
  )
}
tomhughes commented 6 years ago

Yes I know - the rules were just copied from production for now and I need to figure out what I need to add.

tomhughes commented 6 years ago

Uploads should be enabled now.

mmd-osm commented 6 years ago

Retest looks good now! I uploaded a few changesets in JOSM with Basic Auth, OAuth, uncompressed and compressed upload: https://upload.apis.dev.openstreetmap.org/user/mmd2/history

mmd-osm commented 6 years ago

I have pushed a bug fix to the feature branch and wonder when and what happens next. Is there any manual activity needed, or will chef simply check the git repo from time to time, and deploy a new build on errol as needed?

tomhughes commented 6 years ago

It's tracking the branch so it should update I hope...

mmd-osm commented 6 years ago

Confirmed, the build on errol got auto-updated in the meantime. :+1:

mmd-osm commented 6 years ago

I rebased the feature/bulk_upload branch from the current master (i.e. version 0.6.2) yesterday, so that we could in theory also test JSON output on the dev instance.

It looks like all necessary dependencies are already in place. Adding --enable-yajl in https://github.com/openstreetmap/chef/blob/c4dcf4f618c5eef194e92651eabb95be314cdc02/cookbooks/dev/recipes/default.rb#L305 would be the only thing needed to enable JSON.

tomhughes commented 6 years ago

Are you saying you want me to update the dev builds to configure with that flag?

mmd-osm commented 6 years ago

yes, if you don't mind, that would be great.

tomhughes commented 6 years ago

Done.

mmd-osm commented 6 years ago

Thanks! Glad we did this, I just discovered a bug in the changeset download API call, where I didn't really expect it...

curl -H "Accept: text/json" https://upload.apis.dev.openstreetmap.org/api/0.6/changeset/826/download

Besides, I think there might be some issue with the Apache rewrite rules. I'd expect

https://upload.apis.dev.openstreetmap.org/api/0.6/node/345288.json

to get routed to cgimap land, but it ended up on the rails port.

tomhughes commented 6 years ago

Well yes because that won't match the rewrite - nothing with an extension will.

tomhughes commented 6 years ago

I have (hopefully) updated the rewrite rules now.

mmd-osm commented 6 years ago

Thanks, looks good here.

Originally, I didn't have osmChange and diffResult messages in scope for JSON output, similar to what is supported in Overpass API. As those message formats don't have any formal specification for their JSON counterpart, I'm not really sure, if that's something that needs to be added to cgimap.

In any case, the last rewrite rule ^/api/0\.6/changeset/[0-9]+/(upload|download)(\.json)?$ will return some unexpected results due to this. I wouldn't oppose dropping the .json suffix for the upload and download rule again.

mmd-osm commented 5 years ago

To get some more exposure and feedback, I asked a few mappers on the forum to give this implementation a try and see how it goes.

A user reported an HTTP 504 Gateway Timeout error today while uploading a large changeset (https://upload.apis.dev.openstreetmap.org/api/0.6/changeset/844, created_at="2018-11-18T17:02:51Z" closed_at="2018-11-18T17:03:42Z").

I'm still trying to find out what exactly happened here, based on the JOSM screenshot provided by the user. It would be very useful to see if there's some additional information available in the dev server's Apache log files or cgimap logs, maybe?

bildschirmfoto von 2018-11-18 20-09-53

tomhughes commented 5 years ago

I wouldn't expect production level performance and stability from the dev instances - it was probably exactly what it says it is.

In any case an error like that is unlikely to leave much trace in the logs.

tomhughes commented 5 years ago

Here's the apache access log:

a.b.c.d - - [18/Nov/2018:17:02:52 +0000] "POST /api/0.6/changeset/844/upload HTTP/1.1" 504 871 "-" "JOSM/1.5 (14382 de) Windows 7 64-Bit Java/10.0.1"
a.b.c.d - - [18/Nov/2018:17:03:34 +0000] "POST /api/0.6/changeset/844/upload HTTP/1.1" 409 711 "-" "JOSM/1.5 (14382 de) Windows 7 64-Bit Java/10.0.1"
a.b.c.d - - [18/Nov/2018:17:03:42 +0000] "PUT /api/0.6/changeset/844/close HTTP/1.1" 200 1274 "-" "JOSM/1.5 (14382 de) Windows 7 64-Bit Java/10.0.1"

and the error log:

[Sun Nov 18 17:03:24.114838 2018] [proxy_fcgi:error] [pid 429:tid 139737893431040] (70007)The timeout specified has expired: [client a.b.c.d:59974] AH01075: Error dispatching request to : (polling)

and the cgimap log:

[2018-11-18T17:02:54 #8559] Started request for changeset/upload 844 from a.b.c.d
[2018-11-18T17:03:26 #8559] Completed request for changeset/upload 844 from a.b.c.d in 34458 ms returning 340925 bytes
[2018-11-18T17:03:36 #8861] Started request for changeset/upload 844 from a.b.c.d
[2018-11-18T17:03:42 #8861] Returning with http error 409 with reason The changeset 844 was closed at 2018-11-18 17:03:34 UTC

So it looks like it did succeed but Apache timed out after 30s and returned the 504 error then two seconds later it finished! The second request then errored as the changeset was closed.

tomhughes commented 5 years ago

It seems this is governed by the global Timeout directive in the apache configuration, or by ProxyTimeout if that is defined.

mmd-osm commented 5 years ago

Thank you for the log file snippets! It looks like the default timeout is somewhere around 30s, and that very first changeset upload for 844 happened to be just a few seconds too slow.

The second upload attempt for cs 844 failed, because the changeset already contained 5972 changes by that time, and adding another 5972 wouldn't fit into the 10'000 changes limit anymore - that's why the second request was rejected with an HTTP 409 response.

JOSM decided that it should upload the same data again, now in another changeset 845 (https://upload.apis.dev.openstreetmap.org/api/0.6/changeset/845). This time, it took only 12s to process. So it looks like the database was a bit slow to process the first request at 18/Nov/2018:17:02:52, which seems ok for a development system.

tomhughes commented 5 years ago

The production servers already had the timeout increased to an hour so I have applied the same to the dev instances now in https://github.com/openstreetmap/chef/commit/0d6c835903afa9e803a7f9e221d1df1551166516.

mmd-osm commented 5 years ago

I'm closing this issue for the time being, as the original request to deploy cgimap on the dev instance has been completed.

mmd-osm commented 5 years ago

For testing purposes, I have built a new PPA under https://launchpad.net/~mmd-osm/+archive/ubuntu/cgimap-700 as 0.7.0-RC1. It includes the latest changes in the feature/bulk_upload branch. JSON is disabled by default again. I will create another build and operations issue for the final 0.7.0, once the number of volunteers for code reviews reaches a non zero number.

mmd-osm commented 5 years ago

I'm trying to track down a performance regression on https://upload.apis.dev.openstreetmap.org/changeset/1054 which I'm unable to reproduce locally right now.

To further isolate the statement(s) in question, I added a bit of instrumentation code to measure query runtimes and log statistics to the cgimap log file. The respective log entries should be similar to:

Executed prepared statement xyz in 0 ms, returning 1 rows, 1 affected rows

Would it be possible to provide those log entries for changeset 1054?

Thanks a lot! `

tomhughes commented 5 years ago

Here you go:

[2019-02-26T17:17:55 #24709] Started request for changeset/upload 1054 from 93.225.121.221
[2019-02-26T17:17:55 #24709] Executed prepared statement lock_current_relations in 1 ms, returning 1 rows, 1 affected rows
[2019-02-26T17:17:56 #24709] Executed prepared statement lock_future_nodes_in_relations in 743 ms, returning 30291 rows, 30291 affected rows
[2019-02-26T17:17:56 #24709] Executed prepared statement check_current_relation_versions in 1 ms, returning 0 rows, 0 affected rows
[2019-02-26T17:17:56 #24709] Executed prepared statement relations_with_new_relation_members in 2 ms, returning 0 rows, 0 affected rows
[2019-02-26T17:17:56 #24709] Executed prepared statement relations_with_changed_relation_tags in 158 ms, returning 1 rows, 1 affected rows
[2019-02-26T17:17:56 #24709] Executed prepared statement calc_relation_bbox_nodes in 146 ms, returning 1 rows, 1 affected rows
[2019-02-26T17:17:56 #24709] Executed prepared statement calc_relation_bbox_ways in 246 ms, returning 1 rows, 1 affected rows
[2019-02-26T17:27:58 #24709] Executed prepared statement relations_with_changed_way_node_members in 601973 ms, returning 0 rows, 0 affected rows
[2019-02-26T17:27:58 #24709] Executed prepared statement delete_current_relation_tags in 5 ms, returning 0 rows, 2 affected rows
[2019-02-26T17:27:59 #24709] Executed prepared statement delete_current_relation_members in 701 ms, returning 0 rows, 30291 affected rows
[2019-02-26T17:27:59 #24709] Executed prepared statement update_current_relations in 37 ms, returning 1 rows, 1 affected rows
[2019-02-26T17:27:59 #24709] Executed prepared statement insert_new_current_relation_tags in 29 ms, returning 0 rows, 2 affected rows
[2019-02-26T17:28:15 #24709] Executed prepared statement insert_new_current_relation_members in 16243 ms, returning 0 rows, 30291 affected rows
[2019-02-26T17:28:15 #24709] Executed prepared statement current_relations_to_history in 165 ms, returning 0 rows, 1 affected rows
[2019-02-26T17:28:15 #24709] Executed prepared statement current_relation_tags_to_history in 60 ms, returning 0 rows, 2 affected rows
[2019-02-26T17:28:17 #24709] Executed prepared statement current_relation_members_to_history in 2087 ms, returning 0 rows, 30291 affected rows
[2019-02-26T17:28:18 #24709] Executed prepared statement calc_relation_bbox_nodes in 136 ms, returning 1 rows, 1 affected rows
[2019-02-26T17:28:18 #24709] Executed prepared statement calc_relation_bbox_ways in 1 ms, returning 1 rows, 1 affected rows
[2019-02-26T17:28:18 #24709] Completed request for changeset/upload 1054 from 93.225.121.221 in 624189 ms returning 350 bytes
tomhughes commented 5 years ago

Looks like this is the main issue:

[2019-02-26T17:27:58 #24709] Executed prepared statement relations_with_changed_way_node_members in 601973 ms, returning 0 rows, 0 affected rows
mmd-osm commented 5 years ago

Looks good. This seems to confirm my assumption that this is related in some way to the 'bbox calculation'. Here's the statement in question:

            WITH tmp_member(relation_id, member_type, member_id) AS (
                 SELECT * FROM
                 UNNEST( CAST($1 as bigint[]),
                         CAST($2 as nwr_enum[]),
                         CAST($3 as bigint[])
                 )
              )
            /* new member was added in tmp */
            SELECT tm.member_type, tm.member_id, true as new_member
                   FROM   tmp_member tm
                   LEFT OUTER JOIN current_relation_members cm
                     ON tm.relation_id = cm.relation_id
                    AND tm.member_type = cm.member_type
                    AND tm.member_id   = cm.member_id
                 WHERE cm.member_id IS NULL
            UNION 
            /* existing member was removed in tmp */
            SELECT cm.member_type, cm.member_id, false as new_member
               FROM current_relation_members cm
               INNER JOIN tmp_member t1
                  ON cm.relation_id = t1.relation_id
               LEFT OUTER JOIN tmp_member tm
                  ON cm.relation_id = tm.relation_id
                 AND cm.member_type = tm.member_type
                 AND cm.member_id   = tm.member_id
            WHERE tm.member_id IS NULL

I need to take a closer look now how this could be improved.

tomhughes commented 5 years ago

I think the problem is that inner join in the second branch of the union because it causes a cross product explosion with a large relation - it will also cause you to get duplicate results if you give it input of ten members it will give you each result for each one.

So with the 30000 member relation 5107 that is in that changeset it probably returned about 900 million rows or something silly.

Better to drop the inner join and do a IN subquery in the where clause I think:

...AND cm.relation_id IN (SELECT relation_id FROM tmp_member)
mmd-osm commented 5 years ago

Well, the purpose of that statement was to get a list of nodes and ways per relation, which have been either added or removed. Those candidates need to be included in the changeset bounding box.

To keep things simple, I've broken up the original statement into two smaller ones (relations_with_added_way_node_members and relations_with_removed_way_node_members), and replaced the INNER JOIN in the second part of the query like you recommended (--> https://github.com/mmd-osm/openstreetmap-cgimap/commit/1f312ebef45af3c938eeb482f0b9491d6a6c7bce). It would be interesting to compare an EXPLAIN ANALYZE for both versions.

I'm a bit surprised that the query time is significantly faster now. Did you change anything else on the db, like running a VACUUM on those tables, maybe?

tomhughes commented 5 years ago

The point is that previously it was returning each deleted member more than once.

If you fed in a single relation of 100 members and the existing relation has 100 members 50 of which were the same you would get 50 add results and 5000 delete results because it would select all 100 existing members for each new member given 10000 rows from the inner join and then outer join each one giving 5000 delete results as each of the deleted rows would appear 100 times in the inner join.

tomhughes commented 5 years ago

Oh and I would have explained it but that is hard because of the parameters...

tomhughes commented 5 years ago

I've tried to do a plan based on going back from the current version 10 to version 9 of that relation with your old code:

                                                                              QUERY PLAN                                                                              
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=230674.38..244418.09 rows=1374371 width=12)
   CTE tmp_member
     ->  Bitmap Heap Scan on relation_members  (cost=712.16..7031.66 rows=13633 width=20)
           Recheck Cond: ((relation_id = 5107) AND (version = 9))
           ->  Bitmap Index Scan on relation_members_pkey  (cost=0.00..708.75 rows=13633 width=0)
                 Index Cond: ((relation_id = 5107) AND (version = 9))
   ->  Sort  (cost=223642.72..227078.65 rows=1374371 width=12)
         Sort Key: tm.member_type, tm.member_id, (true)
         ->  Append  (cost=7304.19..60034.76 rows=1374371 width=12)
               ->  Hash Anti Join  (cost=7304.19..8135.14 rows=11929 width=12)
                     Hash Cond: ((tm.relation_id = cm.relation_id) AND (tm.member_type = cm.member_type) AND (tm.member_id = cm.member_id))
                     ->  CTE Scan on tmp_member tm  (cost=0.00..272.66 rows=13633 width=20)
                     ->  Hash  (cost=3882.34..3882.34 rows=195534 width=20)
                           ->  Seq Scan on current_relation_members cm  (cost=0.00..3882.34 rows=195534 width=20)
               ->  Merge Join  (cost=2418.21..38155.91 rows=1362442 width=12)
                     Merge Cond: (t1.relation_id = cm_1.relation_id)
                     ->  Sort  (cost=1208.89..1242.98 rows=13633 width=8)
                           Sort Key: t1.relation_id
                           ->  CTE Scan on tmp_member t1  (cost=0.00..272.66 rows=13633 width=8)
                     ->  Materialize  (cost=1209.31..16869.99 rows=171092 width=20)
                           ->  Merge Anti Join  (cost=1209.31..16442.26 rows=171092 width=20)
                                 Merge Cond: ((cm_1.relation_id = tm_1.relation_id) AND (cm_1.member_type = tm_1.member_type) AND (cm_1.member_id = tm_1.member_id))
                                 ->  Index Only Scan using current_relation_members_pkey on current_relation_members cm_1  (cost=0.42..13630.52 rows=195534 width=20)
                                 ->  Sort  (cost=1208.89..1242.98 rows=13633 width=20)
                                       Sort Key: tm_1.relation_id, tm_1.member_type, tm_1.member_id
                                       ->  CTE Scan on tmp_member tm_1  (cost=0.00..272.66 rows=13633 width=20)
(26 rows)

and then with my subquery:

                                                                        QUERY PLAN                                                                        
----------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=31005.95..31980.70 rows=97475 width=12)
   Group Key: tm.member_type, tm.member_id, (true)
   CTE tmp_member
     ->  Bitmap Heap Scan on relation_members  (cost=712.16..7031.66 rows=13633 width=20)
           Recheck Cond: ((relation_id = 5107) AND (version = 9))
           ->  Bitmap Index Scan on relation_members_pkey  (cost=0.00..708.75 rows=13633 width=0)
                 Index Cond: ((relation_id = 5107) AND (version = 9))
   ->  Append  (cost=7304.19..23243.23 rows=97475 width=12)
         ->  Hash Anti Join  (cost=7304.19..8135.14 rows=11929 width=12)
               Hash Cond: ((tm.relation_id = cm.relation_id) AND (tm.member_type = cm.member_type) AND (tm.member_id = cm.member_id))
               ->  CTE Scan on tmp_member tm  (cost=0.00..272.66 rows=13633 width=20)
               ->  Hash  (cost=3882.34..3882.34 rows=195534 width=20)
                     ->  Seq Scan on current_relation_members cm  (cost=0.00..3882.34 rows=195534 width=20)
         ->  Merge Anti Join  (cost=13019.34..14133.35 rows=85546 width=12)
               Merge Cond: ((cm_1.relation_id = tm_1.relation_id) AND (cm_1.member_type = tm_1.member_type) AND (cm_1.member_id = tm_1.member_id))
               ->  Sort  (cost=11810.44..12054.86 rows=97767 width=20)
                     Sort Key: cm_1.relation_id, cm_1.member_type, cm_1.member_id
                     ->  Nested Loop  (cost=307.16..3707.00 rows=97767 width=20)
                           ->  HashAggregate  (cost=306.74..308.74 rows=200 width=8)
                                 Group Key: tmp_member.relation_id
                                 ->  CTE Scan on tmp_member  (cost=0.00..272.66 rows=13633 width=8)
                           ->  Index Only Scan using current_relation_members_pkey on current_relation_members cm_1  (cost=0.42..15.85 rows=114 width=20)
                                 Index Cond: (relation_id = tmp_member.relation_id)
               ->  Sort  (cost=1208.89..1242.98 rows=13633 width=20)
                     Sort Key: tm_1.relation_id, tm_1.member_type, tm_1.member_id
                     ->  CTE Scan on tmp_member tm_1  (cost=0.00..272.66 rows=13633 width=20)
(26 rows)
tomhughes commented 5 years ago

Not sure how well that worked but you can see the huge difference in the estimated rows at least.

mmd-osm commented 5 years ago

The point is that previously it was returning each deleted member more than once.

That was a really good catch, I really missed that the first time! From a functional point of view, it didn't make a difference in bbox calculation, performance wise it was quite a nightmare which became all too obvious with those huge relations.

mmd-osm commented 5 years ago

In moving forward it would be good to have some independent code review, which sounds way easier than it is. I've tried to reach out to zerebubuth by mail a couple of weeks ago, but got no reply. As he has "unwatched" his own repo some time ago, I'm not sure if he still gets any of those "cgimap repo" notifications. That's quite a disappointment, if you ask me.

mmd-osm commented 5 years ago

I have a quick question. Could we enable cgimap on master.apis.dev.openstreetmap.org as well without disrupting the existing rails database? As many apps already use this instance for their testing purposes, this could be useful to identify unknown issues before they hit us in production.

upload.apis.dev.openstreetmap.org has been really great so far, but it often times required people to do a conscious decision to use that instance. I believe this to be a bit of a limiting factor in the past in achieving even more testing exposure.

Ideally :cgimap_revision should point master for master.apis.dev.openstreetmap.org.

tomhughes commented 5 years ago

Done.

mmd-osm commented 5 years ago

Since no new issues were reported in the meantime, I have bumped cgimap version to 0.7.0 today and prepared a new PPA: https://launchpad.net/~mmd-osm/+archive/ubuntu/cgimap/+packages

Would it be helpful to track production related activities in a separate issue now?