openownership / register

A demonstration transnational register of beneficial ownership data from the UK, Denmark, Slovakia and Armenia
https://register.openownership.org
GNU Affero General Public License v3.0
18 stars 3 forks source link

PSC-STM-A9: Switch over to live ingestion #248

Open tiredpixel opened 6 months ago

tiredpixel commented 6 months ago

To switch over, should happen over a 2 day period:

Day 1

Day 2

Estimate: 4 hours

tiredpixel commented 4 months ago

Day 1 snapshot ingestion has started.

tiredpixel commented 4 months ago

Day 1 snapshot ingestion has completed. There were various invalid rows reported, but I'm guessing that's unrelated since it was the same process as is run for the monthly bulk data.

tiredpixel commented 4 months ago

Day 1 PSC stream ingestion has been started:

2024-04-23T13:13:03.010318+00:00 app[worker.1]: I, [2024-04-23T13:13:03.010203 #2]  INFO -- : [11098166] /company/15673183/persons-with-significant-control/individual/a8AKQH1QivKUiOAK2fafp-Adgvo
2024-04-23T13:13:03.255341+00:00 app[worker.1]: I, [2024-04-23T13:13:03.255221 #2]  INFO -- : [11098168] /company/15673184/persons-with-significant-control/individual/hSONNRBkkwHLpuU3giug0nbRb6A
2024-04-23T13:13:03.426444+00:00 app[worker.1]: I, [2024-04-23T13:13:03.426299 #2]  INFO -- : [11098170] /company/15673183/persons-with-significant-control/individual/9KjPNcTZHpzdF126Vf9wC1EhLvw
2024-04-23T13:13:03.619952+00:00 app[worker.1]: I, [2024-04-23T13:13:03.619812 #2]  INFO -- : [11098172] /company/13980953/persons-with-significant-control/individual/IWmC684L52m1tOB75xsJTaBVjKA
2024-04-23T13:13:03.786374+00:00 app[worker.1]: I, [2024-04-23T13:13:03.786235 #2]  INFO -- : [11098174] /company/13980953/persons-with-significant-control/individual/-nhZVXLktbaFEiWtPSz-bje63oA
2024-04-23T13:13:04.227009+00:00 app[worker.1]: I, [2024-04-23T13:13:04.226854 #2]  INFO -- : [11098176] /company/14385726/persons-with-significant-control/individual/iwizinQ2RomCKudlusV7cqghChw
2024-04-23T13:13:04.486477+00:00 app[worker.1]: I, [2024-04-23T13:13:04.486362 #2]  INFO -- : [11098178] /company/04035718/persons-with-significant-control/individual/2cDUpUI0FOjwmHv13C-gWRJBGms
2024-04-23T13:13:04.534287+00:00 app[worker.1]: I, [2024-04-23T13:13:04.534171 #2]  INFO -- : [11098180] /company/13994001/persons-with-significant-control/individual/QjB3qLjLGw94ITVPLqTMXgKqxdg
2024-04-23T13:13:04.567750+00:00 app[worker.1]: I, [2024-04-23T13:13:04.567639 #2]  INFO -- : [11098182] /company/00964058/persons-with-significant-control/corporate-entity/Ui4_zPKXhTD6udbH39QifycPykY
2024-04-23T13:13:04.599554+00:00 app[worker.1]: I, [2024-04-23T13:13:04.599425 #2]  INFO -- : [11098184] /company/14938674/persons-with-significant-control/individual/TRax4LHtgT3_OcdOKscrNq10vl8

There were a few crashes, at first, with HTTP 504 returned in the JSON body causing a failure. That seems to have been a transient error—but we should keep an eye on it to see if it keeps occurring. Airbrake will report it, if so.

It was necessary to set stdout sync in order for Heroku to report the correct status: https://github.com/openownership/register-ingester-psc/pull/35

tiredpixel commented 4 months ago

Day 2 snapshot ingestion ingestion cannot be run until tomorrow (despite being actually Day 3 in this case), since it's necessary for today's data to appear in the PSC bulk snapshot: https://download.companieshouse.gov.uk/en_pscdata.html

tiredpixel commented 4 months ago

Airbrake fired overnight, notifying of a crash. Investigation showed that something went wrong, seemingly within Heroku, and the app eventually got relocated to another server:

Apr 24 01:51:03 register-ingester-psc-prd app/worker.1 I, [2024-04-24T01:51:03.214680 #2]  INFO -- : [11103517] /company/15674555/persons-with-significant-control/individual/LNzFxJEtl4i7ynxUgqdAeIL2XkY
Apr 24 02:17:02 register-ingester-psc-prd heroku/worker.1 Process exited with status 1
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1 /app/vendor/ruby-3.1.2/lib/ruby/3.1.0/net/protocol.rb:227:in `rbuf_fill': end of file reached (Faraday::ConnectionFailed)
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/ruby-3.1.2/lib/ruby/3.1.0/net/protocol.rb:193:in `readuntil'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/ruby-3.1.2/lib/ruby/3.1.0/net/protocol.rb:203:in `readline'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/ruby-3.1.2/lib/ruby/3.1.0/net/http/response.rb:322:in `read_chunked'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/ruby-3.1.2/lib/ruby/3.1.0/net/http/response.rb:292:in `block in read_body_0'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/ruby-3.1.2/lib/ruby/3.1.0/net/http/response.rb:285:in `inflater'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/ruby-3.1.2/lib/ruby/3.1.0/net/http/response.rb:290:in `read_body_0'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/ruby-3.1.2/lib/ruby/3.1.0/net/http/response.rb:210:in `read_body'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/ruby-3.1.2/lib/ruby/3.1.0/net/http.rb:1289:in `block in get'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/ruby-3.1.2/lib/ruby/3.1.0/net/http.rb:1584:in `block in transport_request'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/ruby-3.1.2/lib/ruby/3.1.0/net/http/response.rb:167:in `reading_body'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/ruby-3.1.2/lib/ruby/3.1.0/net/http.rb:1583:in `transport_request'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/ruby-3.1.2/lib/ruby/3.1.0/net/http.rb:1539:in `request'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/ruby-3.1.2/lib/ruby/3.1.0/net/http.rb:1288:in `get'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/bundle/ruby/3.1.0/gems/faraday-net_http-1.0.1/lib/faraday/adapter/net_http.rb:139:in `block in request_via_get_method'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/ruby-3.1.2/lib/ruby/3.1.0/net/http.rb:966:in `start'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/bundle/ruby/3.1.0/gems/faraday-net_http-1.0.1/lib/faraday/adapter/net_http.rb:138:in `request_via_get_method'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/bundle/ruby/3.1.0/gems/faraday-net_http-1.0.1/lib/faraday/adapter/net_http.rb:129:in `request_with_wrapped_block'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/bundle/ruby/3.1.0/gems/faraday-net_http-1.0.1/lib/faraday/adapter/net_http.rb:109:in `perform_request'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/bundle/ruby/3.1.0/gems/faraday-net_http-1.0.1/lib/faraday/adapter/net_http.rb:66:in `block in call'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/bundle/ruby/3.1.0/gems/faraday-1.10.3/lib/faraday/adapter.rb:50:in `connection'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/bundle/ruby/3.1.0/gems/faraday-net_http-1.0.1/lib/faraday/adapter/net_http.rb:64:in `call'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/bundle/ruby/3.1.0/gems/faraday-1.10.3/lib/faraday/request/url_encoded.rb:25:in `call'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/bundle/ruby/3.1.0/gems/faraday-1.10.3/lib/faraday/rack_builder.rb:154:in `build_response'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/bundle/ruby/3.1.0/gems/faraday-1.10.3/lib/faraday/connection.rb:516:in `run_request'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/bundle/ruby/3.1.0/gems/faraday-1.10.3/lib/faraday/connection.rb:202:in `get'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/bundle/ruby/3.1.0/bundler/gems/register-common-5a8b7f75783e/lib/register_common/adapters/http_adapter.rb:21:in `get'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/lib/register_ingester_psc/streams/clients/psc_stream.rb:25:in `read_stream'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/lib/register_ingester_psc/streams/apps/stream_ingester.rb:34:in `call'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/lib/register_ingester_psc/streams/apps/stream_ingester.rb:20:in `bash_call'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from bin/ingest-stream:17:in `<main>'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1 /app/vendor/ruby-3.1.2/lib/ruby/3.1.0/net/protocol.rb:227:in `rbuf_fill': end of file reached (EOFError)
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/ruby-3.1.2/lib/ruby/3.1.0/net/protocol.rb:193:in `readuntil'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/ruby-3.1.2/lib/ruby/3.1.0/net/protocol.rb:203:in `readline'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/ruby-3.1.2/lib/ruby/3.1.0/net/http/response.rb:322:in `read_chunked'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/ruby-3.1.2/lib/ruby/3.1.0/net/http/response.rb:292:in `block in read_body_0'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/ruby-3.1.2/lib/ruby/3.1.0/net/http/response.rb:285:in `inflater'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/ruby-3.1.2/lib/ruby/3.1.0/net/http/response.rb:290:in `read_body_0'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/ruby-3.1.2/lib/ruby/3.1.0/net/http/response.rb:210:in `read_body'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/ruby-3.1.2/lib/ruby/3.1.0/net/http.rb:1289:in `block in get'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/ruby-3.1.2/lib/ruby/3.1.0/net/http.rb:1584:in `block in transport_request'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/ruby-3.1.2/lib/ruby/3.1.0/net/http/response.rb:167:in `reading_body'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/ruby-3.1.2/lib/ruby/3.1.0/net/http.rb:1583:in `transport_request'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/ruby-3.1.2/lib/ruby/3.1.0/net/http.rb:1539:in `request'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/ruby-3.1.2/lib/ruby/3.1.0/net/http.rb:1288:in `get'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/bundle/ruby/3.1.0/gems/faraday-net_http-1.0.1/lib/faraday/adapter/net_http.rb:139:in `block in request_via_get_method'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/ruby-3.1.2/lib/ruby/3.1.0/net/http.rb:966:in `start'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/bundle/ruby/3.1.0/gems/faraday-net_http-1.0.1/lib/faraday/adapter/net_http.rb:138:in `request_via_get_method'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/bundle/ruby/3.1.0/gems/faraday-net_http-1.0.1/lib/faraday/adapter/net_http.rb:129:in `request_with_wrapped_block'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/bundle/ruby/3.1.0/gems/faraday-net_http-1.0.1/lib/faraday/adapter/net_http.rb:109:in `perform_request'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/bundle/ruby/3.1.0/gems/faraday-net_http-1.0.1/lib/faraday/adapter/net_http.rb:66:in `block in call'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/bundle/ruby/3.1.0/gems/faraday-1.10.3/lib/faraday/adapter.rb:50:in `connection'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/bundle/ruby/3.1.0/gems/faraday-net_http-1.0.1/lib/faraday/adapter/net_http.rb:64:in `call'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/bundle/ruby/3.1.0/gems/faraday-1.10.3/lib/faraday/request/url_encoded.rb:25:in `call'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/bundle/ruby/3.1.0/gems/faraday-1.10.3/lib/faraday/rack_builder.rb:154:in `build_response'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/bundle/ruby/3.1.0/gems/faraday-1.10.3/lib/faraday/connection.rb:516:in `run_request'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/bundle/ruby/3.1.0/gems/faraday-1.10.3/lib/faraday/connection.rb:202:in `get'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/vendor/bundle/ruby/3.1.0/bundler/gems/register-common-5a8b7f75783e/lib/register_common/adapters/http_adapter.rb:21:in `get'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/lib/register_ingester_psc/streams/clients/psc_stream.rb:25:in `read_stream'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/lib/register_ingester_psc/streams/apps/stream_ingester.rb:34:in `call'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from /app/lib/register_ingester_psc/streams/apps/stream_ingester.rb:20:in `bash_call'
Apr 24 02:17:02 register-ingester-psc-prd app/worker.1  from bin/ingest-stream:17:in `<main>'
Apr 24 02:17:02 register-ingester-psc-prd heroku/worker.1 State changed from up to crashed
Apr 24 02:17:02 register-ingester-psc-prd heroku/worker.1 State changed from crashed to starting
Apr 24 02:17:05 register-ingester-psc-prd heroku/worker.1 Starting process with command `bin/ingest-stream`
Apr 24 02:17:06 register-ingester-psc-prd heroku/worker.1 State changed from starting to up
Apr 24 02:17:06 register-ingester-psc-prd app/worker.1 fatal: not a git repository (or any parent up to mount point /)
Apr 24 02:17:06 register-ingester-psc-prd app/worker.1 Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
Apr 24 02:17:09 register-ingester-psc-prd app/worker.1 I, [2024-04-24T02:17:09.370241 #2]  INFO -- : [11103517] /company/15674555/persons-with-significant-control/individual/LNzFxJEtl4i7ynxUgqdAeIL2XkY
Apr 24 02:17:09 register-ingester-psc-prd app/worker.1 I, [2024-04-24T02:17:09.495638 #2]  INFO -- : [11103518] /company/15674554/persons-with-significant-control/individual/GDoxu-dfcDGV7RcytuZT_aeF8uI
Apr 24 02:17:09 register-ingester-psc-prd app/worker.1 I, [2024-04-24T02:17:09.596054 #2]  INFO -- : [11103519] /company/15674556/persons-with-significant-control/individual/cPNU1B2h7rneVxtgBFfCCB86zp0
Apr 24 02:23:03 register-ingester-psc-prd app/worker.1 I, [2024-04-24T02:23:03.305681 #2]  INFO -- : [11103524] /company/12291717/persons-with-significant-control/individual/3NnW9zmLMgZ7XNSJgqZzt9MP4aE
Apr 24 02:26:03 register-ingester-psc-prd app/worker.1 I, [2024-04-24T02:26:03.198415 #2]  INFO -- : [11103526] /company/15555700/persons-with-significant-control/individual/gdNQgSN_xwRu953x7nTeUdx9GfU
Apr 24 02:29:04 register-ingester-psc-prd app/worker.1 I, [2024-04-24T02:29:03.573396 #2]  INFO -- : [11103528] /company/15674561/persons-with-significant-control/individual/1z0_X7K8YYFYAZNnDybM8_qbkUM
Apr 24 02:36:03 register-ingester-psc-prd app/worker.1 I, [2024-04-24T02:36:02.619184 #2]  INFO -- : [11103533] /company/15592817/persons-with-significant-control/individual/zvPWBckSAz95uxjqSEHKaBjSLu4
Apr 24 02:43:03 register-ingester-psc-prd app/worker.1 I, [2024-04-24T02:43:02.589186 #2]  INFO -- : [11103537] /company/SC749094/persons-with-significant-control/individual/agFKm93589R_VmO6atC1s9DyY6c
Apr 24 02:43:03 register-ingester-psc-prd app/worker.1 I, [2024-04-24T02:43:02.766413 #2]  INFO -- : [11103539] /company/SC749094/persons-with-significant-control/individual/agFKm93589R_VmO6atC1s9DyY6c
Apr 24 03:00:04 register-ingester-psc-prd app/worker.1 I, [2024-04-24T03:00:03.863123 #2]  INFO -- : [11103548] /company/15674570/persons-with-significant-control/individual/mEpu-fw_bYvWSKNtv9N5Cv55YfM
Apr 24 04:34:05 register-ingester-psc-prd heroku/worker.1 Relocating dyno to a new server
Apr 24 04:34:05 register-ingester-psc-prd heroku/worker.1 State changed from up to down
Apr 24 04:34:05 register-ingester-psc-prd heroku/worker.1 State changed from down to starting
Apr 24 04:34:08 register-ingester-psc-prd heroku/worker.1 Starting process with command `bin/ingest-stream`
Apr 24 04:34:09 register-ingester-psc-prd heroku/worker.1 State changed from starting to up
Apr 24 04:34:10 register-ingester-psc-prd app/worker.1 fatal: not a git repository (or any parent up to mount point /)
Apr 24 04:34:10 register-ingester-psc-prd app/worker.1 Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
Apr 24 04:34:20 register-ingester-psc-prd app/worker.1 I, [2024-04-24T04:34:19.668175 #2]  INFO -- : [11103548] /company/15674570/persons-with-significant-control/individual/mEpu-fw_bYvWSKNtv9N5Cv55YfM

From that excerpt, it seems that everything is working as it ought with regards to failure detection and restarts: note that timepoint 11103517 got run again after the first failure, and that timepoint 11103548 got processed again after the server relocation.

Ideally, a single timepoint would be reprocessed only if it failed midway, not if processing completed. However, the Companies House Streaming API documentation says:

Every event received from the stream is accompanied by a corresponding timepoint. When clients reconnect to the stream, they should supply the timepoint of the last sucessfully processed event to ensure continuity of data.

https://developer-specs.company-information.service.gov.uk/streaming-api/guides/overview

And yet, it seems that the API returns the first event with that timepoint, not the first event after that timepoint.

I don't think there's really anything we can do about this without diverging from the API documentation, and I'm not certain whether multiple events can ever have the same timepoint (it doesn't appear so, but I can't find anything definitive), so we'll just keep this as-is and reprocess the occasional event.

tiredpixel commented 4 months ago

Day 2 snapshot ingestion has started (although it is the 3rd day).

tiredpixel commented 4 months ago

Day 2 snapshot ingestion has completed successfully.