dhutchin-ctl opened this issue 12 months ago
NTT is seeing the same thing on 4.2.8
And for the record, we have had to irrd_mirror_force_reload RADB at least 3 times since their cutover.
I have had a total of 5 instances where we got into this mode since November of 2022. Below is the first error message from each of those instances. Maybe these match up with Troy2914's instances:
2022-11-02 01:50:33,793 irrd[1845133]: [irrd.mirroring.mirror_runners_import#DEBUG] Received NRTM response for LACNIC: %% ERROR: Serials 103009 - 110626 do not exist
2023-11-13 18:02:33,554 irrd[3741814]: [irrd.mirroring.mirror_runners_import#DEBUG] Received NRTM response for RADB: %% ERROR: Serials 50 - 22293297 do not exist
2023-11-17 22:17:22,884 irrd[2275035]: [irrd.mirroring.mirror_runners_import#DEBUG] Received NRTM response for RADB: %% ERROR: Serials 47968 - 57319 do not exist
2023-11-25 03:48:12,089 irrd[1533810]: [irrd.mirroring.mirror_runners_import#DEBUG] Received NRTM response for RADB: %% ERROR: Serials 115384 - 123994 do not exist
2023-12-05 03:10:27,547 irrd[1687948]: [irrd.mirroring.mirror_runners_import#DEBUG] Received NRTM response for RADB: %% ERROR: Serials 255032 - 268299 do not exist
In my log file, I only see "Received NRTM response for ... Serials .... do not exist" occasionally. For instance, the RADB lines above. However, for queries to the LEVEL3 IRR, I see "encountered parsing error while parsing query "-g LEVEL3.... Serials .... do not exist" all the time. In all the cases where the query is TO the LEVEL3 IRR, the query range is
This is an interesting one. Given that you both experience issues, and some people raised issues about the IRRexplorer instance, a best guess is that RADB's NRTM streams are the common part here. I'll see if my own logs show anything on this. Also wondering if @schelcj has ideas, or whether the timings align with something? I do not know RADB's config, or whether they do anything unusual.
Let me run through how this is implemented so we all know precisely what messages mean and for my own overview. Some of these choices may not be the clearest, but were made for backwards compatibility.
On the client side, IRRD keeps a serial_newest_mirror which only relates to the mirror source. When running NRTM, it requests (serial_newest_mirror+1)-LAST and logs the response. Received NRTM messages are processed by the NRTMStreamParser, which also updates serial_newest_seen to the highest serial from the NRTM ADD/DEL operations in the response. This means that NRTMv3 clients are expected to routinely request serials that do not exist: one beyond the actual range that the server has.
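As a rough illustration (a sketch, not IRRD's actual code), the client-side request described above boils down to:

def build_nrtm_query(source: str, serial_newest_mirror: int) -> str:
    # The client always asks from one past the newest serial it has
    # mirrored, up to LAST, which is why "non-existing" serials are
    # requested routinely whenever the client is fully caught up.
    return f"-g {source}:3:{serial_newest_mirror + 1}-LAST"

# Matching the log samples in this thread:
# build_nrtm_query("RADB", 255031) == "-g RADB:3:255032-LAST"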
On the server side, responses are built by the NRTMGenerator, which takes a serial_start_requested and an optional serial_end_requested. From the RPSLDatabaseStatus, we query serial_oldest_journal and serial_newest_journal to determine serial_start_available and serial_end_available. In our case, serial_end_requested would always be None, i.e. LAST. Then there are a few tests and responses that may apply:
- serial_start_requested < serial_start_available errors: Serials {serial_start_requested} - {serial_start_available} do not exist - you are querying older serials than are known.
- serial_start_requested == serial_end_available + 1 warns: there are no newer updates available - you are querying just beyond the range.
- serial_start_requested > serial_end_available errors: Serials {serial_end_available} - {serial_start_requested} do not exist - the serials you are querying are way newer than are known.

2023-12-04 13:39:13,176 irrd[1188550]: [irrd.mirroring.mirror_runners_import#INFO] Retrieving NRTM updates for RADB from serial 255032 on nrtm.radb.net:43
2023-12-04 13:39:13,176 irrd[1188550]: [irrd.utils.whois_client#DEBUG] Running whois query -g RADB:3:255032-LAST on nrtm.radb.net port 43
2023-12-04 13:39:13,300 irrd[1188550]: [irrd.mirroring.mirror_runners_import#DEBUG] Received NRTM response for RADB: % Warning: there are no newer updates available
This is normal behaviour that suggests you are up to date.
2023-12-04 13:54:14,507 irrd[1189565]: [irrd.mirroring.mirror_runners_import#INFO] Retrieving NRTM updates for RADB from serial 255032 on nrtm.radb.net:43
2023-12-04 13:54:14,508 irrd[1189565]: [irrd.utils.whois_client#DEBUG] Running whois query -g RADB:3:255032-LAST on nrtm.radb.net port 43
2023-12-04 13:54:14,622 irrd[1189565]: [irrd.mirroring.mirror_runners_import#DEBUG] Received NRTM response for RADB: %% ERROR: Serials 255032 - 268299 do not exist
This request and response are scenario 1: the remote journal starts at 268299, but you're asking for 255032. You are too far behind; it no longer has the entries older than 268299. This is only recoverable with a full reload.
There are a few possible causes for this situation: your IRRD has been down for a while, the NRTMv3 server keeps only very short journals so the window is very narrow, or there has been a persistent issue before this which was preventing you from keeping up with NRTM. You'll have to dig more into your logs to figure out which, or perhaps RADB can tell us more about anything unusual on their end.
It is not possible to know what was happening there, as you need to know both the query and the response to figure out whether this is scenario 1 or 3.
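To make the three cases above easier to match against log lines, here is a simplified sketch of the checks (not the actual NRTMGenerator code, and assuming serial_end_requested is LAST):

def nrtm_response_kind(serial_start_requested: int,
                       serial_start_available: int,
                       serial_end_available: int) -> str:
    # Scenario 1: asking for serials older than the journal still has.
    if serial_start_requested < serial_start_available:
        return (f"%% ERROR: Serials {serial_start_requested} - "
                f"{serial_start_available} do not exist")
    # Scenario 2: exactly one beyond the newest serial, i.e. up to date.
    if serial_start_requested == serial_end_available + 1:
        return "% Warning: there are no newer updates available"
    # Scenario 3: asking for serials newer than the journal has.
    if serial_start_requested > serial_end_available:
        return (f"%% ERROR: Serials {serial_end_available} - "
                f"{serial_start_requested} do not exist")
    return "NRTM updates available"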
As one of the operators of IRRExplorer (which uses irrd as one of its backends) I can confirm that we've seen similar problems over the past months as well. We're currently running irrd 4.4.2.
Notice Log Sample 2 is exactly 15 minutes after Log Sample 1. There were no RADB updates on the client side between those two samples. So the first serial to request on the client side should not change. I don’t know where “268299” comes from.
I will make myself available for you whenever you want to see more directly into the LEVEL3 system. I know, that doesn't give you access to the RADB system at the same time. And of course, we don't know how to trigger the error. I'm writing a script to more closely monitor the log so I will know shortly after this condition pops its head again.
D
We've seen the same internally and have been investigating as well. We saw the same serial constraint violation for the same serial as @dhutchin-ctl. Our nrtm.radb.net is fed via NRTM from the authoritative host, and when this error hit, any further updates were blocked. A reseed resolved the issue, but we're fairly certain we saw this once before after the v4 migration. We are also adding more monitoring and still investigating.
Notice Log Sample 2 is exactly 15 minutes after Log Sample 1. There were no RADB updates on the client side between those two samples. So the first serial to request on the client side should not change.
Ah right, that gives a bit more context. Using my earlier comment on NRTM responses and their precise meaning, we can narrow down.
We can assume the client keeps requesting 255032-LAST. At first, RADB replies:
2023-12-04 13:39:13,300 irrd[1188550]: [irrd.mirroring.mirror_runners_import#DEBUG] Received NRTM response for RADB: % Warning: there are no newer updates available
This is expected in a correctly running setup where you are up to date. This message means RADB has journal entries up to and including 255031, and you have those too in the mirror. All good.
15 minutes later, RADB replies:
2023-12-04 13:54:14,622 irrd[1189565]: [irrd.mirroring.mirror_runners_import#DEBUG] Received NRTM response for RADB: %% ERROR: Serials 255032 - 268299 do not exist
(The assumption that this is the same 255032-LAST query is consistent with the fact that the response mentions this serial. So we can be quite sure the same query was used both times.)
RADB now reports that the first serial in its journal is actually 268299. That means it jumped 13267 serials in 15 minutes. 13K changes in 15 minutes is rare but not impossible, but what is strange here is that the journal did not just grow by at least 13K entries; the oldest 13K were also removed from the journal. It is impossible for clients to keep up in that case, or recover from that situation.
There are two scenarios I can think of:
if I am reading your two scenarios properly, those are both things to address on the RADB side?
For the serials jumping we have a maintainer whose automated processes may have gone sideways and producing a lot of unnecessary updates. We are reaching out to them to address.
For the serials jumping we have a maintainer whose automated processes may have gone sideways and producing a lot of unnecessary updates. We are reaching out to them to address.
that sounds like a 4.4.3 request :) to add a limit/throttle.
Back to the original issue; reviewing logs this morning shows that our nrtm.radb.net host retrieved and applied updates up to and shortly after 255031, then for unknown reasons tried to reapply that serial and ran into the unique key constraint, preventing all further updates. We only had logging at INFO level so there isn't much more to go on at this time, but we have enabled DEBUG logging to review if this happens again.
2023-12-02 21:47:20,121 irrd[253050]: [irrd.mirroring.nrtm_operation#INFO] Completed NRTM operation RADB/255031/ADD/route6/2605:9CC0:397::/48AS8987
2023-12-02 21:47:20,122 irrd[253050]: [irrd.mirroring.nrtm_operation#INFO] Completed NRTM operation RADB/255032/ADD/route6/2605:9CC0:399::/48AS8987
2023-12-02 21:47:20,123 irrd[253050]: [irrd.mirroring.nrtm_operation#INFO] Completed NRTM operation RADB/255033/ADD/route/130.137.120.0/24AS8987
2023-12-02 21:47:20,135 irrd[253050]: [irrd.mirroring.mirror_runners_import#ERROR] An exception occurred while attempting a mirror update or initial import for RADB: (psycopg2.errors.Unique
DETAIL: Key (serial_nrtm, source)=(255031, RADB) already exists.
Back to the original issue; reviewing logs this morning shows that our nrtm.radb.net host retrieved and applied updates up to and shortly after 255031, then for unknown reasons tried to reapply that serial and ran into the unique key constraint, preventing all further updates. We only had logging at INFO level so there isn't much more to go on at this time, but we have enabled DEBUG logging to review if this happens again.
Ok, we can see what those logs produce, but the uniqueness constraint in the database isn't something that should ever be hit really - it's there as a safeguard against data corruption. I also can't immediately think of a case where this can happen, but I cannot say more about this anyway without knowing which instances you are running, what their configs are, and whether there's any replication going on. Further logs will also help, e.g. I am wondering whether this instance has received 255031 over NRTMv3 before and is trying to insert it twice, or whether it came up with that serial on its own somehow (which it should not).
Ok, we can see what those logs produce, but the uniqueness constraint in the database isn't something that should ever be hit really - it's there as a safeguard against data corruption. I also can't immediately think of a case where this can happen, but I cannot say more about this anyway without knowing which instances you are running, what their configs are, and whether there's any replication going on.
Working on getting further logs; there is no replication for the hosts in question.
Further logs will also help, e.g. I am wondering whether this instance has received 255031 over NRTMv3 before and is trying to insert it twice, or whether it came up with that serial on its own somehow (which it should not).
That appears to be what has happened. 255031 was received over nrtmv3, applied, subsequent updates were received and applied, then the update for 255031 was attempted again.
btw, this seems to have happened again on the 25th of December.
btw, this seems to have happened again on the 25th of December.
with serial number 530173
I think the problem started earlier than the 25th. Full disclosure... I do an irrd_mirror_force_reload on RADB once a week via cron. The failure coincides with that action. Coincidence? Hard to believe that. Here are some (maybe not all) pertinent entries from the log file. Start at 12-23 16:52z, the time of the last apparently normal NRTM update (we update every 15 minutes):
2023-12-23 16:52:08,351 irrd[2869178]: [irrd.mirroring.scheduler#DEBUG] Started new process RPSLMirrorImportUpdateRunner-RADB for mirror import/export for RADB
2023-12-23 16:52:08,413 irrd[1995035]: [irrd.mirroring.mirror_runners_import#DEBUG] Most recent mirrored serial for RADB: 530173, force_reload: False, nrtm enabled: True
2023-12-23 16:52:08,413 irrd[1995035]: [irrd.mirroring.mirror_runners_import#INFO] Retrieving NRTM updates for RADB from serial 530174 on nrtm.radb.net:43
2023-12-23 16:52:08,413 irrd[1995035]: [irrd.utils.whois_client#DEBUG] Running whois query -g RADB:3:530174-LAST on nrtm.radb.net port 43
2023-12-23 16:52:08,526 irrd[1995035]: [irrd.mirroring.mirror_runners_import#DEBUG] Received NRTM response for RADB: % Warning: there are no newer updates available
So at that time, my interpretation is that there are no updates newer than serial 530173. At 12-23 17:00z I ran irrd_mirror_force_reload:
2023-12-23 17:00:03,417 irrd[1995709]: [irrd.storage.database_handler#INFO] force_reload flag set for RADB, serial synchronisation based on current settings: False
At the next NRTM interval (17:07), I see that the current serial has jumped to 535978, but the import via FTP seems to go OK:
2023-12-23 17:07:09,721 irrd[2869178]: [irrd.mirroring.scheduler#DEBUG] Started new process RPSLMirrorImportUpdateRunner-RADB for mirror import/export for RADB
2023-12-23 17:07:09,762 irrd[1996872]: [irrd.mirroring.mirror_runners_import#DEBUG] Most recent mirrored serial for RADB: 530173, force_reload: True, nrtm enabled: True
2023-12-23 17:07:09,763 irrd[1996872]: [irrd.mirroring.mirror_runners_import#INFO] Running full RPSL import of RADB from ['ftp://ftp.radb.net/radb/dbase/radb.db.gz'], serial from ftp://ftp.radb.net/radb/dbase/RADB.CURRENTSERIAL
2023-12-23 17:07:10,130 irrd[1996872]: [irrd.mirroring.mirror_runners_import#INFO] Downloaded ftp://ftp.radb.net/radb/dbase/RADB.CURRENTSERIAL, contained 535978
2023-12-23 17:07:17,123 irrd[1996872]: [irrd.mirroring.parsers#DEBUG] Starting file import of RADB from /app/irrd/tmp/tmprj8ayrgz
2023-12-23 17:21:57,939 irrd[1996872]: [irrd.mirroring.parsers#INFO] File import for RADB: 1390207 objects read, 1390207 objects inserted, ignored 0 due to errors, ignored 0 due to object_class_filter, source /app/irrd/tmp/tmprj8ayrgz
Then, at the next NRTM interval (which is only 13 seconds later), things start to go haywire (hmmm, where did that expression come from?):
2023-12-23 17:22:11,109 irrd[2869178]: [irrd.mirroring.scheduler#DEBUG] Started new process RPSLMirrorImportUpdateRunner-RADB for mirror import/export for RADB
2023-12-23 17:22:11,151 irrd[2001141]: [irrd.mirroring.mirror_runners_import#DEBUG] Most recent mirrored serial for RADB: 535978, force_reload: False, nrtm enabled: True
2023-12-23 17:22:11,151 irrd[2001141]: [irrd.mirroring.mirror_runners_import#INFO] Retrieving NRTM updates for RADB from serial 535979 on nrtm.radb.net:43
2023-12-23 17:22:11,151 irrd[2001141]: [irrd.utils.whois_client#DEBUG] Running whois query -g RADB:3:535979-LAST on nrtm.radb.net port 43
2023-12-23 17:22:11,261 irrd[2001141]: [irrd.mirroring.mirror_runners_import#DEBUG] Received NRTM response for RADB: %% ERROR: Serials 530173 - 535979 do not exist
And things stay like that until 12-25 12:56z when the only change is that the range of serials reported in the RADB response to our query (same query as above) is "535979 - 543646". And that continues every 15 minutes until 12-27 15:26 when I did another irrd_mirror_force_reload:
2023-12-27 15:15:26,375 irrd[2869178]: [irrd.mirroring.scheduler#DEBUG] Started new process RPSLMirrorImportUpdateRunner-RADB for mirror import/export for RADB
2023-12-27 15:15:26,414 irrd[1574475]: [irrd.mirroring.mirror_runners_import#DEBUG] Most recent mirrored serial for RADB: 535978, force_reload: False, nrtm enabled: True
2023-12-27 15:15:26,415 irrd[1574475]: [irrd.mirroring.mirror_runners_import#INFO] Retrieving NRTM updates for RADB from serial 535979 on nrtm.radb.net:43
2023-12-27 15:15:26,415 irrd[1574475]: [irrd.utils.whois_client#DEBUG] Running whois query -g RADB:3:535979-LAST on nrtm.radb.net port 43
2023-12-27 15:15:26,543 irrd[1574475]: [irrd.mirroring.mirror_runners_import#DEBUG] Received NRTM response for RADB: %% ERROR: Serials 535979 - 543646 do not exist
2023-12-27 15:26:40,040 irrd[1578473]: [irrd.storage.database_handler#INFO] force_reload flag set for RADB, serial synchronisation based on current settings: False
2023-12-27 15:30:27,663 irrd[2869178]: [irrd.mirroring.scheduler#DEBUG] Started new process RPSLMirrorImportUpdateRunner-RADB for mirror import/export for RADB
2023-12-27 15:30:27,704 irrd[1578781]: [irrd.mirroring.mirror_runners_import#DEBUG] Most recent mirrored serial for RADB: 535978, force_reload: True, nrtm enabled: True
2023-12-27 15:30:27,704 irrd[1578781]: [irrd.mirroring.mirror_runners_import#INFO] Running full RPSL import of RADB from ['ftp://ftp.radb.net/radb/dbase/radb.db.gz'], serial from ftp://ftp.radb.net/radb/dbase/RADB.CURRENTSERIAL
2023-12-27 15:30:28,082 irrd[1578781]: [irrd.mirroring.mirror_runners_import#INFO] Downloaded ftp://ftp.radb.net/radb/dbase/RADB.CURRENTSERIAL, contained 559866
2023-12-27 15:30:35,742 irrd[1578781]: [irrd.mirroring.parsers#DEBUG] Starting file import of RADB from /app/irrd/tmp/tmp2oew3wc6
2023-12-27 15:45:45,384 irrd[1578781]: [irrd.mirroring.parsers#INFO] File import for RADB: 1391073 objects read, 1391073 objects inserted, ignored 0 due to errors, ignored 0 due to object_class_filter, source /app/irrd/tmp/tmp2oew3wc6
Now everything is looking reasonable:
2023-12-27 15:45:59,206 irrd[2869178]: [irrd.mirroring.scheduler#DEBUG] Started new process RPSLMirrorImportUpdateRunner-RADB for mirror import/export for RADB
2023-12-27 15:45:59,246 irrd[1580419]: [irrd.mirroring.mirror_runners_import#DEBUG] Most recent mirrored serial for RADB: 559866, force_reload: False, nrtm enabled: True
2023-12-27 15:45:59,247 irrd[1580419]: [irrd.mirroring.mirror_runners_import#INFO] Retrieving NRTM updates for RADB from serial 559867 on nrtm.radb.net:43
2023-12-27 15:45:59,247 irrd[1580419]: [irrd.utils.whois_client#DEBUG] Running whois query -g RADB:3:559867-LAST on nrtm.radb.net port 43
2023-12-27 15:45:59,814 irrd[1580419]: [irrd.mirroring.mirror_runners_import#DEBUG] Received NRTM response for RADB: %START Version: 3 RADB 559867-568497
2023-12-27 15:45:59,833 irrd[1580419]: [irrd.mirroring.parsers#DEBUG] Found valid start line for RADB, range 559867-568497
2023-12-27 16:01:00,553 irrd[2869178]: [irrd.mirroring.scheduler#DEBUG] Started new process RPSLMirrorImportUpdateRunner-RADB for mirror import/export for RADB
2023-12-27 16:01:00,592 irrd[1583183]: [irrd.mirroring.mirror_runners_import#DEBUG] Most recent mirrored serial for RADB: 568497, force_reload: False, nrtm enabled: True
2023-12-27 16:01:00,592 irrd[1583183]: [irrd.mirroring.mirror_runners_import#INFO] Retrieving NRTM updates for RADB from serial 568498 on nrtm.radb.net:43
2023-12-27 16:01:00,592 irrd[1583183]: [irrd.utils.whois_client#DEBUG] Running whois query -g RADB:3:568498-LAST on nrtm.radb.net port 43
2023-12-27 16:01:00,748 irrd[1583183]: [irrd.mirroring.mirror_runners_import#DEBUG] Received NRTM response for RADB: %START Version: 3 RADB 568498-568572
2023-12-27 16:01:00,748 irrd[1583183]: [irrd.mirroring.parsers#DEBUG] Found valid start line for RADB, range 568498-568572
2023-12-27 16:16:02,170 irrd[2869178]: [irrd.mirroring.scheduler#DEBUG] Started new process RPSLMirrorImportUpdateRunner-RADB for mirror import/export for RADB
2023-12-27 16:16:02,210 irrd[1584674]: [irrd.mirroring.mirror_runners_import#DEBUG] Most recent mirrored serial for RADB: 568572, force_reload: False, nrtm enabled: True
2023-12-27 16:16:02,211 irrd[1584674]: [irrd.mirroring.mirror_runners_import#INFO] Retrieving NRTM updates for RADB from serial 568573 on nrtm.radb.net:43
2023-12-27 16:16:02,211 irrd[1584674]: [irrd.utils.whois_client#DEBUG] Running whois query -g RADB:3:568573-LAST on nrtm.radb.net port 43
2023-12-27 16:16:02,387 irrd[1584674]: [irrd.mirroring.mirror_runners_import#DEBUG] Received NRTM response for RADB: %START Version: 3 RADB 568573-568667
2023-12-27 16:16:02,388 irrd[1584674]: [irrd.mirroring.parsers#DEBUG] Found valid start line for RADB, range 568573-568667
Have I left anything out of this report that might be in our logs? Please let me know. RADB folks please chime in here.
We did not do a force reload until about 20:00 UTC on the 25th, and had our NOC contact db-admin@radb.net.
We saw the same sql constraint violation but being the holidays I haven't dug into the logs yet. A reseed was required internally again.
Reviewing a recent incident of this constraint violation, I noticed that this exception is thrown by an ADD that is going through the delete code path. The delete path flushes the _rpsl_upsert_buffer first. Would this mean that the add was handled previously during another flush, but the buffer was not cleared before this delete happened? These operations are a DEL and then an ADD for the same prefix in a very short time window (within the same second sometimes).
2024-02-19 00:01:18,878 irrd[2309452]: [irrd.mirroring.nrtm_operation#INFO] Completed NRTM operation RADB/1176341/ADD/route6/2605:9CC0:397::/48AS8987
2024-02-19 00:01:18,879 irrd[2309452]: [irrd.mirroring.nrtm_operation#INFO] Completed NRTM operation RADB/1176342/ADD/route6/2605:9CC0:399::/48AS8987
2024-02-19 00:01:18,891 irrd[2309452]: [irrd.mirroring.mirror_runners_import#ERROR] An exception occurred while attempting a mirror update or initial import for RADB: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "rpsl_objects_history_serial_nrtm_source_unique"
DETAIL: Key (serial_nrtm, source)=(1176341, RADB) already exists.
[SQL: INSERT INTO rpsl_database_journal (serial_global, rpsl_pk, source, origin, serial_nrtm, operation, object_class, object_text, timestamp) VALUES (nextval('rpsl_database_journal_serial_global_seq'), %(rpsl_pk)s, %(source)s, %(origin)s, %(serial_nrtm)s, %(operation)s, %(object_class)s, %(object_text)s, %(timestamp)s) RETURNING rpsl_database_journal.serial_nrtm]
[parameters: {'rpsl_pk': '2605:9CC0:397::/48AS8987', 'source': 'RADB', 'origin': 'mirror', 'serial_nrtm': 1176341, 'operation': 'add_or_update', 'object_class': 'route6', 'object_text': 'route6: 2605:9cc0:397::/48\norigin: AS8987\ndescr: Amazon EC2 Prefix\nmnt-by: MAINT-AS16509\nsource: radb\nchanged: noc@amazon.com 20240218 #18:43:49\nlast-modified: 2024-02-18T18:43:50Z\n', 'timestamp': datetime.datetime(2024, 2, 19, 0, 1, 18, 889869, tzinfo=datetime.timezone.utc)}]
(Background on this error at: http://sqlalche.me/e/13/gkpj)
Traceback (most recent call last):
File "/home/irrd/srv/releases/4.4.2/venv/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/home/irrd/srv/releases/4.4.2/venv/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "rpsl_objects_history_serial_nrtm_source_unique"
DETAIL: Key (serial_nrtm, source)=(1176341, RADB) already exists.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/irrd/srv/releases/4.4.2/venv/lib/python3.10/site-packages/irrd/mirroring/mirror_runners_import.py", line 66, in run
self.update_stream_runner.run(serial_newest_mirror, database_handler=self.database_handler)
File "/home/irrd/srv/releases/4.4.2/venv/lib/python3.10/site-packages/irrd/mirroring/mirror_runners_import.py", line 442, in run
operation.save(database_handler)
File "/home/irrd/srv/releases/4.4.2/venv/lib/python3.10/site-packages/irrd/mirroring/nrtm_operation.py", line 96, in save
database_handler.delete_rpsl_object(
File "/home/irrd/srv/releases/4.4.2/venv/lib/python3.10/site-packages/irrd/storage/database_handler.py", line 577, in delete_rpsl_object
self._flush_rpsl_object_writing_buffer()
File "/home/irrd/srv/releases/4.4.2/venv/lib/python3.10/site-packages/irrd/storage/database_handler.py", line 739, in _flush_rpsl_object_writing_buffer
self.status_tracker.record_operation(
File "/home/irrd/srv/releases/4.4.2/venv/lib/python3.10/site-packages/irrd/storage/database_handler.py", line 1043, in record_operation
insert_result = self.database_handler.execute_statement(stmt).fetchone()
File "/home/irrd/srv/releases/4.4.2/venv/lib/python3.10/site-packages/irrd/storage/database_handler.py", line 239, in execute_statement
return self._connection.execute(statement)
File "/home/irrd/srv/releases/4.4.2/venv/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1011, in execute
return meth(self, multiparams, params)
File "/home/irrd/srv/releases/4.4.2/venv/lib/python3.10/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/home/irrd/srv/releases/4.4.2/venv/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
ret = self._execute_context(
File "/home/irrd/srv/releases/4.4.2/venv/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
self._handle_dbapi_exception(
File "/home/irrd/srv/releases/4.4.2/venv/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
util.raise_(
File "/home/irrd/srv/releases/4.4.2/venv/lib/python3.10/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/home/irrd/srv/releases/4.4.2/venv/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/home/irrd/srv/releases/4.4.2/venv/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.IntegrityError: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "rpsl_objects_history_serial_nrtm_source_unique"
DETAIL: Key (serial_nrtm, source)=(1176341, RADB) already exists.
Snippet of logs of operations on the same prefix. Why they are doing this, I don't know.
2024-02-19 07:04:37,850 irrd[2394162]: [irrd.mirroring.nrtm_operation#INFO] Completed NRTM operation RADB/1180490/ADD/route6/2605:9CC0:397::/48AS8987
2024-02-19 07:04:37,872 irrd[2394162]: [irrd.mirroring.nrtm_operation#INFO] Completed NRTM operation RADB/1180495/DEL/route6/2605:9CC0:397::/48AS8987
2024-02-19 07:04:37,874 irrd[2394162]: [irrd.mirroring.nrtm_operation#INFO] Completed NRTM operation RADB/1180498/ADD/route6/2605:9CC0:397::/48AS8987
2024-02-19 07:04:37,886 irrd[2394162]: [irrd.mirroring.nrtm_operation#INFO] Completed NRTM operation RADB/1180501/DEL/route6/2605:9CC0:397::/48AS8987
2024-02-19 07:06:38,470 irrd[2394379]: [irrd.mirroring.nrtm_operation#INFO] Completed NRTM operation RADB/1180507/ADD/route6/2605:9CC0:397::/48AS8987
2024-02-19 07:07:38,649 irrd[2394426]: [irrd.mirroring.nrtm_operation#INFO] Completed NRTM operation RADB/1180513/DEL/route6/2605:9CC0:397::/48AS8987
2024-02-19 07:08:39,302 irrd[2394464]: [irrd.mirroring.nrtm_operation#INFO] Completed NRTM operation RADB/1180743/ADD/route6/2605:9CC0:397::/48AS8987
2024-02-19 07:08:39,307 irrd[2394464]: [irrd.mirroring.nrtm_operation#INFO] Completed NRTM operation RADB/1180744/DEL/route6/2605:9CC0:397::/48AS8987
2024-02-19 07:09:39,113 irrd[2394507]: [irrd.mirroring.nrtm_operation#INFO] Completed NRTM operation RADB/1180745/ADD/route6/2605:9CC0:397::/48AS8987
2024-02-19 07:09:39,131 irrd[2394507]: [irrd.mirroring.nrtm_operation#INFO] Completed NRTM operation RADB/1180748/DEL/route6/2605:9CC0:397::/48AS8987
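To make the question about the upsert buffer concrete, here is a toy model of the buffering pattern as described above (a sketch only, not IRRD's actual implementation): if the flush did not clear the buffer, the next flush would journal the same serial again and hit exactly this constraint.

class ToyDatabaseHandler:
    # Toy model only: ADDs are buffered, deletes force a flush first.
    def __init__(self):
        self._rpsl_upsert_buffer = []  # pending ADD operations
        self._journal = set()          # (serial_nrtm, source) keys already journalled

    def upsert_rpsl_object(self, source, serial_nrtm):
        self._rpsl_upsert_buffer.append((source, serial_nrtm))

    def delete_rpsl_object(self, source, serial_nrtm):
        # The delete path writes out any pending ADDs before the DEL.
        self._flush_rpsl_object_writing_buffer()
        self._record(source, serial_nrtm)

    def _flush_rpsl_object_writing_buffer(self):
        for source, serial_nrtm in self._rpsl_upsert_buffer:
            self._record(source, serial_nrtm)
        # If this clear were skipped, a later flush would re-journal the
        # same serials and raise the UniqueViolation seen above.
        self._rpsl_upsert_buffer.clear()

    def _record(self, source, serial_nrtm):
        key = (serial_nrtm, source)
        if key in self._journal:
            raise ValueError(f"Key {key} already exists")
        self._journal.add(key)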
This is very useful data, and the issue is then probably somewhere in the handling of the upsert buffer or the insert/delete queries. I will dig into the code.
So, the duplicate key error here concerns the journal, where, as a safeguard, serial and source are unique. NRTMv3 exports would otherwise break. The serial_nrtm has three possible sources:
The serial actually inserted by the DB is then added to the known new serials, for use in the next journal entry created and some metadata. This is not thread/multiprocess safe, but there should only be one process running to mirror one source, as ensured by the mirroring scheduler.
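For reference, the safeguard being violated corresponds to a uniqueness constraint on (serial_nrtm, source) in the journal table. A rough sketch of the relevant bits, with column and constraint names taken from the error output above (not the actual IRRD schema module):

import sqlalchemy as sa

metadata = sa.MetaData()

rpsl_database_journal = sa.Table(
    "rpsl_database_journal",
    metadata,
    sa.Column("serial_global", sa.Integer, primary_key=True),
    sa.Column("serial_nrtm", sa.Integer, nullable=False),
    sa.Column("source", sa.String, nullable=False),
    sa.Column("operation", sa.String, nullable=False),
    # further columns (rpsl_pk, origin, object_class, object_text, timestamp) omitted
    sa.UniqueConstraint(
        "serial_nrtm",
        "source",
        name="rpsl_objects_history_serial_nrtm_source_unique",
    ),
)
# Inserting the same (serial_nrtm, source) pair twice raises the
# psycopg2.errors.UniqueViolation shown in the traceback above.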
Questions:
- What is the configuration for the RADB source on the instance that generated this error? I'm assuming it's an NRTMv3 client from another host?
- What is the output of !JRADB on this instance? Particularly useful to know if it is running with synchronised serials.
- What are the log entries of this instance for any entry mentioning 1176330-1176350?
- If synchronised serials are enabled, what are the log entries of the NRTMv3 source for those serials, if any?
- Is this a single instance that logged this exact error, or do you have other mirror clients that produced the same error?
- What is the configuration for the RADB source on the instance that generated this error? I'm assuming it's an NRTMv3 client from another host?
RADB:
authoritative: false
keep_journal: true
import_serial_source: "file:///<path>/radb/dbase/RADB.CURRENTSERIAL"
import_source: "file:///<path>/radb/dbase/radb.db.gz"
import_timer: 60
nrtm_host: "<auth_db_ip>"
nrtm_port: "4343"
nrtm_access_list: nrtm_public
- What is the output of !JRADB on this instance? Particularly useful to know if it is running with synchronised serials.
rpki is disabled, no scope filters, and route object prefs disabled on this host.
$ echo '!JRADB' | nc localhost 4343
A526
{
"RADB": {
"source_type": "regular",
"authoritative": false,
"object_class_filter": null,
"rpki_rov_filter": false,
"scopefilter_enabled": false,
"route_preference": null,
"local_journal_kept": true,
"serial_oldest_journal": 1246157,
"serial_newest_journal": 1256489,
"serial_last_export": null,
"serial_newest_mirror": 1256489,
"last_update": "2024-02-27T21:25:00.799334+00:00",
"synchronised_serials": true
}
}
C
- What are the log entries of this instance for any entry mentioning 1176330-1176350?
- If synchronised serials are enabled, what are the log entries of the NRTMv3 source for those serials, if any?
The log level was reset to INFO so the logs are not particularly interesting. I've reset the level to DEBUG to catch the next time. If you still want the logs I'll create a gist and send it.
- Is this a single instance that logged this exact error, or do you have other mirror clients that produced the same error?
Have only seen this error on our host that serves nrtm.radb.net.
I'm on IRRd 4.4.3 and seeing something very similar - I hope this is the right issue to report this to.
There are a few sources (not just RADB) that sometimes are out of sync, and I need to kick a manual irrd_mirror_force_reload in order to force an update. When I noticed this for the first time, they hadn't been updated in many months.
I'm noticing this with RADB, APNIC, NTTCOM, and sometimes LACNIC.
Status:
IRRD version 4.4.3
--------------------------------------------------------------------------
source total obj rt obj aut-num obj serial last export
--------------------------------------------------------------------------
APNIC 2726850 744889 22412 None
NTTCOM 372257 363477 568 None
RADB 1395174 1170400 9759 None
Samples:
!JRADB
A520
{
"RADB": {
"source_type": "regular",
"authoritative": false,
"object_class_filter": null,
"rpki_rov_filter": false,
"scopefilter_enabled": false,
"route_preference": null,
"local_journal_kept": true,
"serial_oldest_journal": null,
"serial_newest_journal": null,
"serial_last_export": null,
"serial_newest_mirror": 2052890,
"last_update": "2024-08-19T16:52:36.911378+00:00",
"synchronised_serials": true
}
}
C
!JNTTCOM
A522
{
"NTTCOM": {
"source_type": "regular",
"authoritative": false,
"object_class_filter": null,
"rpki_rov_filter": false,
"scopefilter_enabled": false,
"route_preference": null,
"local_journal_kept": true,
"serial_oldest_journal": null,
"serial_newest_journal": null,
"serial_last_export": null,
"serial_newest_mirror": 1988580,
"last_update": "2024-08-19T16:26:12.047086+00:00",
"synchronised_serials": true
}
}
The last_update field practically shows the time when the refresh was executed after running irrd_mirror_force_reload, whereas other sources have a more recent timestamp and their serial_oldest_journal and serial_newest_journal have actual numbers.
I'm not seeing any error, in fact I do see the NRTM updates:
2024-08-20 08:26:03,669 irrd[681]: [irrd.mirroring.mirror_runners_import#INFO] Retrieving NRTM updates for LACNIC from serial 428757 on irr.lacnic.net:43
2024-08-20 08:26:03,671 irrd[680]: [irrd.mirroring.mirror_runners_import#INFO] Retrieving NRTM updates for APNIC from serial 12493555 on whois.apnic.net:43
2024-08-20 08:26:03,672 irrd[682]: [irrd.mirroring.mirror_runners_import#INFO] Retrieving NRTM updates for NTTCOM from serial 1988577 on rr.ntt.net:43
But they just don't seem to trigger any update. I suspect it might be caused by serial_oldest_journal and serial_newest_journal being NULL in the database:
irrd=# select serial_oldest_journal,serial_newest_journal from database_status where source='APNIC';
serial_oldest_journal | serial_newest_journal
-----------------------+-----------------------
|
(1 row)
I'm on IRRd 4.4.3 and seeing something very similar - I hope this is the right issue to report this to.
As a client, your local serial_newest/oldest_journal are not too relevant, the important field is serial_newest_mirror which identifies the latest known serial from the mirror server, i.e. NRTM will next start from that serial + 1.
Right now, NTT is on 1988846 (per !JNTTCOM on rr.ntt.net), so your numbers are in the right ballpark. If I query -g NTTCOM:3:1988577-LAST from rr.ntt.net, I get a reasonable number of updates.
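For anyone who wants to repeat that check, a minimal sketch of such a query over plain whois (similar in spirit to what irrd.utils.whois_client does; the host, source and serial are taken from the logs above):

import socket

def nrtm_query(host: str, source: str, serial: int, port: int = 43) -> str:
    # Send a single NRTMv3 -g query and return the raw response text.
    query = f"-g {source}:3:{serial}-LAST\r\n"
    with socket.create_connection((host, port), timeout=30) as sock:
        sock.sendall(query.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")

# Example from this thread:
# print(nrtm_query("rr.ntt.net", "NTTCOM", 1988577)[:500])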
You are the first person to report this with other mirror servers than RADB. Perhaps the issue is wider, perhaps there is a different issue involved. The first step is to set the log level to DEBUG, and to provide me all logs relating to the affected sources starting from the forced reload, when this reoccurs.
We are still hitting duplicate key value violates unique constraint "rpsl_objects_history_serial_nrtm_source_unique" errors. Much less frequently now, after resolving issues with a maintainer, but we still see them. @mxsasha did you ever find anything on this error?
Describe the bug
It looks like serial numbers can sometimes be corrupted. Once they are corrupted, NRTM processing of further updates fails. The LEVEL3 instance of IRRdv4 is mirroring RADB. Consider the following entries from irrd.log (grep RADB):
2023-12-04 13:39:13,137 irrd[210576]: [irrd.mirroring.scheduler#DEBUG] Started new process RPSLMirrorImportUpdateRunner-RADB for mirror import/export for RADB
2023-12-04 13:39:13,175 irrd[1188550]: [irrd.mirroring.mirror_runners_import#DEBUG] Most recent mirrored serial for RADB: 255031, force_reload: False, nrtm enabled: True
2023-12-04 13:39:13,176 irrd[1188550]: [irrd.mirroring.mirror_runners_import#INFO] Retrieving NRTM updates for RADB from serial 255032 on nrtm.radb.net:43
2023-12-04 13:39:13,176 irrd[1188550]: [irrd.utils.whois_client#DEBUG] Running whois query -g RADB:3:255032-LAST on nrtm.radb.net port 43
2023-12-04 13:39:13,300 irrd[1188550]: [irrd.mirroring.mirror_runners_import#DEBUG] Received NRTM response for RADB: % Warning: there are no newer updates available
Above are the last entries where I got expected behavior. "import_timer" seconds (900) later, the following occurred.
2023-12-04 13:54:14,468 irrd[210576]: [irrd.mirroring.scheduler#DEBUG] Started new process RPSLMirrorImportUpdateRunner-RADB for mirror import/export for RADB
2023-12-04 13:54:14,506 irrd[1189565]: [irrd.mirroring.mirror_runners_import#DEBUG] Most recent mirrored serial for RADB: 255031, force_reload: False, nrtm enabled: True
2023-12-04 13:54:14,507 irrd[1189565]: [irrd.mirroring.mirror_runners_import#INFO] Retrieving NRTM updates for RADB from serial 255032 on nrtm.radb.net:43
2023-12-04 13:54:14,508 irrd[1189565]: [irrd.utils.whois_client#DEBUG] Running whois query -g RADB:3:255032-LAST on nrtm.radb.net port 43
2023-12-04 13:54:14,622 irrd[1189565]: [irrd.mirroring.mirror_runners_import#DEBUG] Received NRTM response for RADB: %% ERROR: Serials 255032 - 268299 do not exist
Starting at 2023-12-04 13:54:14, "255032-LAST" seems to be interpreted as "255032 - 268299" and produces an error about non-existent serials. The error persisted (for over 24 hours) until I performed "irrd_mirror_force_reload RADB". Some RADB updates were missed during this period. The updates were recovered when I did the force_reload. I suspect (without great evidence) that the error occurred in RADB's instance of IRRd. I have seen this same situation more than a year ago (so long before IRRdv4 4.4.2). I don't know if I am running the same version of IRRd now as I was the last time I saw this (but maybe).
To Reproduce
nrtm_host is set to nrtm.radb.net. Unknown how to reproduce the problem.
Expected behaviour
IRRd version you are running
I am running IRRdv4 4.2.6. I don't know what version RADB is running (suspect 4.4.2).
Additional context
None