Error reporting in NRTMv4?

stkonst commented 2 years ago

Hi team,

I was wondering whether we should consider error reporting from the client to the server side in NRTMv4. HTTPS helps with basic errors such as "file does not exist", but it cannot help with errors in the content itself. While reading the draft, some scenarios came to mind:

- The server publishes a new Update Notification File but fails to update the URL of the new snapshot or fails to include the new Delta file.
- The Delta file might exist but have zero data in it.
- The new snapshot (available as .gz) might contain corrupted data (visible once the client decompresses it), especially since 3.3.3 allows data to be modified prior to publication.
- Version mismatch?

and various others...

These are mistakes that originate on the server (due to software bugs) and that the client can detect. Of course the mirror client can reject the changes once validation fails. However, if the mirror server received error reports from clients, the mirror server operator could perhaps detect those errors faster.
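To make the scenarios above concrete, here is a minimal sketch of the kind of client-side checks I have in mind. It assumes the Update Notification File has already been parsed into a dict with `version`, `snapshot` and `deltas` keys, and that every new version is announced together with a Delta of the same version number. The function name and the error codes are made up for illustration and are not taken from the draft.

```python
def find_content_errors(notification, last_seen_version=None):
    """Return hypothetical error codes for one parsed Update Notification File."""
    errors = []
    version = notification["version"]
    snapshot = notification.get("snapshot") or {}
    delta_versions = sorted(d["version"] for d in notification.get("deltas", []))

    # Scenario: the snapshot URL was not filled in.
    if not snapshot.get("url"):
        errors.append("SNAPSHOT_URL_MISSING")

    # Scenario: a newer version is announced but its Delta is not listed.
    if last_seen_version is not None and version > last_seen_version:
        if version not in delta_versions:
            errors.append("DELTA_MISSING")

    # The listed Deltas should form a contiguous range of versions.
    if delta_versions:
        expected = list(range(delta_versions[0], delta_versions[-1] + 1))
        if delta_versions != expected:
            errors.append("DELTA_GAP")

    # Scenario: the snapshot claims a version newer than the notification itself.
    if snapshot.get("version", 0) > version:
        errors.append("VERSION_MISMATCH")

    return errors
```

An empty list means none of these particular checks fired; it says nothing about the RPSL content itself.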

What are your thoughts about it?

mxsasha commented 2 years ago

I think it's nice but not easy to execute. The client would need some kind of reporting mechanism, so some kind of API, or an email address (meh). And if you publish something bad, you'll potentially be flooded with reports from clients.

Validity is important, so my idea instead was to write and publish an NRTMv4 repository validator that you could point at your publication point (or well-known domain, or whatever else we end up telling mirrors to use as their config). That could be added to automated monitoring. It's not a very complex protocol to validate, so it should have a good signal-to-noise ratio.

With one catch:

> The new snapshot (available as .gz) might contain corrupted data (visible once the client decompresses it), especially since 3.3.3 allows data to be modified prior to publication.

There is limited validation we could do on the RPSL objects, hence 8.1, which mostly places that problem out of scope of NRTMv4.
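As a rough sketch of what one validator check could look like, the snippet below verifies a downloaded snapshot against the hash advertised for it and confirms that it decompresses. It assumes the advertised hash is a hex-encoded SHA-256 of the compressed file; the function name and the problem strings are hypothetical. In line with the catch above, it does not attempt to validate the RPSL objects.

```python
import gzip
import hashlib
import zlib


def validate_snapshot_bytes(raw, expected_sha256):
    """Return a list of problems found in one downloaded snapshot file."""
    problems = []

    # Compare the file against the hash advertised for it (assumed hex SHA-256).
    if hashlib.sha256(raw).hexdigest() != expected_sha256.lower():
        problems.append("hash mismatch")

    # A corrupt or truncated .gz file fails here rather than in a mirror client.
    try:
        body = gzip.decompress(raw)
    except (OSError, EOFError, zlib.error) as exc:
        problems.append(f"not valid gzip: {exc}")
        return problems

    if not body.strip():
        problems.append("snapshot is empty")
    return problems
```

A monitoring wrapper could fetch the files, run checks like this, and map a non-empty list to whatever "critical" means in the operator's alerting system.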

stkonst commented 2 years ago

Yes, I was thinking of something like a callback API address that the mirror server provides to the mirror client. The client could then send the server a message that includes:

- the session_id
- the version number
- the error code

Perhaps this amount of information is good enough for some initial alarms to be raised and start the troubleshooting. Of course this increases the complexity and we might face challenges like the ones you mentioned above.
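Something like the following is what I imagine for the message body, purely as a sketch. The three fields are the ones listed above; the class name, the JSON encoding, the field names `version` and `error_code`, and the example error code are my own assumptions, since nothing like this exists in the draft.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ErrorReport:
    """Hypothetical client-to-server error report."""
    session_id: str   # session the client was following when validation failed
    version: int      # version number of the file that failed validation
    error_code: str   # short machine-readable code, e.g. "DELTA_MISSING"


def encode_report(report):
    """Serialize the report as JSON, ready to send to a callback URL."""
    return json.dumps(asdict(report), sort_keys=True)


report = ErrorReport(
    session_id="ca128382-78d9-41d1-8927-1ecef15275be",  # example value only
    version=4,
    error_code="DELTA_MISSING",
)
payload = encode_report(report)
```

How the client learns the callback address, and how the server protects itself against a flood of such reports, is left open here.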

The validator idea is not bad and is one way to go (indeed, it's not a complex protocol), but I guess the network operator would need to run two software instances on their server: one for the mirror client and one for the validator. Is that correct?

mxsasha commented 2 years ago

Oh, I think we have a different place in mind for the validator. I am thinking of this being run by the mirror server operator, and used in their monitoring. That allows integration with any existing monitoring/alerting systems of the same party who can actually fix it, creating a short feedback loop.

An API towards e.g. the mirror server IRRD would also have the issue that IRRD can listen for errors reported by mirror clients, but doesn't have alerting built in. It would have to log the report, then something else needs to process the logs and pick up on that, then trigger an alert towards someone who can fix it.

mxsasha commented 6 months ago

Discussed at RIPE87, we are keeping this out of scope.

mxsasha / nrtmv4

Error reporting in NRTMv4? #14