pierky / arouteserver

A tool to automatically build (and test) feature-rich configurations for BGP route servers.
https://arouteserver.readthedocs.org/
GNU General Public License v3.0
284 stars 46 forks source link

Registro.BR dump parsing resiliency and error message clarity #138

Closed bluikko closed 3 weeks ago

bluikko commented 1 month ago

Today 2024-07-15 the Registro.br dump has an invalid record:

AS264162|NC BRASIL TELECOM E SERVICOS LTDA- ME|21.171.965/0001-17|138.97.116.0/22|2804:2370::/32
264163 ^H|--whois err--
AS264164|Telecomunicações Netcoro Ltda|07.212.314/0001-05|138.94.132.0/22|2804:2378::/32|167.249.228.0/22

It should be noted that the dump file is actually from 2024-07-12 and has not been updated since.

The dump file is available as of now at the arouteserver default location: ftp://ftp.registro.br/pub/numeracao/origin/nicbr-asn-blk-latest.txt
And for downloads in the future, the direct link to data of 2024-07-12: ftp://ftp.registro.br/pub/numeracao/origin/nicbr-asn-blk-20240712.txt

This reveals two possible improvements that could be made in arouteserver:

Error messages

Currently it is difficult to find out where the error is exactly: on error the whole Registro.br dump is printed first and then an error message saying:

ARouteServer 2024-07-15 06:46:00,971 ERROR An error occurred while processing the Registro.br Whois database dump: invalid record 'AS174|COGENT BRASIL TELECOMUNICAÇÕES LTDA.|29.484.413/0001-70|2804:5330::/32
<<< snip the 8900 odd lines >>>
': unknown record format, less than 3 fields found - The error occurred while
loading the IRR data from ftp://ftp.registro.br/pub/numeracao/origin/nicbr-asn-blk-latest.txt.
If the data was moved to a different location, it is possible to adjust the
configuration by changing the value of 'filtering.irrdb.use_registrobr_bulk_whois_data.source'
inside the general.yml file.

Parsing resilience

I think I understand the rationale behind completely stopping processing if the dump file could not be parsed. It will prevent the situation where possibly invalid file ends up with no useful data and then the generated configuration file would be missing all Registro.br dump data.

Edit: It's "Registro.br" and not "NIC.BR"

pierky commented 1 month ago

Hi @bluikko,

please take a look at this patch: https://github.com/pierky/arouteserver/pull/135 It should implement the solution that you are proposing, that is increasing resiliency and making errors non-blockers

I've already merged it into dev, and I will include it in the next release.

I agree with the comment about the log message being pretty useless as it is now. I've pushed a commit to log only the broken record instead of the whole file (I think that was my actual original idea, but then raw vs row triggered the problem).

pierky commented 1 month ago

Thanks again for filing this issue.

I've just pushed out a new alpha release that contains this fix. If CI/CD will be happy, it will be available under the PyPi test instance in a few hours. It will be possible to install it by following the instructions at https://arouteserver.readthedocs.io/en/latest/INSTALLATION.html#development-and-pre-release-versions

If everything goes well, I plan to ship it as part of a new release in a few days.

bluikko commented 1 month ago

Oh sorry for the duplicate. I looked only the recent issues, didn't imagine this is a recurring issue going back months!

bluikko commented 1 month ago

I've just pushed out a new alpha release that contains this fix. If CI/CD will be happy, it will be available under the PyPi test instance in a few hours. It will be possible to install it by following the instructions at https://arouteserver.readthedocs.io/en/latest/INSTALLATION.html#development-and-pre-release-versions

If everything goes well, I plan to ship it as part of a new release in a few days.

It looks good to me.

I have tested with the docker image tag 1.23-alpha4-pypy3 and did not notice any regressions.
Testing with the invalid registro.br dump file it seems to work as expected and the invalid line is silently skipped.

Edit: I take it that other errors in the BR data dump file would still trigger printing of the whole file? That might be something to look at if it is not major work...

While I'm here, sorry to derail this issue, but I've noticed some changes in the IX-F JSON export between 1.21.6 and 1.22.1. After I updated from 1.21.6 to 1.22.1 the generated IX-F JSON export file seems to be missing a newline at the last line of the file (and some other issues maybe).
I did not see any relevant changes in a source diff for those versions when I looked into this, eventually I decided to just work around the issues since they were minor. Perhaps some library was updated that caused this change.
Do you think this might warrant its own issue?

bluikko commented 2 weeks ago

First of all I wanted to say thank you so much for your continued work and releasing the new version, seems to work great!

In the release notes for 1.23.0 it links to this issue and mentions that

now logs a warning message and skips the record.

But looking at the PR diff and the master branch file it looks like it just continues silently and does not print a warning?
If I recall correctly that was also corroborated by my testing using an invalid registro.br dump file with a whois err line.

It is a very, very small issue but I wonder if this was maybe an oversight and it was supposed to print a warning?
Personally I don't have strong feelings for either continue silently or with a warning.

Of course I might be wrong also, just took a new peek at the patch/file after I noticed a possible discrepancy in the release notes. In the case I'm wrong and it prints a warning then please accept my apology for wasting your time (again).

pierky commented 2 weeks ago

Of course I might be wrong also, just took a new peek at the patch/file after I noticed a possible discrepancy in the release notes. In the case I'm wrong and it prints a warning then please accept my apology for wasting your time (again).

I don't call it "wasting time", but rather "contributing to an OSS project", thanks for you feedback. It's thank to similar feedback and comments that the tool improved over time.

Let me try to explain the changes to that diff in the last release. There were actually 2 changes there:

So, there was no intention to print a warning when --whois err-- records were detected, as that's not really a parsing error. Maybe this is where the confusion comes from?

However, I'm also not super sure about how to handle similar situations. It's a bit of a corner case, grey area.