source database order affecting results

schelcj commented 10 months ago

Describe the bug: Database order affects query results, with or without sources_default defined.

To Reproduce

$ nc whois.radb.net 43
!!
!sRADB,RIPE
C
!iAS-GOOGLE
A289
AS-GOOGLE-IT AS-MEEBO AS-METAWEB-2 AS11344 AS139070 AS139190 AS13949 AS15169 AS15276 AS19425 AS19527 AS22577 AS24424 AS26684 AS26910 AS32381 AS36039 AS36040 AS36383 AS36384 AS36411 AS36492 AS36520 AS36561 AS394089 AS394699 AS394725 AS395973 AS396982 AS40873 AS41264 AS43515 AS55023 AS6432
C
!sRIPE,RADB
C
!iAS-GOOGLE
D
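For readers unfamiliar with the response framing in the session above, here is a minimal Python sketch of how such replies could be parsed. It assumes the framing seen in this thread and in the IRRd query documentation as I understand it: `A<length>` followed by data and a closing `C`, a bare `C` for success without data, `D` for no matching data, and `F` for an error. The function name `parse_irrd_response` and the status labels are illustrative, not part of IRRd.

```python
def parse_irrd_response(raw: str) -> tuple[str, list[str]]:
    """Parse one IRRd-style reply into (status, items).

    Assumption: ASCII payloads, so the length after "A" (bytes) can be
    applied to the decoded string directly.
    """
    first_line, _, rest = raw.partition("\n")
    if first_line == "C":
        return "ok", []            # success, nothing to return
    if first_line == "D":
        return "no_data", []       # nothing matched, as in the RIPE,RADB case
    if first_line.startswith("F"):
        return "error", [first_line[1:].strip()]
    if first_line.startswith("A"):
        length = int(first_line[1:])
        payload = rest[:length]    # the payload includes its trailing newline
        return "data", payload.split()
    raise ValueError(f"unrecognised response: {first_line!r}")
```

With this, the two `!iAS-GOOGLE` replies above would come back as a `"data"` result with the member list for `RADB,RIPE` and a `"no_data"` result for `RIPE,RADB`.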

Expected behaviour: Results to be the same regardless of the order of the source databases selected.

IRRd version you are running 4.4.2

troy2914 commented 10 months ago

That is the intended result: it uses the source that was specified. AS-GOOGLE exists in both RIPE and RADB, and the one in RIPE does not have any members. Now, if you mean it shouldn't use a no-member as-set, that I can agree with.

job commented 10 months ago

It should use a no-member set, because the no-member set exists. We have to assume it’s empty for a reason, we shouldn’t proceed in a non-deterministic fashion and pick some other set.

We documented this years ago:

In !i set expansion queries, legacy IRRd does not consistently follow the source order prioritisation when resolving sets. This may cause unexpected empty responses or different responses, as set expansion can produce dramatic differences based on source order prioritisation. For example: AS-AKAMAI exists in RADB and RIPE, but the RADB object has no members. When the RADB source is prioritised, IRRd version 4 correctly answers !iAS-AKAMAI with an empty response, but legacy IRRd refers to the RIPE object instead.

https://irrd.readthedocs.io/en/stable/admins/migrating-legacy-irrd/
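The prioritisation rule described above can be illustrated with a small Python sketch. This is a toy model, not IRRd's implementation: the `objects` mapping, the function name `resolve_set`, and the two-member list are hypothetical, and recursive expansion of nested sets is left out. It only shows that the first source containing the set wins, even when that object has no members.

```python
def resolve_set(name, source_order, objects):
    """Return (source, members) from the first source that has the set.

    objects maps (source, set_name) -> list of members (toy data).
    An existing but empty set still counts as a match.
    """
    for source in source_order:
        key = (source, name)
        if key in objects:
            return source, objects[key]
    return None, []


# Hypothetical data mirroring the AS-GOOGLE case in this thread.
objects = {
    ("RIPE", "AS-GOOGLE"): [],
    ("RADB", "AS-GOOGLE"): ["AS15169", "AS36040"],
}

radb_first = resolve_set("AS-GOOGLE", ["RADB", "RIPE"], objects)
ripe_first = resolve_set("AS-GOOGLE", ["RIPE", "RADB"], objects)
```

Here `radb_first` carries the RADB members, while `ripe_first` is the empty RIPE object, which matches the two different answers in the original report.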

schelcj commented 10 months ago

So the order of the sources_default list (or, without a default list, the order in which the databases are defined) will affect query results whenever someone does not specify exactly which databases they want, in exactly the order they want?

job commented 10 months ago

Yes, the idea is that the IRRd operator sets a default ordering they feel comfortable with (a reasonable approach might be to prioritise RADB, then the RIR-managed IRRdbs, and then the rest); in addition - the issuer of the query can always decide their preference for ordering.

In any case, the order always has to be set to deterministically resolve !i or !a queries: either by the IRRd operator, or by the query issuer.

job commented 10 months ago

Various registries (both RIR and third-party IRR) have undertaken efforts in the last 12 months to promote the concept of hierarchical as-set naming, to reduce the risk of naming collisions and unexpected outcomes.

amshikov commented 10 months ago

Hello!

the issuer of the query can always decide their preference for ordering.

The !g query returns data from all given sources. !a returns data only from the first matched source. And this is really annoying. I have hundreds of customers and I cannot maintain separate sources list for all of them. AS-GOOGLE is present in the RIPE DB, where it is empty; the actual data is in RADB. And vice versa: some other as-set can be present in RADB and be empty there, while the actual data is in the RIPE DB. Thus, if I don't know which DB stores the actual data, I have to send a separate request to each source instead of one request.

This does not make sense. Hierarchical as-set naming is cool, but how can it prevent similar as-set names in different IRR databases?
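The per-source querying described in this comment could look roughly like the Python sketch below. It is only an illustration of that workaround: `query_one` is a hypothetical callable standing in for "select one source, then run the set query", and nothing here is an IRRd feature. Note that picking the first non-empty answer deliberately deviates from the deterministic first-match behaviour defended earlier in the thread.

```python
def first_non_empty(set_name, sources, query_one):
    """Ask each source separately and return the first non-empty answer.

    query_one(source, set_name) -> list of members (hypothetical helper
    that performs one single-source query).
    """
    for source in sources:
        members = query_one(source, set_name)
        if members:
            return source, members
    return None, []
```

The cost is one request per source in the worst case, which is exactly the overhead the comment objects to.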

mxsasha commented 8 months ago

This is indeed working as designed, although there are unfortunate situations where it is inconvenient. The big problem with any alternatives is maintaining backwards compatibility while actually helping for current problematic sets.

job commented 8 months ago

I have hundreds of customers and I cannot maintain separate sources list for all of them

Why not though? At all the companies I have worked at, we had a source list specific to each peer or customer, going into the thousands. Data like this can be stored in SQL or YAML, which is pretty straightforward. The major IX route server automation software packages (IXP Manager and ARouteServer) also support this model out of the box.

tangledhelix commented 8 months ago

And this is really annoying. I have hundreds of customers and I cannot maintain separate sources list for all of them.

You don't need to; all you need is a default sources list, plus a list of exceptions for specific customers. At 2914, we only need to define a custom sources list for ~5% of ASNs we peer with. And about 70% of those exceptions are the same ordering change (promoting RIPE to the front of our default list).
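The "default list plus exceptions" approach described above can be sketched in a few lines of Python. Everything here is illustrative: the source lists, the ASN (taken from the documentation range), and the function names are assumptions, not anyone's production configuration. Only the `!s` source selection command is taken from the sessions in this thread.

```python
# Hypothetical default ordering used for most peers.
DEFAULT_SOURCES = ["RADB", "RIPE", "ARIN"]

# Hypothetical per-ASN exceptions, e.g. promoting RIPE to the front.
SOURCE_EXCEPTIONS = {
    64500: ["RIPE", "RADB", "ARIN"],
}


def sources_for(asn: int) -> list[str]:
    """Return the source order for an ASN, falling back to the default."""
    return SOURCE_EXCEPTIONS.get(asn, DEFAULT_SOURCES)


def source_command(asn: int) -> str:
    """Build the IRRd-style source selection command for an ASN."""
    return "!s" + ",".join(sources_for(asn))
```

Only the ASNs that need a different ordering appear in `SOURCE_EXCEPTIONS`; every other peer gets the default without any per-customer entry.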

amshikov commented 2 months ago

Hello!

This is indeed working as designed, although there are unfortunate situations where it is inconvenient. The big problem with any alternatives is maintaining backwards compatibility while actually helping for current problematic sets.

Why then do RIPE-style queries work in a different manner? Let's test both RIPE-style and IRRd-style queries with the same source order.

RIPE style:

~# telnet whois.radb.net 43
Trying 198.108.0.18...
Connected to whois.radb.net.
Escape character is '^]'.
!!
-k -K -s RIPE,RADB AS-GOOGLE
as-set: AS-GOOGLE

as-set: AS-GOOGLE
members: AS11344
members: AS13949
[...skipped...]
members: AS36520
members: AS394089

-k -K -s RADB,RIPE AS-GOOGLE
as-set: AS-GOOGLE
members: AS11344
members: AS13949
[...skipped...]
members: AS36520
members: AS394089

-- The outputs are exactly the same, i.e. the result does not depend on the order of the sources. Repeat the test with IRRd-style queries:

~# telnet whois.radb.net 43
Trying 198.108.0.18...
Connected to whois.radb.net.
Escape character is '^]'.
!!
!sRIPE,RADB
C
!iAS-GOOGLE
D
!sRADB,RIPE
C
!iAS-GOOGLE
A289
AS-GOOGLE-IT AS-MEEBO AS-METAWEB-2 AS11344 AS139070 AS139190 AS13949 AS15169 AS15276 AS19425 AS19527 AS22577 AS24424 AS26684 AS26910 AS32381 AS36039 AS36040 AS36383 AS36384 AS36411 AS36492 AS36520 AS36561 AS394089 AS394699 AS394725 AS395973 AS396982 AS40873 AS41264 AS43515 AS55023 AS6432
C

-- The result depends on the order of the sources.

Sorry, this does not look like the right design. IMHO, the returned results should not depend on the style of query.

mxsasha commented 2 months ago

-- The outputs are exactly the same, i.e. the result does not depend on the order of the sources. Repeat the test with IRRd-style queries:

When querying RPSL object text, which includes -K, the output order has no inherent meaning, and is not sorted.

irrdnet / irrd

source database order affecting results #873