Add config for returning dummy object in the NRTMv3 response

Justin-APNIC commented 7 months ago

Add new configs in IRRD for returning the dummy object in NRTMv3 response

Example config

                nrtm_response_dummy_object_class:
                  - person
                  - role
                nrtm_response_dummy_attributes:
                  person: Dummy name for %s
                  role: Dummy role for %s
                  address: Dummy address for %s
                  phone: '+31205354444'
                  e-mail: unread@ripe.net
                  upd-to: unread@ripe.net
                  descr: Dummy description for %s
                nrtm_response_dummy_remarks: |
                  ****************************
                  * THIS OBJECT IS NOT VALID
                  * Please note that all personal data has been removed from this object.
                  * To view the original object, please query the APNIC Database at:
                  * http://www.apnic.net/whois
                  ****************************

Example response

role:           Dummy role for AT1599-AP
address:        Dummy address for AT1599-AP
country:        ZZ
phone:          +31205354444
e-mail:         unread@ripe.net
admin-c:        TPLA28-AP
tech-c:         TPLA28-AP
nic-hdl:        AT1599-AP
mnt-by:         APNIC-ABUSE
last-modified:  2024-03-14T23:45:51Z
source:         APNIC
remarks:        ****************************
remarks:        * THIS OBJECT IS NOT VALID
remarks:        * Please note that all personal data has been removed from this object.
remarks:        * To view the original object, please query the APNIC Database at:
remarks:        * http://www.apnic.net/whois
remarks:        ****************************

mxsasha commented 7 months ago

Nice! This will take a bit of time to review, but my first thought is that the logic to generate dummified object text should be separated to make it easy to apply to RPSL full exports and NRTMv4 as well. Probably in the same place as remove_auth_hashes(). Makes tests a bit simpler too.

(Initially I thought about putting it on RPSLObject, but we intentionally don't have that in this situation, just the text from the database.)

Justin-APNIC commented 7 months ago

Nice! This will take a bit of time to review, but my first thought is that the logic to generate dummified object text should be separated to make it easy to apply to RPSL full exports and NRTMv4 as well. Probably in the same place as remove_auth_hashes(). Makes tests a bit simpler too.

Yes, the function dummy_rpsl_object() does exist in the text.py where theremove_auth_hashes() function is. The codes in nrtm_generator are just for retrieving the params that the dummy function needs from the configuration file.

Justin-APNIC commented 7 months ago

@mxsasha I have a question regarding the primary key in the RPSL object. It appears that we are converting the primary key to uppercase all the time. Could we add any configurations or functions in the rpsl/parser.py to preserve the original primary key case? I've reviewed the code, and it seems that IRRD parses attributes in different object classes differently. Currently, I don't have a straightforward solution, so I'd appreciate your input on this matter.

The reason I want to keep the original value is that I want to return the original pk instead of the uppercase one in the dummy NRTM response. For example, I want to have IRT-Youminet-CN in the dummy address instead of IRT-YOUMINET-CN

irt:            IRT-Youminet-CN
address:        Dummy address for IRT-YOUMINET-CN

mxsasha commented 6 months ago

Nice! This will take a bit of time to review, but my first thought is that the logic to generate dummified object text should be separated to make it easy to apply to RPSL full exports and NRTMv4 as well. Probably in the same place as remove_auth_hashes(). Makes tests a bit simpler too.

Yes, the function dummy_rpsl_object() does exist in the text.py where theremove_auth_hashes() function is. The codes in nrtm_generator are just for retrieving the params that the dummy function needs from the configuration file.

What I meant is: to properly support dummy mirroring, dummifying needs to happen in the NRTM v3 generator, the source export (if unfiltered is not set, currently called "remove_auth_hashes" I think), and the NRTMv4 server. With as little duplication as possible. So the call from the NRTMv3 generator should be something like: text = dummify_object_text(source, object_class, text).

@mxsasha I have a question regarding the primary key in the RPSL object. It appears that we are converting the primary key to uppercase all the time. Could we add any configurations or functions in the rpsl/parser.py to preserve the original primary key case? I've reviewed the code, and it seems that IRRD parses attributes in different object classes differently. Currently, I don't have a straightforward solution, so I'd appreciate your input on this matter.

The only place where we retain the original case for the PK is in the object text, so we'd have to re-parse the whole object. PKs are case insensitive, so the indexed key is normalised. But I don't follow why you need it, your current implementation in text.py seems the most straightforward, without needing the PK separately.

Justin-APNIC commented 6 months ago

Nice! This will take a bit of time to review, but my first thought is that the logic to generate dummified object text should be separated to make it easy to apply to RPSL full exports and NRTMv4 as well. Probably in the same place as remove_auth_hashes(). Makes tests a bit simpler too.

Yes, the function dummy_rpsl_object() does exist in the text.py where theremove_auth_hashes() function is. The codes in nrtm_generator are just for retrieving the params that the dummy function needs from the configuration file.

What I meant is: to properly support dummy mirroring, dummifying needs to happen in the NRTM v3 generator, the source export (if unfiltered is not set, currently called "remove_auth_hashes" I think), and the NRTMv4 server. With as little duplication as possible. So the call from the NRTMv3 generator should be something like: text = dummify_object_text(source, object_class, text).

Ok

@mxsasha I have a question regarding the primary key in the RPSL object. It appears that we are converting the primary key to uppercase all the time. Could we add any configurations or functions in the rpsl/parser.py to preserve the original primary key case? I've reviewed the code, and it seems that IRRD parses attributes in different object classes differently. Currently, I don't have a straightforward solution, so I'd appreciate your input on this matter.

The only place where we retain the original case for the PK is in the object text, so we'd have to re-parse the whole object. PKs are case insensitive, so the indexed key is normalised. But I don't follow why you need it, your current implementation in text.py seems the most straightforward, without needing the PK separately.

My implementation in text.py needs a pk to be provided. As I mentioned above, If my config for the dummy attribute is like address: Dummy address for %s The %s will be replaced by the pk. However, the current codes will convert it to the uppercase if I call pk() function on the whole object or retrieve the pk from db directly, so the final dummy attribute will be address: Dummy address for IRT-YOUMINET-CN instead of address: Dummy address for IRT-Youminet-CN.

I can't find an easy way to find the original pk from the whole object.

mxsasha commented 5 months ago

My implementation in text.py needs a pk to be provided. As I mentioned above, If my config for the dummy attribute is like address: Dummy address for %s The %s will be replaced by the pk. However, the current codes will convert it to the uppercase if I call pk() function on the whole object or retrieve the pk from db directly, so the final dummy attribute will be address: Dummy address for IRT-YOUMINET-CN instead of address: Dummy address for IRT-Youminet-CN.

I can't find an easy way to find the original pk from the whole object.

There is none. The data simply isn't extracted, as PKs are case insensitive in every other usage, and therefore normalised to upper case during parsing of the initial object. Is it such an obstacle for you? Does the PK need to be repeated in the dummy text? After all, anyone who looks at the object will see the true PK already anyways.

Justin-APNIC commented 5 months ago

My implementation in text.py needs a pk to be provided. As I mentioned above, If my config for the dummy attribute is like address: Dummy address for %s The %s will be replaced by the pk. However, the current codes will convert it to the uppercase if I call pk() function on the whole object or retrieve the pk from db directly, so the final dummy attribute will be address: Dummy address for IRT-YOUMINET-CN instead of address: Dummy address for IRT-Youminet-CN. I can't find an easy way to find the original pk from the whole object.

There is none. The data simply isn't extracted, as PKs are case insensitive in every other usage, and therefore normalised to upper case during parsing of the initial object. Is it such an obstacle for you? Does the PK need to be repeated in the dummy text? After all, anyone who looks at the object will see the true PK already anyways.

Nvm, thanks.

Justin-APNIC commented 5 months ago

Left some notes inline. We also need to extend this to the source export runner, and the NRTM4 server. In those cases, we don't have client IPs, so our only option is to always enable it if configured.

Ok, we need this because we have some internal clients for accessing the IRRD NRTM3 server, so it needs the original data.

I also want to tweak the docs and settings names a bit, later.

No problem, thanks for looking into it.

irrdnet / irrd

Add config for returning dummy object in the NRTMv3 response #924