Closed luijs closed 2 months ago
Try... "irods_ssl_verify_server": "cert" in the environment on the client side...
This is a possible workaround and may not address an actual bug.
In this demo that should indeed work. However, if the user cannot access hostname.localdomain.com, they will not be able to use the PRC. Also, I am using host access control, where I want to block all access that does not come via the load balancer or hostname.localdomain.com itself.
Sounds like the iCommands do some translation of the connecting hostname for load-balancing purposes. Is this something that the SSL interface or other network interfaces (in absence of an SSL connection) provide at lower level or is it a feature of iRODS server connections that this translation is done? @alanking @korydraughn
I tried another thing with the certificate. It looks like the certificate needs to be valid for all possible hostnames.
I had a certificate that was only valid for hostname.localdomain.com, and then the PRC complained that the certificate was not valid for irods.publicdomain.com. I had a certificate only for irods.publicdomain.com, and then it complained that the certificate was invalid for hostname.localdomain.com. When I had a certificate that was valid for both, it started to work. However, this is exactly what I am hoping not to do, as I would rather not expose the name hostname.localdomain.com.
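To make the "valid for all possible hostnames" observation concrete, here is a minimal sketch that checks which of a set of hostnames a certificate fails to cover. It assumes the certificate is available as the dict returned by Python's `ssl.SSLSocket.getpeercert()`; the helper names (`dns_names`, `name_matches`, `uncovered_hosts`) are made up for illustration, and the wildcard handling is deliberately simplified compared to what a real TLS library does.

```python
def dns_names(cert):
    """Return the DNS names listed in a certificate dict.

    `cert` has the shape returned by ssl.SSLSocket.getpeercert().
    """
    return [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]


def name_matches(pattern, hostname):
    """Exact match, or a single leftmost '*' label (simplified rules)."""
    pattern, hostname = pattern.lower(), hostname.lower()
    if pattern.startswith("*."):
        # The wildcard stands for exactly one label.
        head, _, rest = hostname.partition(".")
        return bool(head) and rest == pattern[2:]
    return pattern == hostname


def uncovered_hosts(cert, hostnames):
    """Return the hostnames that no DNS name in the certificate matches."""
    names = dns_names(cert)
    return [h for h in hostnames if not any(name_matches(n, h) for n in names)]


# Hypothetical certificate that only lists the public name.
cert = {"subjectAltName": (("DNS", "irods.publicdomain.com"),)}
missing = uncovered_hosts(cert, ["irods.publicdomain.com", "hostname.localdomain.com"])
print(missing)
```

Any name reported as uncovered is one the client would reject if it ever connected to the server under that name while hostname verification is on.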
It seems you may have uncovered an issue with client redirection, but until we can confirm, that's only a theory.
What is the size of the file you're uploading?
The file above was just 2KB, but it also went wrong when creating a collection.
You mentioned the icommands work.
Please try to upload a 40MB file using iput and let us know what happens.
The file above was just 2KB, but it also went wrong when creating a collection.
Hmm, collections are virtual and do not require redirection. That means it may not be client redirection at play here.
Hmm, collections are virtual and do not require redirection. That means it may not be client redirection at play here.
Perhaps when connected to a catalog service consumer a redirection occurs to a catalog service provider in order to register the collection in the catalog? That's the only situation I can think of where that would happen, though.
Easy enough to test? But... wouldn't the consumer do that with a separate server-to-server connection, rather than having the client do that?
Correct. The servers redirect to the provider to carry out database operations.
When I speak of client redirection, I'm referring to the PRC's ability to find and connect to the destination resource server (for reads/writes). I don't expect the PRC to perform client redirection to create collections.
iput does a similar thing using the high ports when the size of the transfer exceeds 32MB.
Oh, I see. Carry on!
But... wouldn't the consumer do that with a separate server-to-server connection, rather than having the client do that?
Yes.
it also went wrong when creating a collection
Right - this is the most interesting thing to investigate at the moment.
@luijs Can you attempt to create a collection and share the PRC code, the client logs, and the server logs?
This is a setup where every bit of config can change everything, which makes it quite confusing to test and to be clear on what the current settings are. So here are another two cents.
So, I have to apologize and take back an earlier statement: a collection can be created! I am sorry for the confusion... I was basing that on my test output, which has quite some boilerplate; when I tested again just now with a very simple setup, collection creation was fine. Maybe the boilerplate was doing something else that triggered the error. If I find something else there that is not a put, I will post it.
I was just testing iput now. I also had problems there before, but it is clear how to solve that.
In the /etc/hosts file of the iRODS server I added this line (1.2.3.4 is the locally known IP of the server):
1.2.3.4 public.hostname.com local.hostname.com
With that, iput connects to public.hostname.com when transferring 40MB (iRODS shows this if you use -V).
If you instead use
1.2.3.4 local.hostname.com public.hostname.com
and restart the server afterwards (NB: I think iRODS will not pick up this change if you don't restart), iput connects to local.hostname.com, and thus fails in my case.
For the Python put, however, this change makes no difference: I get the local.hostname.com CERTIFICATE_VERIFY_FAILED error in both cases. There also seems to be no difference between sending a 40MB and a 2KB file with the PRC.
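The ordering matters because, in the hosts file format, the first name after the address is the canonical name and the rest are aliases. The sketch below only illustrates that parsing rule; `parse_hosts_line` is a hypothetical helper, and the idea that the server hands the canonical name to iput is an inference from the behaviour described above, not something confirmed here.

```python
def parse_hosts_line(line):
    """Split one hosts-file line into (address, canonical_name, aliases)."""
    fields = line.split("#", 1)[0].split()  # drop trailing comments
    if len(fields) < 2:
        return None  # blank line, comment, or address without a name
    address, canonical, *aliases = fields
    return address, canonical, aliases


for line in (
    "1.2.3.4 public.hostname.com local.hostname.com",
    "1.2.3.4 local.hostname.com public.hostname.com",
):
    address, canonical, aliases = parse_hosts_line(line)
    print(address, "canonical:", canonical, "aliases:", aliases)
```

With the first line the canonical name is public.hostname.com; with the second it is local.hostname.com, which matches the name iput was seen connecting to in each case.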
To confirm there aren't any DNS caching issues, please use the following /etc/hosts settings:
1.2.3.4 public.hostname.com local.hostname.com
Please use a 40MB file for both uploads.
Tried that, results below: On the server:
myuser@local:~$ sudo nano /etc/hosts
[sudo] password for myuser:
irods@local:/home/WUR/myuser$ irodsctl start
On desktop WSL:
myuser@desktopWSL:~$ iput -KV file40M
From server: NumThreads=4, addr:public.hostname.com, port:20026, cookie=548488194
file40M 40.000 MB | 9.813 sec | 4 thr | 4.076 MB/s
myuser@desktopWSL:~$ python3 ~/scripts/putfilesimple.py file40M
file40M
Traceback (most recent call last):
File "/home/myuser/.local/lib/python3.10/site-packages/irods/pool.py", line 62, in get_connection
conn = self.idle.pop()
KeyError: 'pop from an empty set'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/myuser/scripts/putfilesimple.py", line 32, in <module>
create_file_in_irods(session,'/myzone/home/myuser/random1',filename,'one')
File "/home/myuser/scripts/putfilesimple.py", line 27, in create_file_in_irods
irodssession.data_objects.put(filename, "{}/{}".format(collname,filename),
File "/home/myuser/.local/lib/python3.10/site-packages/irods/manager/data_object_manager.py", line 194, in put
if not self.parallel_put( local_path, (obj,o), total_bytes = sizelist[0], num_threads = num_threads,
File "/home/myuser/.local/lib/python3.10/site-packages/irods/manager/data_object_manager.py", line 289, in parallel_put
return parallel.io_main( self.sess, data_or_path_, parallel.Oper.PUT | (parallel.Oper.NONBLOCKING if async_ else 0), file_,
File "/home/myuser/.local/lib/python3.10/site-packages/irods/parallel.py", line 438, in io_main
Io = Io()
File "/home/myuser/.local/lib/python3.10/site-packages/irods/parallel.py", line 49, in __call__
return self.function(*self.args, **self.keywords)
File "/home/myuser/.local/lib/python3.10/site-packages/irods/manager/data_object_manager.py", line 430, in open
conn = directed_sess.pool.get_connection()
File "/home/myuser/.local/lib/python3.10/site-packages/irods/pool.py", line 17, in method_
ret = method(self,*s,**kw)
File "/home/myuser/.local/lib/python3.10/site-packages/irods/pool.py", line 78, in get_connection
conn = Connection(self, self.account)
File "/home/myuser/.local/lib/python3.10/site-packages/irods/connection.py", line 62, in __init__
self._server_version = self._connect()
File "/home/myuser/.local/lib/python3.10/site-packages/irods/connection.py", line 308, in _connect
self.ssl_startup()
File "/home/myuser/.local/lib/python3.10/site-packages/irods/connection.py", line 210, in ssl_startup
wrapped_socket = context.wrap_socket(self.socket,
File "/usr/lib/python3.10/ssl.py", line 513, in wrap_socket
return self.sslsocket_class._create(
File "/usr/lib/python3.10/ssl.py", line 1100, in _create
self.do_handshake()
File "/usr/lib/python3.10/ssl.py", line 1371, in do_handshake
self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'local.hostname.com'. (_ssl.c:1007)
on desktopWSL:
myuser@desktopWSL:~$ ils
/myzone/home/myuser:
file40M
C- /myzone/home/myuser/random1
serverlog:
{"log_category":"agent","log_level":"error","log_message":"[-]\t/irods_source/server/core/src/rodsAgent.cpp:705:int runIrodsAgentFactory(sockaddr_un) : status [SSL_HANDSHAKE_ERROR] errno [] -- message [failed to call 'agent start']\n\t[-]\t/irods_source/lib/core/src/sockComm.cpp:160:irods::error sockAgentStart(irods::network_object_ptr) : status [SSL_HANDSHAKE_ERROR] errno [] -- message [failed to call 'agent start']\n\t\t[-]\t/irods_source/plugins/network/src/ssl.cpp:764:irods::error ssl_agent_start(irods::plugin_context &) : status [SSL_HANDSHAKE_ERROR] errno [] -- message [error calling SSL_accept | error:14094412:SSL routines:ssl3_read_bytes:sslv3 alert bad certificate]\n\n","server_host":"local","server_pid":2060000,"server_timestamp":"2024-09-12T06:17:37.453Z","server_type":"agent","server_zone":"myzone"}
{"log_category":"server","log_level":"critical","log_message":"Agent factory returned with error code [-2103000].","server_host":"local","server_pid":2060000,"server_timestamp":"2024-09-12T06:17:37.453Z","server_type":"agent","server_zone":"myzone"}
{"log_category":"agent_factory","log_level":"error","log_message":"Agent process [2060000] exited with status [1].","server_host":"local","server_pid":2059475,"server_timestamp":"2024-09-12T06:17:37.475Z","server_type":"agent_factory","server_zone":"myzone"}
desktopWSL irods_environment.json:
{
"irods_authentication_scheme": "pam_password",
"irods_client_server_negotiation": "request_server_negotiation",
"irods_client_server_policy": "CS_NEG_REQUIRE",
"irods_encryption_algorithm": "AES-256-CBC",
"irods_encryption_key_size": 32,
"irods_encryption_num_hash_rounds": 16,
"irods_encryption_salt_size": 8,
"irods_host": "public.hostname.com",
"irods_port": 1247,
"irods_ssl_verify_server": "hostname",
"irods_user_name": "myuser",
"irods_zone_name": "myzone"
}
putfilesimple.py:
#!/usr/bin/python3
import os
import ssl
import irods.keywords as kw
from irods.session import iRODSSession
import sys
filename = sys.argv[1]
print(filename)
try:
    env_file = os.environ['IRODS_ENVIRONMENT_FILE']
except KeyError:
    env_file = os.path.expanduser('~/.irods/irods_environment.json')
ssl_context = ssl.create_default_context(purpose=ssl.Purpose.SERVER_AUTH, cafile=None, capath=None, cadata=None)
ssl_settings = {"ssl_context": ssl_context,
                'client_server_negotiation': 'request_server_negotiation',
                'client_server_policy': 'CS_NEG_REQUIRE',
                'encryption_algorithm': 'AES-256-CBC',
                'encryption_key_size': 32,
                'encryption_num_hash_rounds': 16,
                'encryption_salt_size': 8}
def create_file_in_irods(irodssession, collname, filename, resource):
    irodssession.data_objects.put(filename, "{}/{}".format(collname,filename),
                                  **{kw.DEST_RESC_NAME_KW: resource, kw.VERIFY_CHKSUM_KW: ''})
with iRODSSession(irods_env_file=env_file,**ssl_settings) as session:
    session.collections.create('/CICDtest/home/luijs002/random1')
    create_file_in_irods(session,'/CICDtest/home/luijs002/random1',filename,'one')
local.hostname.com server_config.json:
{
"advanced_settings": {
"default_log_rotation_in_days": 5,
"default_number_of_transfer_threads": 4,
"default_temporary_password_lifetime_in_seconds": 120,
"delay_rule_executors": [],
"delay_server_sleep_time_in_seconds": 30,
"dns_cache": {
"eviction_age_in_seconds": 3600,
"shared_memory_size_in_bytes": 5000000
},
"hostname_cache": {
"eviction_age_in_seconds": 3600,
"shared_memory_size_in_bytes": 2500000
},
"maximum_size_for_single_buffer_in_megabytes": 32,
"maximum_size_of_delay_queue_in_bytes": 0,
"maximum_temporary_password_lifetime_in_seconds": 1000,
"number_of_concurrent_delay_rule_executors": 4,
"stacktrace_file_processor_sleep_time_in_seconds": 10,
"transfer_buffer_size_for_parallel_transfer_in_megabytes": 4,
"transfer_chunk_size_for_parallel_transfer_in_megabytes": 40
},
"catalog_provider_hosts": [
"public.hostname.com"
],
"catalog_service_role": "provider",
"client_api_allowlist_policy": "enforce",
"controlled_user_connection_list": {
"control_type": "denylist",
"users": []
},
"default_dir_mode": "0750",
"default_file_mode": "0600",
"default_hash_scheme": "SHA256",
"default_resource_name": "hot_1",
"environment_variables": {},
"federation": [],
"host_access_control": {
"access_entries": [
{several entries not mentioned here}
]
},
"host_resolution": {
"host_entries": [
{
"address_type": "local",
"addresses": [
"public.hostname.com",
"1.2.3.4"
]
}
]
},
"log_level": {
"agent": "info",
"agent_factory": "info",
"api": "info",
"authentication": "info",
"database": "info",
"delay_server": "info",
"legacy": "info",
"microservice": "info",
"network": "info",
"resource": "info",
"rule_engine": "info",
"s3_resource_plugin": "info",
"server": "info",
"sql": "info"
},
"match_hash_policy": "compatible",
"negotiation_key": "XXX",
"plugin_configuration": {
"authentication": {},
"database": {
"postgres": {
"db_host": "XXX",
"db_name": "XXX",
"db_odbc_driver": "PostgreSQL ANSI",
"db_password": "XXX",
"db_port": 5432,
"db_username": "XXX"
}
},
"network": {},
"resource": {},
"rule_engines": [
{
"instance_name": "irods_rule_engine_plugin-python-instance",
"plugin_name": "irods_rule_engine_plugin-python",
"plugin_specific_configuration": {}
},
{
"instance_name": "irods_rule_engine_plugin-irods_rule_language-instance",
"plugin_name": "irods_rule_engine_plugin-irods_rule_language",
"plugin_specific_configuration": {
"re_data_variable_mapping_set": [
"core"
],
"re_function_name_mapping_set": [
"core"
],
"re_rulebase_set": [
"core"
],
"regexes_for_supported_peps": [
"ac[^ ]*",
"msi[^ ]*",
"[^ ]*pep_[^ ]*_(pre|post|except|finally)"
]
},
"shared_memory_instance": "irods_rule_language_rule_engine"
},
{
"instance_name": "irods_rule_engine_plugin-cpp_default_policy-instance",
"plugin_name": "irods_rule_engine_plugin-cpp_default_policy",
"plugin_specific_configuration": {}
}
]
},
"rule_engine_namespaces": [
""
],
"schema_name": "server_config",
"schema_validation_base_uri": "file:///var/lib/irods/configuration_schemas",
"schema_version": "v4",
"server_control_plane_encryption_algorithm": "AES-256-CBC",
"server_control_plane_encryption_num_hash_rounds": 16,
"server_control_plane_key": "XXX",
"server_control_plane_port": 1248,
"server_control_plane_timeout_milliseconds": 10000,
"server_port_range_end": 20199,
"server_port_range_start": 20000,
"xmsg_port": 1279,
"zone_auth_scheme": "native",
"zone_key": "XXX",
"zone_name": "myzone",
"zone_port": 1247,
"zone_user": "XXX"
}
local.hostname.com /etc/hosts:
127.0.1.1 localhost localhost.localdomain
::1 localhost6.localdomain6 localhost6
# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
1.2.3.4 public.hostname.com local.hostname.com local
127.0.0.1 dav.localhost
local.hostname.com hostname command:
local
The certificate installed on the iRODS host is valid only for public.hostname.com, not for local.hostname.com or local.
Thank you for all of that.
Just to narrow it a bit more...
Please try putfilesimple.py after setting the following in the desktopWSL irods_environment.json:
"irods_ssl_verify_server": "cert",
We're suspecting PRC is creating/using the ssl_context slightly differently than the iCommands... but it's not clear yet what that difference is.
Weirdly enough, that does not change anything; it still gives me the certificate error for local.hostname.com.
I am not sure the certificate handling is the issue, though: if I install a different certificate that is valid for both local.hostname.com and public.hostname.com, it works normally.
However, if I then disconnect from the VPN, and thus am no longer able to connect to local.hostname.com, I just get
irods.exception.NetworkException: Could not connect to specified host and port: local.hostname.com:1247
Okay... So that suggests that your client machine is trying to make a direct connection to that private machine, local.hostname.com.
Does iput also fail without the VPN?
And if so... does it work again if you use iput -N0 to disable the high ports and redirection?
Found it!!
iput succeeds without VPN.
I also did some debugging myself in the PRC code. The initial connect seems to go fine; only during the put itself do things go wrong. Below you see a whole lot of connection output, then the collection manager, and then connection output again.
Added by me in /home/myuser/.local/lib/python3.10/site-packages/irods/manager/collection_manager.py:: path: /myzone/home/myuser/random1/file40M
Added by me: /home/myuser/.local/lib/python3.10/site-packages/irods/connection.py: address: ('public.hostname.com', 1247)
Added by me: /home/myuser/.local/lib/python3.10/site-packages/irods/connection.py, host: public.hostname.com
Added by me: /home/myuser/.local/lib/python3.10/site-packages/irods/connection.py: self.account.host public.hostname.com
Added by me: /home/myuser/.local/lib/python3.10/site-packages/irods/connection.py: self.account.host public.hostname.com
Added by me in /home/myuser/.local/lib/python3.10/site-packages/irods/manager/collection_manager.py:: path: /myzone/home/myuser/random1
Added by me: /home/myuser/.local/lib/python3.10/site-packages/irods/connection.py: address: ('local.hostname.com', 1247)
In the error stack I noticed this:
File "/home/myuser/.local/lib/python3.10/site-packages/irods/manager/data_object_manager.py", line 430, in open
conn = directed_sess.pool.get_connection()
Lines 424-431 of data_object_manager.py show that some redirection took place:
if redirected_host and use_get_rescinfo_apis:
    # Redirect only if the local zone is being targeted, and if the hostname is changed from the original.
    if target_zone == self.sess.zone and (self.sess.host != redirected_host):
        # This is the actual redirect.
        directed_sess = self.sess.clone(host = redirected_host)
        returned_values['session'] = directed_sess
        conn = directed_sess.pool.get_connection()
        logger.debug('redirect_to_host = %s', redirected_host)
So I then looked for where redirected_host was set and I saw this:
if allow_redirect and conn.server_version >= (4,3,1):
    key = 'CREATE' if mode[0] in ('w','a') else 'OPEN'
    message = iRODSMessage('RODS_API_REQ',
                           msg=make_FileOpenRequest(**{kw.GET_RESOURCE_INFO_OP_TYPE_KW:key}),
                           int_info=api_number['GET_RESOURCE_INFO_FOR_OPERATION_AN'])
    conn.send(message)
    response = conn.recv()
    msg = response.get_main_message( STR_PI )
    use_get_rescinfo_apis = True
    # Get the information needed for the redirect
    _ = json.loads(msg.myStr)
    redirected_host = _["host"]
    requested_hierarchy = _["resource_hierarchy"]
I then realised that the definition of the resource I was writing to still had local.hostname.com as its host. I changed that to public.hostname.com, and then the put finally worked.
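The effect of the stale resource host can be sketched as a small pure function that mirrors the condition in the snippet quoted from data_object_manager.py. This is an illustration only, not the PRC's implementation: `choose_host` and its parameters are hypothetical, and the JSON payloads are hand-written stand-ins for the server's reply, using the "host" and "resource_hierarchy" keys seen in the quoted code.

```python
import json


def choose_host(session_host, session_zone, target_zone, resc_info_json):
    """Return the host a redirect-capable client would connect to.

    Redirect only within the local zone, and only when the host advertised
    for the resource differs from the host the session already uses.
    """
    redirected_host = json.loads(resc_info_json).get("host", "")
    if redirected_host and target_zone == session_zone and redirected_host != session_host:
        return redirected_host
    return session_host


# Hand-written stand-ins for the server's reply, before and after the fix.
stale = json.dumps({"host": "local.hostname.com", "resource_hierarchy": "one"})
fixed = json.dumps({"host": "public.hostname.com", "resource_hierarchy": "one"})

print(choose_host("public.hostname.com", "myzone", "myzone", stale))
print(choose_host("public.hostname.com", "myzone", "myzone", fixed))
```

With the stale definition the client is steered to local.hostname.com, the name that failed certificate verification; once the resource host equals the session host, no redirect happens.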
As my resource definitions were not updated after moving to a different DNS name, I guess this might not be a bug, or maybe it is. Or it might be a bug in the iCommands, since they did not give errors there.
Oh - amazing. So the catalog actually still held the local.hostname.com name...
Yes, now I'm wondering how iput was succeeding.
I think we can close this alongside #627. Thoughts?
I think I agree given the fact that this issue involves a load balancer.
Agreed. Closing.
Will mark #627 as duplicate as well.
I am testing a setup where I have a load balancer in front of an iRODS instance. iRODS runs on host hostname.localdomain.com, on version 4.3.1. I run PRC version 2.0.1 on Python 3.8.10. On the load balancer we have irods.publicdomain.com, which forwards to hostname.localdomain.com. The iRODS server has a certificate that is only valid for irods.publicdomain.com.
If I now connect via the PRC to irods.publicdomain.com, I get the following error:
Trying it on a different machine I get:
The Python code is:
Somehow the PRC seems to be getting hostname.localdomain.com from iRODS and using that to connect, instead of the irods.publicdomain.com that I put in my irods_environment.json. I think it should only connect to irods.publicdomain.com.
There might be some setting in iRODS itself that I need to change, but with the iCommands my setup works as designed; with the PRC it does not.