Closed peterverraedt closed 1 year ago
I remember running into weird things when implementing this. I had to create a new filesystem instance and I never figured out why.
Perhaps your changes resolve that?
If you haven't already, please create a PR for this change and we'll look into testing it against Postgres and MySQL.
Also, which search fields did you use to trigger this error?
Have you seen this error on 4.2.11 or 4.3.0 (now that they are available)?
@peterverraedt Can you provide input that triggers this?
I have not been able to reproduce this using iRODS 4.2.10 and MySQL 5.7 or PostgreSQL.
I can still reproduce it with the jargon release 4.3.2.5-RELEASE and metalnx version 2.6.0.
Search field: e.g. owner name only.
The patch based above still works, but I still don't exactly know why, that is why it didn't make into a PR.
So even after bumping the jargon version and recompiling Metalnx, the problem still exists.
I'm going to give it a try and see what happens.
I just tried this against iRODS 4.2.10 and MySQL 5.7. I can't trigger the issue.
Thinking about this more, I feel this is more of a jargon issue. Perhaps I can trigger it outside of Metalnx.
In the meantime, if you can provide any additional information about your setup, that would be helpful.
I could not reproduce the issue outside of metalnx with jargon 4.3.2.4-RELEASE either.
Have you reviewed the log files for the MySQL database and the iRODS server when this happens? Do they contain anything of interest?
I checked the logfiles and was able to spot the real issue (or get closer to it). I'm sure I checked those logs earlier but must have missed it then, or logging did improve in the latest release.
The metalnx log metalnx.log shows
[2022-09-09T10:18:41.750Z][icts-t-cloud-hev-1] 2022-09-09 10:18:41 ERROR ClientServerNegotiationService:128 - failure in client server negotiation!...sending error message to the server before throwing the failure exception
[2022-09-09T10:18:41.751Z][icts-t-cloud-hev-1] 2022-09-09 10:18:41 ERROR AbstractConnection:644 - read length is set to zero
[2022-09-09T10:18:41.752Z][icts-t-cloud-hev-1] at org.irods.jargon.core.pub.IRODSGenericAO.<init>(IRODSGenericAO.java:61)
during the second time
IRODSFileSystem fsys = IRODSFileSystem.instance();
IRODSAccessObjectFactory factory = fsys.getIRODSAccessObjectFactory();
IRODSGenQueryExecutor executor = factory.getIRODSGenQueryExecutor(_input.account);
are called.
We run a custom proxy before irods, that basically inspect the first couple of messages to determine to which backend the connection should be relayed (checking zone and SNI hostname). It shows in proxy.log two connections from metalnx towards irods port 1247:
The first connection is handled well, the second sees cs_neg_result_kw=CS_NEG_FAILURE;
from the client at which point the connection is broken.
The proxy always sends
message.IRODSMessageCSNegotiation{
Status: 1,
Result: "CS_NEG_REQUIRE",
}
towards the client. Can you check whether the problem can be reproduced if irods itself always returns CS_NEG_REQUIRE?
Before I try to reproduce this, I'd like to know if you've tried the following.
Q. If you removed the proxy, does the behavior change? Q. If you disabled SSL, does the behavior change?
Seems the only differences between our deployments is SSL and a proxy.
A: Yes, without the proxy it works as intended. A: I cannot disable SSL because my proxy relies on it, but I discovered the following.
With my proxy, the behaviour of a connection is as follows (and in second connection, the third message below is a failure from the client):
C>S:
<MsgHeader_PI>
<type>RODS_CONNECT</type>
<msgLen>360</msgLen>
<errorLen>0</errorLen>
<bsLen>0</bsLen>
<intInfo>0</intInfo>
</MsgHeader_PI>
<StartupPack_PI>
<irodsProt>0</irodsProt>
<reconnFlag>0</reconnFlag>
<connectCnt>0</connectCnt>
<proxyUser>rods</proxyUser>
<proxyRcatZone>zone</proxyRcatZone>
<clientUser>rods</clientUser>
<clientRcatZone>zone</clientRcatZone>
<relVersion>rods4.3.0</relVersion>
<apiVersion>d</apiVersion>
<option>request_server_negotiation</option>
</StartupPack_PI>
S>C:
<MsgHeader_PI>
<type>RODS_CS_NEG_T</type>
<msgLen>78</msgLen>
<errorLen>0</errorLen>
<bsLen>0</bsLen>
<intInfo>0</intInfo>
</MsgHeader_PI>
<CS_NEG_PI>
<status>1</status>
<result>CS_NEG_REQUIRE</result>
</CS_NEG_PI>
C>S:
<MsgHeader_PI>
<type>RODS_CS_NEG_T</type>
<msgLen>141</msgLen>
<errorLen>0</errorLen>
<bsLen>0</bsLen>
<intInfo>0</intInfo>
</MsgHeader_PI>
<CS_NEG_PI>
<status>1</status>
<result>cs_neg_sid_kw=someid;cs_neg_result_kw=CS_NEG_USE_SSL;</result>
</CS_NEG_PI>
If I modify my proxy to not send CS_NEG_REQUIRE but CS_NEG_DONT_CARE to the client, and just hope that the client will enable use of SSL (because I need it in my proxy to do SNI), it works well.
I was unable to test this without the proxy, because my irods server seems to send CS_NEG_DONT_CARE even if I require the use of ssl in /var/lib/irods/.irods/irods_environment.json.
I am now wondering whether this is a bug or expected behaviour.
my irods server seems to send
CS_NEG_DONT_CARE
the server setting is in core.re
, please confirm it is what you want it to be...
Ok, thanks, forgot to look there.
I can now confirm that the bug is also present without a proxy if CS_NEG_REQUIRE
is set in core.re
. That should allow for easy reproduction.
So the error appears when there is no proxy, but SSL is required by the server.
You mentioned in this comment, https://github.com/irods-contrib/metalnx-web/issues/294#issuecomment-1243892373, that things worked as intended when the proxy was removed. What was the state of core.re/acPreConnect
during that run?
Then core.re defaulted to CS_NEG_DONT_CARE
. From tcpdump/ssldump it is seen that also in this case, metalnx connects using ssl in both connections to irods.
Got it. Something isn't quite right when the server is configured to require SSL.
We'll look into it.
I feel like I have reproduced this error but my stack trace is not identical to the one above.
I enabled SSL. First did a search on data object name. All went well. The correct entries were returned and no errors appeard in the log.
I then did a search on owner name (which should return both data objects and collections) and the server responded with a 500 error. This is the top of the stack trace I got:
2023-01-24 15:55:32 ERROR FilePropertiesController:151 - Could not search by metadata:
org.irods.jargon.core.exception.ClientServerNegotiationException: failure in client server negotiation
at org.irods.jargon.core.connection.ClientServerNegotiationService.negotiateUsingServerProtocol(ClientServerNegotiationService.java:131)
at org.irods.jargon.core.connection.ClientServerNegotiationService.negotiate(ClientServerNegotiationService.java:106)
at org.irods.jargon.core.connection.AuthMechanism.clientServerNegotiation(AuthMechanism.java:158)
The proxy might change the specific error.
I haven't tracked down the root cause of this but I have confirmed a couple of things.
This leads me to believe that when you close the sessions and then open them back up, something dealing with SSL is not being set up properly.
We run iRODS 4.2.10 with a mysql database and Metalnx 2.5.1. The new unified search
This was not tested against postgresql, I cannot exclude the possibility that it is database-related.
Digging into the code shows that if in https://github.com/irods-contrib/metalnx-web/blob/master/src/metalnx-web/src/main/java/com/emc/metalnx/controller/utils/GenQuerySearchUtil.java#L123-L137 the if is forced to be skipped (
if (false) { }
) the searchs works properly (but does not return collections, of course). Digging further shows that the error is triggered at the second invocation of https://github.com/irods-contrib/metalnx-web/blob/master/src/metalnx-web/src/main/java/com/emc/metalnx/controller/utils/GenQuerySearchUtil.java#L253-L254 or https://github.com/irods-contrib/metalnx-web/blob/master/src/metalnx-web/src/main/java/com/emc/metalnx/controller/utils/GenQuerySearchUtil.java#L203-L204. Probably the session is destroyed a bit too much and cannot be recovered.The following patch fixes search, but I cannot say I understand why, it just circumvents the problem.