scandihealth / lpr3-docs

https://scandihealth.github.io/lpr3-docs/
MIT License

Fingerprints are different on load balancing #291

Closed ThorkildFriis closed 5 years ago

ThorkildFriis commented 5 years ago

It seems that the load-balanced servers present certificates with different fingerprints. Either you have to accept each fingerprint, or you have to examine the content of the certificate...

TueCN commented 5 years ago

Hello and welcome to the project. I am not sure I understand the exact nature of your issue. Could you please elaborate?

We have several reporters who have no problems getting their requests accepted by the service on https://lprws.sds.dsdn.dk, which is a load-balanced environment. What exact issue are you experiencing? Is it related to #288? If so, please elaborate.

Also please state your affiliation with the project as described in our contribution guidelines

ThorkildFriis commented 5 years ago

Thanks. On the live server we sometimes have issues with the fingerprint of the certificate. We have not seen the issue in test, and most of the time it works on the live server. When it fails, our service does not recognize the certificate and hence cannot trust it. I cannot know for sure whether this is caused by load balancing, but that is my guess.

TueCN commented 5 years ago

Your service does not recognize what certificate? LPR does not send any certificates anywhere.

Do you mean LPR does not recognize the STS Prod certificate? If so it is probably a duplicate of #288.

Tomorrow we will deploy our load-balancing middleware on TEST (even though there is only 1 back-end server) to make it as similar to PROD as possible. Please verify that everything is working as expected after the next release.

ThorkildFriis commented 5 years ago

I am quite certain that this is not a duplicate of #288. #288 is about an issue with MTOM where they are not able to send at all. Please have a look at the snippet from our error log. From our log we can see that only one out of a total of 13 messages sent within the same second failed.

24-01-2019 16:25:55 null null SEVERE: Truststore contains:
24-01-2019 16:25:55 null null SEVERE: Truststore contains:
24-01-2019 16:25:55 null null SEVERE: CN=nsp.rsyd.net, OU=Domain Control Validated
24-01-2019 16:25:55 null null SEVERE: CN=nsp.rsyd.net, OU=Domain Control Validated
24-01-2019 16:25:55 null null SEVERE: CN=.sds.dsdn.dk, OU=Domain Control Validated
24-01-2019 16:25:55 null null SEVERE: CN=.sds.dsdn.dk, OU=Domain Control Validated
24-01-2019 16:25:55 null null SEVERE: Fejl ved kald til NSP service: https://lprws.sds.dsdn.dk/cda-ws/DocumentRepository_Service/PatientHealthcareReportingService (java.security.cert.CertificateException: Trust store doesn't contain a certificate match for CN=CN=.sds.dsdn.dk, OU=Domain Control Validated) (fejlkode: 66)
24-01-2019 16:25:55 null null SEVERE: Fejl ved kald til NSP service: https://lprws.sds.dsdn.dk/cda-ws/DocumentRepository_Service/PatientHealthcareReportingService (java.security.cert.CertificateException: Trust store doesn't contain a certificate match for CN=CN=.sds.dsdn.dk, OU=Domain Control Validated) (fejlkode: 66)

TueCN commented 5 years ago

Ok, I assume you are talking about the fingerprint of the SSL certificate sent by https://lpradm.sds.dsdn.dk ? I first assumed you were saying that LPR did not trust your XML signature of the SOAP header.

I don't fully understand what the log you pasted is for. Is it a log of the entries in your trust store? If so, it seems there are 2 entries for .sds.dsdn.dk. Could that be the issue?

We just tested by sending 1000 requests to https://lpradm.sds.dsdn.dk and all of them returned

SHA1 Fingerprint=32:DA:7F:86:65:AB:78:1C:EF:B6:92:EE:31:34:BF:74:08:DE:6B:83
SHA256 Fingerprint=BA:93:8F:74:76:6B:F2:31:83:0A:56:AB:6F:B5:AF:83:CF:52:30:90:05:33:DD:D6:28:E6:BB:D0:B2:F2:A9:BE

which is as expected. What fingerprint(s) did you receive?

We have not heard of anyone else having issues verifying our SSL certificate. Are you sure the error is not on your side?
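For anyone comparing values on their own side, here is a minimal sketch of how fingerprints in the format above are usually derived: hash the DER-encoded certificate bytes and print the digest as colon-separated upper-case hex. The class and method names are made up for illustration, and the sample input is the string "abc" rather than a real certificate; for an actual `java.security.cert.X509Certificate` you would pass `cert.getEncoded()`.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of LPR or any reporter's code.
final class Fingerprints {
    private Fingerprints() {}

    /** Hashes the given bytes and formats the digest as colon-separated upper-case hex. */
    static String fingerprint(String algorithm, byte[] encoded) {
        try {
            byte[] digest = MessageDigest.getInstance(algorithm).digest(encoded);
            StringBuilder sb = new StringBuilder(digest.length * 3);
            for (int i = 0; i < digest.length; i++) {
                if (i > 0) {
                    sb.append(':');
                }
                // Mask to 0..255 so negative bytes print as two hex digits.
                sb.append(String.format("%02X", digest[i] & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(algorithm + " is not available", e);
        }
    }

    public static void main(String[] args) {
        // Sample input only; a real check would hash cert.getEncoded() instead.
        byte[] sample = "abc".getBytes(StandardCharsets.US_ASCII);
        System.out.println("SHA1 Fingerprint=" + fingerprint("SHA-1", sample));
        System.out.println("SHA256 Fingerprint=" + fingerprint("SHA-256", sample));
    }
}
```

Because the fingerprint is a hash of the whole encoded certificate, two servers behind a load balancer only produce different fingerprints if they actually serve different certificates.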

ThorkildFriis commented 5 years ago

Today we saw the same issue in test when submitting 4,500 and later 54,000 records. This is the first time we have seen it in test, so it is perhaps reasonable that you could not reproduce the problem with 1,000 successful requests. On the live server we did not see the issue until we had submitted more than 600,000 records from ten threads. Before that, in test, we never submitted that many records, and we only submitted from one thread. Today we saw it with as few as 4,500 records from ten threads, and it is not just one thread that fails.

I am not a programmer (though I used to be), but it does not make sense to me that the same piece of code should fail when submitting many records from several threads and not fail when submitting a few records from one thread, unless there is an issue with locking or a race condition. We will, of course, check for a race condition, but I really do not believe that this is the issue. The guy who programmed it is too experienced to make a mistake like that, unless there is a bug in Java that only appears when the word LPR3 is in the name of the project... I mean, it is the same algorithm that we use for all other communication.

We will check, of course. But I suggest that you do not close this issue as "cannot reproduce". Perhaps try with 50,000 or, even better, 600,000 posts from 10 threads. We have not seen the issue with 1,000 records in one thread either.

jonigkeit commented 5 years ago

We will perform a test tomorrow.

Since you experience the issue regularly, could you please provide us with a TCP dump of the offending TLS handshake?

jonigkeit commented 5 years ago

We have completed our test of 12 concurrent calls repeated 50,000 times, for a total of 600,000 invocations, and every single time the fingerprint remained the same:

SHA256 Fingerprint=BA:93:8F:74:76:6B:F2:31:83:0A:56:AB:6F:B5:AF:83:CF:52:30:90:05:33:DD:D6:28:E6:BB:D0:B2:F2:A9:BE

@ThorkildFriis since you experience the issue regularly, could you please provide us with a TCP dump of the offending TLS handshake?
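A test of this shape (N concurrent callers, each repeating M times, checking that only one fingerprint is ever seen) could be structured roughly as below. This is a sketch and not the harness that was actually used: the class name is invented, and `fetchFingerprint` is a placeholder for whatever performs the TLS handshake and hashes the server certificate. It is written for Java 8 or later.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical harness for illustration only.
final class FingerprintStressTest {
    private FingerprintStressTest() {}

    /** Calls fetchFingerprint from `threads` workers, `repeats` times each, and returns every distinct value seen. */
    static Set<String> distinctFingerprints(Supplier<String> fetchFingerprint, int threads, int repeats) {
        Set<String> seen = ConcurrentHashMap.newKeySet(); // safe for concurrent add()
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < repeats; i++) {
                    seen.add(fetchFingerprint.get());
                }
            });
        }
        pool.shutdown(); // accept no new tasks, let the submitted ones finish
        try {
            if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
                throw new IllegalStateException("stress test timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return seen;
    }

    public static void main(String[] args) {
        // Placeholder supplier; a real run would connect to the server here.
        Supplier<String> stub = () -> "AA:BB:CC";
        Set<String> seen = distinctFingerprints(stub, 12, 50_000);
        System.out.println(seen.size() + " distinct fingerprint(s): " + seen);
    }
}
```

A result set with more than one entry would point at the server side. A single entry combined with client-side failures points at the client's own verification code.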

ThorkildFriis commented 5 years ago

We added more logging and started sending 100,000 messages. After about 10,000 messages we stopped and examined the log. The proxy we use is written in Java 6 because of the version of the WebSphere Application Server we use. It turns out that the Java library method for checking certificates is not thread safe. We synchronized the method call, and the remaining 90,000 messages went through without any issues.
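For readers who hit the same symptom, the general shape of that fix is to funnel every call to the non-thread-safe check through a single lock. The sketch below is not the actual proxy code: `SerializedCheck` is an invented name, and the `Predicate` delegate stands in for the unnamed library method. It uses Java 8 syntax for brevity; on Java 6 the same `synchronized` block works with a hand-written interface in place of `Predicate`.

```java
import java.util.function.Predicate;
import java.util.stream.IntStream;

// Hypothetical wrapper: serializes calls to a check that is not thread safe.
final class SerializedCheck<T> implements Predicate<T> {
    private final Object lock = new Object();
    private final Predicate<T> delegate; // assumed to be unsafe under concurrent use

    SerializedCheck(Predicate<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean test(T candidate) {
        // Only one thread at a time may run the delegate.
        synchronized (lock) {
            return delegate.test(candidate);
        }
    }

    public static void main(String[] args) {
        int[] calls = new int[1];
        // Unsafe on its own: calls[0]++ is a read-modify-write that can lose updates.
        Predicate<String> unsafe = subject -> { calls[0]++; return true; };
        Predicate<String> safe = new SerializedCheck<String>(unsafe);
        IntStream.range(0, 100_000).parallel().forEach(i -> safe.test("CN=example"));
        System.out.println(calls[0]); // prints 100000: every increment ran under the same lock
    }
}
```

The trade-off is throughput, since certificate checks no longer run in parallel. That is usually acceptable when the check is short compared to the network round trip.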