Closed RasmusThernoe closed 5 years ago
@RasmusThernoe I don't know if this is your request, but I just saw the following exception in our logs:
2019-01-24 13:52:16,932 WARNING [...] ID kortet har timeout.
This fault would be because the IssueInstant + TimeOut < now
We have not sent any requests since around 12:30.
That sounds like the Capital Region, which has had problems with ID card timeouts.
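The condition quoted above (IssueInstant + TimeOut < now) can be sketched as follows. This is only an illustration: the function name and the example validity period are assumptions, not the actual SOSI Seal implementation or its configured timeout.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def id_card_has_timed_out(issue_instant: datetime,
                          timeout: timedelta,
                          now: Optional[datetime] = None) -> bool:
    """Return True when IssueInstant + TimeOut lies before 'now'."""
    if now is None:
        now = datetime.now(timezone.utc)
    return issue_instant + timeout < now


# Hypothetical values, chosen only to show both outcomes.
issued = datetime(2019, 1, 1, 12, 0, tzinfo=timezone.utc)
validity = timedelta(hours=1)  # assumed validity period, not taken from the thread
print(id_card_has_timed_out(issued, validity, now=issued + timedelta(minutes=30)))
print(id_card_has_timed_out(issued, validity, now=issued + timedelta(hours=2)))
```

An ID card issued long before the request is processed fails this check even if the request itself is otherwise valid.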
Any news?
We need to get reports to LPR3 today as we need the accumulated error list tonight.
I'm not sure the cause of the error message "Unable to get certificate from dom" is the same as in #48. Our signature validation in production is exactly the same as on https://lprws-test.sds.dsdn.dk
Can you get one of your requests signed by both STS test and STS prod and compare the two outbound messages to see if there are any glaring differences? Is your formatting/processing of the signed requests identical for prod and test?
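A minimal sketch of how two captured outbound messages could be compared byte by byte. The helper name and the sample byte strings are hypothetical. Comparing raw bytes rather than decoded text matters here, because the payload may contain binary attachments.

```python
from typing import Optional


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if the inputs are identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One input is a strict prefix of the other.
        return min(len(a), len(b))
    return None


# Hypothetical captures: identical except for one line-ending byte.
test_capture = b"Content-ID: <cert>\r\n\r\n\x30\x82"
prod_capture = b"Content-ID: <cert>\r\n\n\n\x30\x82"
print(first_difference(test_capture, prod_capture))
```

In practice the two inputs would be read from dump files opened in binary mode.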
Maybe this stack trace from our log can help you identify the issue?
dk.sosi.seal.model.ModelException: Unable to get certificate from dom
at dk.sosi.seal.model.SignatureUtil.resolveCertificate(SignatureUtil.java:695)
...
Caused by: org.apache.xml.security.keys.keyresolver.KeyResolverException: Could not parse certificate: java.io.IOException: Invalid BER/DER data (too huge?)
Original Exception was java.security.cert.CertificateException: Could not parse certificate: java.io.IOException: Invalid BER/DER data (too huge?)
at org.apache.xml.security.keys.keyresolver.implementations.X509CertificateResolver.engineLookupResolveX509Certificate(X509CertificateResolver.java:110)
at org.apache.xml.security.keys.KeyInfo.applyCurrentResolver(KeyInfo.java:870)
at org.apache.xml.security.keys.KeyInfo.getX509CertificateFromStaticResolvers(KeyInfo.java:853)
at org.apache.xml.security.keys.KeyInfo.getX509Certificate(KeyInfo.java:819)
at dk.sosi.seal.model.SignatureUtil.resolveCertificate(SignatureUtil.java:693)
We will compare the two outbound messages for glaring differences.
We can provide the input messages if you have a secure channel?
The input messages contain both patient data and a production certificate.
I guess the error comes from parsing the X509Certificate part of the message. However, I captured the message we are trying to send (using ncat), extracted the certificate part from that output, and openssl parses it as DER without any problems:
Certificate:
Data:
Version: 3 (0x2)
Serial Number: 1457554774 (0x56e08556)
Signature Algorithm: sha256WithRSAEncryption
Issuer: C=DK, O=TRUST2408, CN=TRUST2408 OCES CA II
Validity
Not Before: May 12 08:55:19 2016 GMT
Not After : May 12 08:55:02 2019 GMT
Subject: C=DK, O=Sundhedsdatastyrelsen // CVR:33257872/serialNumber=CVR:33257872-FID:55008930, CN=SOSI Federation 2 (funktionscertifikat)
Subject Public Key Info:
Public Key Algorithm: rsaEncryption
RSA Public Key: (2048 bit)
Modulus (2048 bit):
00:90:51:f5:fe:23:2d:bf:8d:e7:d3:05:ac:37:82:
ab:e9:fd:f2:34:b6:a0:90:64:38:ec:f1:fa:46:b3:
58:08:67:08:5a:ff:45:78:53:d5:54:79:c9:76:fc:
3d:41:db:65:c6:3a:76:09:67:05:9c:b4:de:f3:92:
4f:0a:44:6d:bc:07:6e:33:d5:0d:3f:7e:b9:ce:06:
d1:2b:5e:93:25:5b:d2:02:24:8f:ed:b1:da:eb:59:
da:eb:7a:1e:34:5f:d8:2b:68:af:8a:d0:4a:d8:b3:
80:96:6d:db:63:03:34:83:1c:55:09:56:ff:ca:63:
92:25:86:ed:bc:f2:91:4b:d9:e4:77:2a:e5:1b:ef:
62:15:13:41:a9:eb:22:cb:a6:f0:87:19:44:1e:19:
bf:96:93:2f:a0:c6:00:f1:11:3c:ed:d4:b8:14:e8:
97:b0:76:c1:41:c8:3f:bc:28:c0:d2:04:e7:2f:85:
57:11:11:1a:df:bc:36:56:5f:77:84:56:e4:fd:35:
c5:9e:0c:5f:6c:34:25:b0:a6:10:4a:f3:07:d5:f8:
c5:a6:44:71:60:1f:a6:50:ba:69:a6:7d:8b:6e:98:
f5:c5:a9:41:59:8a:16:a8:d0:72:86:fc:28:61:d7:
4d:58:05:c8:4a:0a:5e:90:b7:e2:30:82:69:f9:b5:
7d:7b
Exponent: 65537 (0x10001)
X509v3 extensions:
X509v3 Key Usage: critical
Digital Signature, Key Encipherment, Data Encipherment, Key Agreement
Authority Information Access:
OCSP - URI:http://ocsp.ica02.trust2408.com/responder
CA Issuers - URI:http://f.aia.ica02.trust2408.com/oces-issuing02-ca.cer
X509v3 Certificate Policies:
Policy: 1.2.208.169.1.1.1.4.2
CPS: http://www.trust2408.com/repository
User Notice:
Organization: TRUST2408
Number: 1
Explicit Text: For anvendelse af certifikatet gælder OCES vilkår, CPS og OCES CP, der kan hentes fra www.trust2408.com/repository. Bemærk, at TRUST2408 efter vilkårene har et begrænset ansvar ift. professionelle parter.
X509v3 CRL Distribution Points:
URI:http://crl.ica02.trust2408.com/ica02.crl
DirName:/C=DK/O=TRUST2408/CN=TRUST2408 OCES CA II/CN=CRL4079
X509v3 Authority Key Identifier:
keyid:99:8F:BA:0D:89:AE:21:1A:42:7A:0A:AE:1A:4C:4E:22:FF:10:EB:8C
X509v3 Subject Key Identifier:
82:21:F4:6B:02:19:DE:7F:61:03:4F:DC:30:9C:24:CC:A8:19:A3:D2
X509v3 Basic Constraints:
CA:FALSE
Signature Algorithm: sha256WithRSAEncryption
7c:6a:bc:17:2b:14:42:e7:73:26:63:e9:86:6f:a6:9c:e4:e0:
df:fa:48:06:60:e3:b9:d9:38:c6:88:a3:81:5f:03:dc:7c:17:
ff:2a:79:86:49:60:74:dd:77:e7:bd:c2:23:c4:01:f6:ee:21:
d9:84:aa:ad:0d:2d:59:4b:67:86:6d:8e:36:82:c9:04:ca:5c:
f4:d2:ca:44:84:e0:a5:21:c3:6e:2b:c8:d9:5e:9d:dd:38:9b:
9b:c8:bb:39:5e:94:82:2b:02:a0:03:15:38:b8:5a:31:9c:ba:
30:04:3f:b3:da:4f:9d:df:6b:de:b0:49:46:2c:9b:a2:49:a7:
c1:a2:3a:e3:28:08:62:66:a0:90:ce:de:f1:b9:7e:50:8f:22:
46:b2:e5:7c:2b:63:d6:75:74:ee:a3:35:75:60:aa:19:54:02:
4e:5a:4b:2b:89:aa:3b:56:35:62:7a:17:4e:61:fd:7d:e0:a2:
d5:43:ab:dc:d1:6c:0e:4a:2e:54:7a:dc:15:ad:ab:63:3d:0e:
44:e4:93:99:f4:24:13:dd:00:f5:ff:d3:a8:70:31:e9:f4:4c:
ed:b0:fc:05:53:17:21:e2:88:44:da:73:39:69:5f:03:e3:71:
1c:be:2c:55:70:f0:5f:8b:3a:9a:c6:33:d1:84:68:e1:de:b5:
e1:11:97:56
I would really like to send the TCP dump, but it contains data that should not be posted on a public GitHub repository. Do you have a secure channel where we can send the dump?
Hmm we could enable "raw" logging of all requests to the service for a brief period to catch your request and see what it looks like when we receive it, but I don't know if that would violate GDPR or other rules and regulations.
Your requests must differ somehow from the requests of the other reporters, since they do not have the same problem (not that your request necessarily isn't compliant).
I am afraid I can't provide much assistance with debugging this issue in prod tonight. I could work on giving you access to our internal load-balanced test environment in DXC (should be reachable from Sundhedsdatanettet) if you would like to try getting a request accepted there (it must be signed by STS test). There we can also enable logging with impunity, and you can send some scrambled/fake data.
We suggest you create a submission but with test patient data. You sign the request as you do in production, i.e. with production STS. Instead of sending it to the production endpoint of lpr you send it to test, i.e. https://lprws-test.sds.dsdn.dk/cda-ws/DocumentRepository_Service/PatientHealthcareValidateReportingService
If you would like to transfer one of your prod signed requests that are rejected, you can upload it to LPR's FTP server (which is secured).
We have created a new folder named "gh288" in the "LPRRM" users root folder, which the user should have write access to. Is this an acceptable secure channel?
We have now sent the same message (extracted from production) to the test service (PatientHealthcareReportingService) using SoapUI. So the message is signed and contains the production STSIDCARD, but it is sent to the test service.
The timestamp for this attempt is: Fri, 25 Jan 2019 10:11:32 GMT (so 11:11:32 in Denmark)
I got the following response from the service:
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
<env:Header xmlns:env="http://www.w3.org/2003/05/soap-envelope"/>
<soap:Body>
<soap:Fault>
<soap:Code>
<soap:Value>soap:Receiver</soap:Value>
</soap:Code>
<soap:Reason>
<soap:Text xml:lang="en">The certificate that signed the security token is not trusted!</soap:Text>
</soap:Reason>
</soap:Fault>
</soap:Body>
</soap:Envelope>
So it seems like the test service understands the certificate part of the message. But I cannot rule out that SoapUI changes how the data is sent in a way that makes a difference.
I am right now trying to send the raw message from production to the test endpoint using a combination of stunnel and ncat. But this is going to take some time, since I have some technical problems with my setup.
@RasmusThernoe and @JacobBangSSE We would very much appreciate if you uploaded your request to the aforementioned folder on the SFTP server, so that we can work concurrently on resolving this issue.
I have uploaded the NCAT dump to the folder on the SFTP server.
We have started logging requests in production. Please try to submit another request
Now I have submitted a new request
We have reproduced the "cannot get certificate from dom" error from the request uploaded to FTP. Sending the exact same request to https://lprws-test.sds.dsdn.dk also yields the exact same error response.
We are in the process of verifying the test method by seeing that one of our own generated requests does not yield "cannot get certificate from dom" error when sending in the same way.
However, we also just saw a new type of error in our logs we haven't seen before at all:
2019-01-25 12:41:13,154 WARNING [...]: Couldn't find MIME boundary: --_NextPart_000_0002_01C3E1CC.3BB37320
Is this from one of your requests or someone else?
My requests were sent at 12:42:xx, so I don't think the error is from my requests.
There were many warnings of this type in the time between 12:41-12:44
2019-01-25 12:44:27,674 WARNING [...]: Couldn't find MIME boundary: --_NextPart_000_0002_01C3E1CC.3BB37320
You should probably see a similar error in your responses? We are not actively pursuing this new warning now, it was just to let you know that there is a new type of error we haven't seen before.
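A "Couldn't find MIME boundary" warning generally means the delimiter announced in the Content-Type header never shows up in the body. A rough client-side sanity check might look like the sketch below. The Content-Type value is an assumption reconstructed from the delimiter in the log line, and the helper name is hypothetical.

```python
from email.message import Message


def boundary_present(content_type: str, body: bytes) -> bool:
    """Check that the boundary declared in Content-Type appears as a delimiter in the body."""
    header = Message()
    header["Content-Type"] = content_type
    boundary = header.get_param("boundary")
    if boundary is None:
        return False
    # Each MIME part starts with "--" followed by the declared boundary.
    return b"--" + str(boundary).encode("ascii") in body


# Assumed header value: the logged delimiter without its leading "--".
content_type = 'multipart/related; boundary="_NextPart_000_0002_01C3E1CC.3BB37320"'
good_body = b"--_NextPart_000_0002_01C3E1CC.3BB37320\r\nContent-Type: text/xml\r\n\r\n<x/>"
print(boundary_present(content_type, good_body))
print(boundary_present(content_type, b"<x/>"))
```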
@JacobBangSSE @RasmusThernoe we have now verified our test method. We do not get "Unable to get certificate from dom" when sending one of our own generated requests in the same way (via curl).
This implies that the requests you generate for prod must be different in some way to the requests you generate for test (https://lprws-test.sds.dsdn.dk) as you normally have no problem getting requests accepted on test.
We are in the process of investigating the differences between our and your request to see what could cause the issue.
Can you attach here a dump of one of your "test" requests that you normally send to https://lprws-test.sds.dsdn.dk and which does not give the "Unable to get certificate from dom" error?
We will perform that task.
Since we have real patient data in our "outbox" we would like to use the endpoints that validate without persisting.
First we will send our request to the test endpoint using a test certificate to:
Then we will send our request to the production endpoint using a production certificate to:
Will this work to get the right compare?
Do you have logging applied to both endpoints?
When we inline the XOP-included certificate, we get the following invalid PEM:
MIIGHjCCBQagAwIBAgIEVuCFVjAKBgkqhkiG9woBAQsFADBAMQswCQYDVQQGEwJE
SzESMBAGA1UECgwJVFJVU1QyNDA4MR0wGwYDVQQDDBRUUlVTVDI0MDggT0NFUyBD
QSBJSTAeFwoxNjA1MTIwODU1MTlaFwoxOTA1MTIwODU1MDJaMIGRMQswCQYDVQQG
EwJESzEuMCwGA1UECgwlU3VuZGhlZHNkYXRhc3R5cmVsc2VuIC8vIENWUjozMzI1
Nzg3MjFSMCAGA1UEBRMZQ1ZSOjMzMjU3ODcyLUZJRDo1NTAwODkzMDAuBgNVBAMM
J1NPU0kgRmVkZXJhdGlvbiAyIChmdW5rdGlvbnNjZXJ0aWZpa2F0KTCCASIwCgYJ
KoZIhvcKAQEBBQADggEPADCCAQoCggEBAJBR9f4jLb+N59MFrDeCq+n98jS2oJBk
OOzx+kazWAhnCFr/RXhT1VR5yXb8PUHbZcY6dglnBZy03vOSTwpEbbwHbjPVCj9+
uc4G0StekyVb0gIkj+2x2utZ2ut6HjRf2Ctor4rQStizgJZt22MDNIMcVQlW/8pj
kiWG7bzykUvZ5Hcq5RvvYhUTQanrIsum8IcZRB4Zv5aTL6DGAPERPO3UuBTol7B2
wUHIP7wowNIE5y+FVxERGt+8NlZfd4RW5P01xZ4MX2w0JbCmEErzB9X4xaZEcWAf
plC6aaZ9i26Y9cWpQVmKFqjQcob8KGHXTVgFyEoKXpC34jCCafm1fXsCAwEAAaOC
AswwggLIMA4GA1UdDwEB/wQEAwIDuDCBiQYIKwYBBQUHAQEEfTB7MDUGCCsGAQUF
BzABhilodHRwOi8vb2NzcC5pY2EwMi50cnVzdDI0MDguY29tL3Jlc3BvbmRlcjBC
BggrBgEFBQcwAoY2aHR0cDovL2YuYWlhLmljYTAyLnRydXN0MjQwOC5jb20vb2Nl
cy1pc3N1aW5nMDItY2EuY2VyMIIBQwYDVR0gBIIBOjCCATYwggEyBgoqgVCBKQEB
AQQCMIIBIjAvBggrBgEFBQcCARYjaHR0cDovL3d3dy50cnVzdDI0MDguY29tL3Jl
cG9zaXRvcnkwge4GCCsGAQUFBwICMIHhMBAWCVRSVVNUMjQwODADAgEBGoHMRm9y
IGFudmVuZGVsc2UgYWYgY2VydGlmaWthdGV0IGfmbGRlciBPQ0VTIHZpbGvlciwg
Q1BTIG9nIE9DRVMgQ1AsIGRlciBrYW4gaGVudGVzIGZyYSB3d3cudHJ1c3QyNDA4
LmNvbS9yZXBvc2l0b3J5LiBCZW3mcmssIGF0IFRSVVNUMjQwOCBlZnRlciB2aWxr
5XJlbmUgaGFyIGV0IGJlZ3LmbnNldCBhbnN2YXIgaWZ0LiBwcm9mZXNzaW9uZWxs
ZSBwYXJ0ZXIuMIGXBgNVHR8EgY8wgYwwLqAsoCqGKGh0dHA6Ly9jcmwuaWNhMDIu
dHJ1c3QyNDA4LmNvbS9pY2EwMi5jcmwwWqBYoFakVDBSMQswCQYDVQQGEwJESzES
MBAGA1UECgwJVFJVU1QyNDA4MR0wGwYDVQQDDBRUUlVTVDI0MDggT0NFUyBDQSBJ
STEQMA4GA1UEAwwHQ1JMNDA3OTAfBgNVHSMEGDAWgBSZj7oKia4hGkJ6Cq4aTE4i
/xDrjDAdBgNVHQ4EFgQUgiH0awIZ3n9hA0/cMJwkzKgZo9IwCQYDVR0TBAIwADAK
BgkqhkiG9woBAQsFAAOCAQEAfGq8FysUQudzJmPphm+mnOTg3/pIBmDjudk4xoij
gV8D3HwX/yp5hklgdN13573CI8QB9u4h2YSqrQotWUtnhm2ONoLJBMpc9NLKRITg
pSHDbivI2V6d3Tibm8i7OV6UgisCoAMVOLhaMZy6MAQ/s9pPnd9r3rBJRiybokmn
waI64ygIYmagkM7e8bl+UI8iRrLlfCtj1nV07qM1dWCqGVQCTlpLK4mqO1Y1YnoX
TmH9feCi1UOr3NFsDkouVHrcFa2rYz0OROSTmfQkE90A9f/TqHAx6fRM7bD8BVMX
IeKIRNpzOWlfA+NxHL4sVXDwX4s6msYz0YRo4d614RGXVgo=
Either we decode wrong, you encode wrong or the input is invalid.
You have encoded it wrong. If you remove all newlines and decode the Base64, you can see that all "newlines" in the file end with CRLF bytes. This problem is introduced when you copy binary data on Windows and the data contains mixed line endings: Windows tries to replace the inconsistent line endings and adds CRLF at the end.
I found that problem yesterday and spent several hours debugging before I opened a hex editor and found the difference.
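The damage can also be spotted without a hex editor. The sketch below decodes the first line of the Base64 posted above and looks for the DER-encoded OID of sha256WithRSAEncryption (1.2.840.113549.1.1.11), which normally contains the byte 0x0D. In the posted data that byte reads 0x0A (Base64 `9woB` where `9w0B` would be expected), which is consistent with a line-ending conversion having touched binary content. The helper names are hypothetical, and the check is a heuristic, not a full DER parser.

```python
import base64

# First line of the Base64 posted in this thread, copied verbatim.
POSTED_PREFIX = "MIIGHjCCBQagAwIBAgIEVuCFVjAKBgkqhkiG9woBAQsFADBAMQswCQYDVQQGEwJE"

# DER encoding (tag, length, value) of OID 1.2.840.113549.1.1.11.
SHA256_RSA_OID = bytes.fromhex("06092a864886f70d01010b")


def cr_to_lf(data: bytes) -> bytes:
    """Simulate a text-mode conversion that rewrites every CR byte as LF."""
    return data.replace(b"\r", b"\n")


def looks_newline_mangled(der: bytes) -> bool:
    """True if the intact OID is missing but its CR-to-LF mangled form is present."""
    return SHA256_RSA_OID not in der and cr_to_lf(SHA256_RSA_OID) in der


decoded = base64.b64decode(POSTED_PREFIX)
print(looks_newline_mangled(decoded))
```

Transferring dumps in binary mode and comparing checksums before and after the transfer avoids this class of problem.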
Done this:
Then we will send our request to the production endpoint using a production certificate to:
I was a little in a hurry, trying again; sorry
@RasmusThernoe @finnha Sorry, you misunderstood me. We want you to upload a dump to the FTP site in the "gh288" folder, like you did before (we are not currently monitoring all TCP traffic on prod and test).
What we want is a dump of one of your TEST messages that gets accepted on test (we already have one of your PROD messages)
OK. We will create a dump with test certificate and upload to the FTP site.
I will download the file directly from the sftp site onto linux, hoping that the error did not occur prior to uploading it to the sftp site 🙏
The uploaded file contains DOS line endings.
We will try the soap-ui project to see whether we can XOP/MTOM-encode the STS test certificate and send it to production. If we get rejected because of an invalid certificate (i.e. test vs. prod), the likelihood of the error being on the receiving side should be minimal.
We have found the problem
Load balancing manipulates the content in a way that corrupts non-utf-8 data, e.g. the XOP/MTOM encoded certificate.
Workaround: set the HTTP header X-Record-Target to 1:
POST /cda-ws/DocumentRepository_Service/PatientHealthcareValidateReportingService HTTP/1.1\r\n
X-Record-Target: 1\r\n
...
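For clients that can control their HTTP headers, the workaround could be sketched like this. The header name and value come from the comment above, and the URL is the test endpoint mentioned earlier in this thread. The body and the Content-Type value are placeholders, and nothing is actually sent.

```python
import urllib.request

# Test endpoint quoted earlier in this thread.
LPR_TEST_URL = (
    "https://lprws-test.sds.dsdn.dk/cda-ws/DocumentRepository_Service/"
    "PatientHealthcareValidateReportingService"
)


def build_request(body: bytes, content_type: str) -> urllib.request.Request:
    """Build (but do not send) a POST that carries the fixed X-Record-Target header."""
    return urllib.request.Request(
        LPR_TEST_URL,
        data=body,
        method="POST",
        headers={"Content-Type": content_type, "X-Record-Target": "1"},
    )


# Placeholder body and content type; a real request is a signed MTOM/XOP message.
request = build_request(b"<placeholder/>", "application/soap+xml")
print(request.get_method(), request.header_items())
```

Note that `urllib.request` normalizes header-name capitalization. HTTP header names are case-insensitive, so this should not matter to the receiver.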
Could it be a temporary workaround to bypass the load balancer?
We would like to send our CDA documents and get the accumulated error list.
yes if you set the header to a fixed value you should be fine
We are not able to set custom HTTP header values in our integration platform (TIBCO). So this workaround is going to be rather complicated for us, and it is not something we can do directly at our customer right now.
Therefore we hope it is possible to make a workaround at your end.
Hmm we can deploy a bugfix that should fix it. Just need to get clearance for an unscheduled deployment activity
Update: We got go ahead, preparing deployment, will write once deployed
Deployed. Please try sending a new request now.
Note that if your certificate is validated, the first call to the web service can take up to 1 minute due to "cold start" of caches etc (something we will fix in the future).
Btw: we have updated the soap-ui project to send a XOP/MTOM encoded certificate
@RasmusThernoe @JacobBangSSE @finnha any news?
In order to send to the production endpoint we need to sign with a production certificate.
@finnha is currently in transit between the office and home, hence the wait. He will try later tonight.
ok
Good news: I can see you have gotten through to the service! Bad news: I see some very strange behavior... Extremely long response times and lots of exceptions..
I am looking into it. I might need to restart the servers
Yes, it is good news. And 26 requests per minute is not good.
Sorry this might take a few more minutes. The distributed cache is freaking out like I've not seen before. I will try and shut down all servers and only start one server to begin with, to avoid synchronization issues. Stand by.
Ok we have normal service with 1 server now :) Response times look good on our end. I will add additional servers so you can crank up concurrency for extra throughput if desired
Yes; i can see it rolling :-)
Now I have sent approx. 2500 requests
All servers are up and everything looks normal.
Sorry for the holdup, you guys are apparently the only ones (who tested so far) that encode the certificate in non UTF-8 characters.
Good luck with the execution of the rest of your tests :)
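The thread does not describe how the load balancer altered the content, but one way an intermediary can corrupt a raw XOP attachment while leaving inline Base64 untouched is by treating the body as UTF-8 text. The sketch below illustrates that assumption only. `utf8_round_trip` is a hypothetical stand-in, not the actual middleware behaviour.

```python
import base64


def utf8_round_trip(payload: bytes) -> bytes:
    """Mimic a hypothetical intermediary that decodes the body as UTF-8 and re-encodes it."""
    return payload.decode("utf-8", errors="replace").encode("utf-8")


# First bytes of the DER certificate from this thread (Base64 "MIIGHjCCBQ...").
raw_der_prefix = bytes.fromhex("3082061e30820506")
inline_base64 = base64.b64encode(raw_der_prefix)

# Raw DER contains bytes such as 0x82 that are not valid UTF-8 on their own.
print(utf8_round_trip(raw_der_prefix) == raw_der_prefix)
# Base64 text is plain ASCII, so it survives the same treatment unchanged.
print(utf8_round_trip(inline_base64) == inline_base64)
```

This would explain why reporters who inline the certificate as Base64 text never hit the problem, while a binary XOP/MTOM attachment did.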
Thank you; we are moving on. I expect to throw 250,000 requests at you :-) Have a good weekend, so far ;-)
Good work! Thank you all of you :-)
Closing.
https://github.com/scandihealth/lpr3-docs/issues/291:
Tomorrow we will deploy our load-balancing middleware on TEST (even though there is only 1 back-end server) to make it as similar to PROD as possible.
We are experiencing: "Unable to get certificate from dom"
This is the exact same issue as we experienced in GitHub issue https://github.com/scandihealth/lpr3-docs/issues/48 in the test environment.
Our request was sent at: 2019-01-24 12:30:09,956