DIMSE failure (aborting association) #45

mercure-imaging / mercure

mercure DICOM Orchestrator

https://mercure-imaging.org

MIT License

Closed TheLoFang closed 1 year ago

TheLoFang commented 1 year ago

Hello, and thanks for this great contribution.

I am having problems sending larger studies to Mercure. Can someone please help me solve this problem? When I send a CT with 400 or more images, I get the following error in the receiver log:

Jan 07 00:45:00 mercure receiver.sh[115184]: E: DIMSE failure (aborting association): 0006:020d DIMSE Failed to receive message
Jan 07 00:45:00 mercure receiver.sh[115184]: E: 0006:0101 ASC Bad presentation context ID
Jan 07 00:46:00 mercure receiver.sh[115381]: W: ASSOC: PDV send length 65535 is odd (using 65534)
Jan 07 00:46:00 mercure receiver.sh[115381]: E: DIMSE failure (aborting association): 0006:020d DIMSE Failed to receive message
Jan 07 00:46:00 mercure receiver.sh[115381]: E: 0006:0101 ASC Bad presentation context ID
Jan 07 00:47:00 mercure receiver.sh[115572]: W: ASSOC: PDV send length 65535 is odd (using 65534)
Jan 07 00:47:00 mercure receiver.sh[115572]: E: DIMSE failure (aborting association): 0006:020d DIMSE Failed to receive message
Jan 07 00:47:00 mercure receiver.sh[115572]: E: 0006:0101 ASC Bad presentation context ID

I need help, thanks

guruevi commented 1 year ago

This is an error in the DCMTK StoreSCP receiver. I see this occasionally from a GE scanner. Basically the error indicates a network error, but in my case, the GE scanner only sends so much data at once before timing out and disconnecting. The tech has to keep retrying until all images are sent.

Another possibility is that the sender is trying to send something your receiver doesn't accept.

Here is a bit of troubleshooting you can do. https://book.orthanc-server.com/faq/dcmtk-tricks.html

TheLoFang commented 1 year ago

Hello guruevi! Exactly, it's a 16-slice GE scanner. At the moment I only have this problem when the CT sends studies of more than 300 images to Mercure. Studies of 250 images or fewer arrive without errors.

I have read the Orthanc link that you sent me, and I appreciate it very much, but I do not know how to apply it to a Mercure installation running on Ubuntu 20.04.

I have only managed to modify the orthanc.json file in /app/addons, but I have not been able to work out the rest. I would appreciate the help, since Mercure seems to me an excellent resource for my needs.

Thanks

RoyWiggins commented 1 year ago

Yeah, mercure uses the DCMTK dicom receiver, and that's where the error is coming from. There's not too much we can do on the mercure end unless there's a particular configuration of storescp that prevents this error. Mercure runs storescp with a shell script (receiver.sh) here:

https://github.com/mercure-imaging/mercure/blob/master/receiver.sh#L90

You may be able to try debugging by adjusting the command line of storescp in that script in your installation and then restarting the service mercure_receiver.service (if running as a systemd service). You might try adding --debug to see if it can generate some more informative errors.

It does sound like perhaps it's timing out. I'm not sure in DICOM which end the timeout is likely to be coming from (the sender or the receiver). There are a few timeouts you can set at the command line you might try adjusting on the receiver:

  -ts   --socket-timeout  [s]econds: integer (default: 60)
          timeout for network socket (0 for none)

  -ta   --acse-timeout  [s]econds: integer (default: 30)
          timeout for ACSE messages

  -td   --dimse-timeout  [s]econds: integer (default: unlimited)
          timeout for DIMSE messages
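
As a rough sketch of how these options could be added to a storescp command line before trying it in receiver.sh: the helper name with_timeouts, the base command and the timeout values below are all hypothetical, and only the three flag names come from the option listing above. It is meant as a way to experiment, not as the way mercure builds its command.

```python
from typing import List, Optional


def with_timeouts(
    base_cmd: List[str],
    socket_timeout: Optional[int] = None,
    acse_timeout: Optional[int] = None,
    dimse_timeout: Optional[int] = None,
) -> List[str]:
    """Return a copy of base_cmd with storescp timeout options added.

    A value of None leaves the storescp default untouched. The options are
    placed right after the program name so that the port stays last.
    """
    if not base_cmd:
        raise ValueError("base_cmd must contain at least the program name")

    options: List[str] = []
    for flag, seconds in (
        ("--socket-timeout", socket_timeout),
        ("--acse-timeout", acse_timeout),
        ("--dimse-timeout", dimse_timeout),
    ):
        if seconds is None:
            continue
        if seconds < 0:
            raise ValueError(f"{flag} must not be negative")
        options += [flag, str(seconds)]

    return [base_cmd[0]] + options + base_cmd[1:]


# Illustrative values only: the base command and the numbers are assumptions.
example = with_timeouts(
    ["storescp", "--fork", "104"],
    socket_timeout=120,
    dimse_timeout=300,
)
print(" ".join(example))
```

The printed line can be compared with the storescp call in receiver.sh; after editing the script, the receiver service has to be restarted for the change to take effect.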

TheLoFang commented 1 year ago

Hi RoyWiggins! I am grateful for the quality of the tool and for your prompt response. I also thought that it might be due to timeout issues, so I allotted more time. But I notice that the error occurs as soon as the router scan detects the DICOM file.

That is to say: I set "router_scan_interval" to 180 to give the 400-image CT study time to arrive. The study arrived complete, but as soon as the "Receiver" was activated, the problem occurred. Subsequently, the study is sent to the destination, but incomplete and extremely slowly (it takes up to 8 to 10 minutes to arrive), even though the destination is a DCM4CHEE PACS on an internal gigabit network.

The other studies in the queue, such as CR and MR, then do not arrive at their destination until the sending of the CT study still in the queue is completed.

I would be very grateful for your help.

RoyWiggins commented 1 year ago

Hm. If it's the router kicking in that seems to be correlated with the errors, it may be that this case is taking long enough to come in that the router has decided that the series is actually complete and starts moving the DICOMs. Though I wouldn't have thought that would cause storescp errors...

You might try increasing series_complete_trigger instead of the router_scan_interval. I would keep the router_scan_interval relatively low. series_complete_trigger determines how many seconds the router will wait after receiving a new DICOM image before it decides that the series is complete and starts routing it; the default is 60 (seconds). As long as each DICOM shows up in the folder less than series_complete_trigger seconds apart, the router's "scan" shouldn't be interfering with the receiver's storescp...

(It's an unfortunate aspect of the DICOM protocol that clients can't signal to the server when a series is complete, so we end up just having to apply a timeout.)
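
To make the timeout idea concrete, here is a minimal sketch of such a completion rule. It is not mercure's actual router code; the function name series_is_complete and its parameters are made up for illustration, and the arrival times are plain Unix timestamps.

```python
import time
from typing import Iterable, Optional


def series_is_complete(
    arrival_times: Iterable[float],
    trigger_seconds: float = 60.0,
    now: Optional[float] = None,
) -> bool:
    """Treat a series as complete once no new image has arrived for
    trigger_seconds. An empty series is never considered complete."""
    times = list(arrival_times)
    if not times:
        return False
    if now is None:
        now = time.time()
    # Only the most recent arrival matters for the timeout.
    return (now - max(times)) >= trigger_seconds


# Last image arrived 80 seconds ago (hypothetical numbers).
arrivals = [0.0, 10.0, 20.0]
print(series_is_complete(arrivals, trigger_seconds=60, now=100.0))
print(series_is_complete(arrivals, trigger_seconds=240, now=100.0))
```

With a 60-second trigger, a pause of 80 seconds in the middle of a transfer would already count as "complete", while a 240-second trigger would keep waiting. That is the situation a larger series_complete_trigger is meant to avoid.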

One thing that you might try is 1) turn off mercure_router.service entirely with systemctl stop, 2) send your dicoms to the receiver, check the receiver logs to make sure it's all OK, and you can even look at the files in the incoming folder to see if the entire series is there, and then 3) start the router service again. The router will then wake up and route the case. If that works, but you get storescp errors when the router is turned on and the series_complete_trigger is something large enough that the router shouldn't be doing anything... then something unexpected is happening and I'm not sure what.

As to why dispatching on the other side is slow, I am not sure. Mercure doesn't do anything very clever, it's using dcmtk's dcmsend to send the case. You might try manually running dcmsend to send the case along and see if that's still slow to try and isolate any role Mercure might be playing.
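
For the manual dcmsend test, a small timing wrapper might look like the sketch below. The helper names, host, port and folder are hypothetical, and the dcmsend options used here (--verbose, --call, --scan-directories, --recurse) should be checked against dcmsend --help for the installed DCMTK version before relying on them.

```python
import subprocess
import sys
import time
from typing import List, Optional, Tuple


def dcmsend_command(
    host: str, port: int, folder: str, called_aet: Optional[str] = None
) -> List[str]:
    """Build a dcmsend call that sends every DICOM file found in folder."""
    cmd = ["dcmsend", "--verbose"]
    if called_aet:
        cmd += ["--call", called_aet]
    # Options first, then peer, port and the input directory.
    cmd += ["--scan-directories", "--recurse", host, str(port), folder]
    return cmd


def timed_run(cmd: List[str]) -> Tuple[int, float]:
    """Run cmd and return (exit code, elapsed seconds)."""
    start = time.monotonic()
    result = subprocess.run(cmd, check=False)
    return result.returncode, time.monotonic() - start


# Placeholder target: replace host, port and folder with real values.
print(" ".join(dcmsend_command("pacs.example", 11112, "/tmp/case")))

# Harmless stand-in command so the sketch runs without DCMTK installed.
code, seconds = timed_run([sys.executable, "-c", "pass"])
print(f"exit code {code}, {seconds:.2f} s")
```

Passing the output of dcmsend_command to timed_run gives a transfer time that can be compared with the 8 to 10 minutes seen when mercure dispatches the same study.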

TheLoFang commented 1 year ago

Thank you very much Roy, I will follow your instructions to the letter. I'll make several attempts and let you know. Thank you for your help.

TheLoFang commented 1 year ago

Hello dear Roy, this is what I have done:

1- Increased series_complete_trigger from 60 to 240 (screenshot attached)
2- Restarted all services from the Dashboard
3- Cleaned all the files from the folders hosted in data (rm -Rf incoming/* and so on, one by one)
4- Shut down the mercure router service (systemctl stop mercure_router.service)
5- Restarted the complete server (reboot)
6- Started the Mercure admin session and opened Logs/Receiver
7- Checked again that there are no files in any of the folders hosted in /data (all clean)
8- Verified that the mercure router service is off (systemctl status mercure_router.service reports inactive)
9- Sent the same CT study of 775 images (sending completed)
10- Just when the sending finished, the message "W: ASSOC: PDV send length 65535 is odd (using 65534)" appeared (screenshot attached)
11- Checked the folder ../incoming (ls |more)
12- All the files are there, .dcm and .tag (#CT.X.1.2….dcm and #CT.X.1.2…..tag)
13- Started the mercure router service (systemctl start mercure_router.service)
14- Verified that mercure_router.service is running (it is active)
15- The queue starts processing without problems; the study stays in the queue for 7 minutes
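
The manual check of the incoming folder in the steps above (every .dcm file accompanied by a .tag file) could be scripted roughly as follows. This is only a sketch: the function name check_incoming is made up, and the folder path has to be adapted to the actual installation.

```python
from pathlib import Path
from typing import Dict, List


def check_incoming(folder: str) -> Dict[str, List[str]]:
    """Compare the .dcm and .tag files in folder by their base name."""
    root = Path(folder)
    # Path.stem strips only the final suffix, so dotted UIDs stay intact.
    dcm = {p.stem for p in root.glob("*.dcm")}
    tag = {p.stem for p in root.glob("*.tag")}
    return {
        "complete": sorted(dcm & tag),     # both files present
        "missing_tag": sorted(dcm - tag),  # image without tag file
        "missing_dcm": sorted(tag - dcm),  # tag file without image
    }


def summarize(folder: str) -> str:
    """One-line summary suitable for comparing against the image count sent."""
    report = check_incoming(folder)
    return (
        f"{len(report['complete'])} complete, "
        f"{len(report['missing_tag'])} without .tag, "
        f"{len(report['missing_dcm'])} without .dcm"
    )
```

Calling summarize with the path of the incoming folder after a transfer shows at a glance whether all 775 images arrived with their tag files.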

[Screenshot: 2023-01-09 at 14:07:00]

[Screenshot: Error Logs-Receiver]

guruevi commented 1 year ago

If you disabled the Mercure router and the error keeps appearing, that points to an issue with the system itself (make sure you aren't depleting your resources somewhere), to something interrupting the transfer, or to a bug in storescp. Check your system logs to see whether something unexpected happens at the same times. One suggestion would be to raise the file descriptor limit for the service in systemd (https://www.thegeekdiary.com/how-to-set-ulimit-values-for-a-systemd-service/).
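
Before changing the limit, it can help to see what the running receiver process actually has. On Linux the limits of a process are listed in /proc/<pid>/limits; the sketch below parses the "Max open files" row from that text. The function name is hypothetical and the sample text is made up to match the usual layout of that file.

```python
from typing import Optional, Tuple


def open_files_limit(limits_text: str) -> Optional[Tuple[str, str]]:
    """Return (soft, hard) from the 'Max open files' row of a
    /proc/<pid>/limits listing, or None if the row is absent.

    Values are returned as strings because they may be 'unlimited'.
    """
    prefix = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(prefix):
            fields = line[len(prefix):].split()
            if len(fields) >= 2:
                return fields[0], fields[1]
    return None


# Made-up sample in the layout of /proc/<pid>/limits.
sample = (
    "Limit                     Soft Limit           Hard Limit           Units\n"
    "Max cpu time              unlimited            unlimited            seconds\n"
    "Max open files            1024                 524288               files\n"
)
print(open_files_limit(sample))
```

For a real check, the text would come from reading /proc/<pid>/limits, with the pid taken from the receiver log lines or from systemctl status mercure_receiver.service.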

As I said, I have a GE scanner with the same problem. The problem is on the GE side: they simply set a time limit on how long a DICOM transfer can last. For clinical applications, this seems like a sane limit (and PACS can retrieve files directly as well), but for research applications with lots of data, this is not the case, and GE has been unwilling to 'fix' their code.

My suggestion would be to start testing with your own DICOM stream from another system you control (e.g. dcmsend in the DCMTK package). That way you at least eliminate a vendor problem: if a non-GE system sends the same data stream without a problem, you know where to look.

TheLoFang commented 1 year ago

Hi guruevi. I will do as you tell me. At the moment I have no experience with dcmtk, but I will try to follow what you have told me.

Regarding the OS, I use Ubuntu 20.04 virtualized in VirtualBox (32 RAM, 1.5TB and a Xeon processor). I followed the installation according to the Mercure documentation (https://mercure-imaging.org/docs/install.html). All the other modalities (CR, MG, etc.) arrive fine. This error only happens with the GE CT, and only when I send 300 CT images or more; if I send fewer, it works without errors. Mercure processes the study, but it takes almost 8 minutes in the queue. I will keep looking and trying.

I will do everything possible on my part and I will come back here if I have any questions. Thank you

RoyWiggins commented 1 year ago

Yes, I have the same conclusion as guruevi. If the error's coming from dcmtk's storescp and it's happening while the router is turned off, it's not strictly speaking something Mercure has influence over beyond the command-line parameters of storescp. You could turn Mercure entirely off and run storescp at the command line yourself (e.g. storescp --fork --promiscuous +xa -od "/tmp/test_ge" +uf 104) and almost definitely get the same result.

If guruevi is right and the timeout is happening on the GE client side, there's probably nothing to be done on the server side, but you might try alternative DICOM servers. For example, set up your own Orthanc server or install OsiriX on a workstation. Orthanc though is also based on dcmtk, so is liable to have the same issue. If it somehow works with Orthanc then that would be quite interesting.

RoyWiggins commented 1 year ago

Though, hm. I guess on your last go-through it mostly worked? I'm not sure the "PDV Send Length" errors are important, though probably they're not ideal.

If it is a timeout issue though, it could be sporadic if you're on the edge of whatever the timeout is, and the fact that disabling the router seemed to fix it might have been a coincidence.

If the problem always happens when the router is turned on but never happens when the router is turned off, even with the extended series_complete_trigger, then I'm somewhat stumped. I can't think of any obvious way for the router to influence how the receiver is behaving.

TheLoFang commented 1 year ago

Hi, thanks Roy. I have carried out the tests with DCM4CHEE as the server and had no problems. I also use applications such as ONIS Dicom Viewer and Horus, and have not had any problems there either; I only have KPacs left to try. The problem occurs only with Mercure. Even in terms of speed, DCM4CHEE forwarding worked fast: I sent the study of 775 CT images in approx. 1.20 minutes. Of course, mercure offers better tools and is a very attractive option, and it works great with everything else. I'm thinking of doing the installation from scratch, and I'll even use another server with higher capacity and a totally clean installation of Ubuntu to confirm.

But first I'll try what guruevi suggests regarding dcmtk; that should get us somewhere at the end of the day. Thanks Roy

tblock79 commented 1 year ago

As others have pointed out, this problem is related to dcmtk, as mercure is just using the storescp tool from dcmtk to receive DICOMs. You could 1) try upgrading to a newer version of dcmtk, or 2) ask in the dcmtk forum if someone there knows a solution: https://forum.dcmtk.org/

Closing the issue here for now, as we can't solve this problem on the mercure side.

TheLoFang commented 1 year ago

Hi tblock79, Thanks, I'm still looking into it. If I find a solution I will post it for future similar cases.