suever / pydicom-experimental

pydicom test

Write vs Read speed [originally related to Decimal slow in 0.9.7] #114

Open suever opened 9 years ago

suever commented 9 years ago

From patricem...@gmail.com on April 09, 2012 16:23:01

On a large RT structure set file, I noticed a huge difference between read and write performance: well under 1 s on read and around 40 s on write. I have attached the file and a test script.

Attachment: RStest.dcm.gz test.py

Original issue: http://code.google.com/p/pydicom/issues/detail?id=114

suever commented 9 years ago

From darcymason@gmail.com on April 09, 2012 15:13:26

I've tried and confirmed the issue (in fact the time spread is far worse on my system).

Some preliminary profiling confirms my suspicion that it is not really the writing itself; it is the conversion of 'raw data elements', in particular those with a VR of DS (which the example file has a lot of), in pydicom 0.9.7. Python's Decimal is quite slow compared with float [1]. There is a related (I think) issue raised on the dicompyler group [2].
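To see the gap being described, here is a small illustrative benchmark (not from the thread) that parses DS-style numeric strings with float and with Decimal. The exact ratio depends on the Python build; the pure-Python decimal module of that era was far slower than float parsing.

```python
# Hypothetical micro-benchmark (not part of the original attachment):
# parsing many DS-style strings with float vs. Decimal.
import time
from decimal import Decimal

values = ["-12.345", "0.000123", "42", "3.14159"] * 25_000  # 100k strings

start = time.perf_counter()
floats = [float(v) for v in values]
float_time = time.perf_counter() - start

start = time.perf_counter()
decimals = [Decimal(v) for v in values]
decimal_time = time.perf_counter() - start

print(f"float:   {float_time:.3f} s")
print(f"Decimal: {decimal_time:.3f} s")

# Both parses yield the same numeric values; only the cost differs.
assert all(float(d) == f for d, f in zip(decimals, floats))
```

On a pure-Python decimal implementation the Decimal column is typically several times larger, which is the per-element cost pydicom 0.9.7 paid for every DS value.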

I'm going to look into a way to go back to using float rather than Decimal for DS, and perhaps only check and warn on writing the values, rather than when setting them.

Another solution, which should be done anyway, is to not convert raw data elements before writing them. Then only those data elements touched by the user code would be hit with this speed penalty.
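The deferred-conversion idea can be sketched in miniature. `RawElement` and `LazyDataset` below are invented names for illustration, not pydicom's actual classes: raw elements keep their undecoded value until something actually reads it, and an untouched dataset can be written back without any conversion at all.

```python
# Illustrative sketch of deferred conversion; RawElement and LazyDataset
# are hypothetical names, not pydicom's real API.
import io


class RawElement:
    """Holds the undecoded string value exactly as read from the file."""
    def __init__(self, raw_value):
        self.raw_value = raw_value


class LazyDataset:
    def __init__(self, elements):
        self._elements = elements  # tag -> RawElement
        self.conversions = 0       # count how often we pay the parse cost

    def __getitem__(self, tag):
        elem = self._elements[tag]
        if isinstance(elem, RawElement):
            self.conversions += 1
            elem = float(elem.raw_value)  # the expensive DS parse
            self._elements[tag] = elem    # cache the converted value
        return elem

    def write(self, fileobj):
        # Untouched elements are written back verbatim -- no parse needed.
        for tag, elem in sorted(self._elements.items()):
            raw = elem.raw_value if isinstance(elem, RawElement) else repr(elem)
            fileobj.write(f"{tag}={raw}\n")


ds = LazyDataset({0x300A: RawElement("1.5"), 0x300B: RawElement("2.5")})
ds.write(io.StringIO())  # write without touching values
print(ds.conversions)    # -> 0: no conversion cost on a pure copy
ds[0x300A]               # direct access triggers the parse
print(ds.conversions)    # -> 1: only the touched element paid the cost
```

This mirrors the proposal above: only elements the user code touches would incur the DS conversion penalty.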

[1] http://stackoverflow.com/questions/195116/python-decimal [2] http://groups.google.com/group/dicompyler/browse_thread/thread/217e3d304f55cac2

Status: Started

suever commented 9 years ago

From darcymason@gmail.com on October 16, 2012 12:13:10

Changed title to point out connection to Decimal issues.

Summary: Write vs Read speed - Decimal slow in 0.9.7

suever commented 9 years ago

From randle.taylor on October 17, 2012 10:30:42

I recently came across this issue too and found that it can be mitigated somewhat by using the cdecimal [1] library, which is a drop-in replacement for the decimal module in the standard library. The write operation in Patrice's test script is still quite slow, but using cdecimal cuts the run time by a factor of 2 on my (slow) work machine (100 s vs 45 s).

To use the cdecimal module in place of the decimal module, simply add:

import sys
sys.modules["decimal"] = __import__("cdecimal")

before you import pydicom (or any other module that makes use of decimal).

This doesn't eliminate the issue entirely but I thought I'd mention it in case it is of any use to someone.

[1] http://www.bytereef.org/mpdecimal/index.html

suever commented 9 years ago

From darcymason@gmail.com on October 22, 2012 20:17:26

DS conversion from raw has been improved by approximately a factor of five in the few revisions up to revision d78174a8d9de. This is based on the raw-convert portion of the profiling tests in raw_convert_test.py (in the test/performance subdirectory).

The speed improvements included taking DS back to a float-based representation (by default), although the Decimal-based DS from pydicom 0.9.7 can still be switched on if desired. The Decimal version is about 3.5 times slower, but using randle's cdecimal trick it comes down to about 1.8 times slower than the float representation in my testing.
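Assuming the 0.9.8-era configuration hook is named `dicom.config.DS_decimal` (check the release notes for the exact spelling, as this is from memory and may differ), opting back into the Decimal-based DS would look something like:

```python
# Assumed 0.9.8-era API -- verify against the release you actually install.
import dicom
import dicom.config

# Opt back into Decimal-based DS values (the 0.9.8 default is float-based):
dicom.config.DS_decimal(True)

ds = dicom.read_file("RStest.dcm")  # DS values would now come back as Decimal
```

Combined with the cdecimal substitution mentioned above, this keeps the exact-decimal semantics at a reduced (though still nonzero) cost relative to floats.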

I've left the issue open to see if further profiling can still help.

suever commented 9 years ago

From darcymason@gmail.com on October 23, 2012 19:00:35

More improvements up to revision 78ba350a3eb8. There is now approximately a 10-fold speed improvement for raw_convert_test compared with the pydicom 0.9.7 release, and approximately 15-fold if the raw conversion is forced directly (using dataset.walk()) rather than through str(dataset).

There is probably little else that can be done to improve this, except to avoid the conversion entirely, which may be picked up again later, so I'll leave the issue open for that.

Summary: Write vs Read speed [originally related to Decimal slow in 0.9.7]

suever commented 9 years ago

From reese.ha...@gmail.com on November 16, 2012 10:48:23

I have just updated to version 0.9.7 and noticed this issue, but I noticed it on read (I don't write out the data later). For example, in the test.py code above, if you add repr(ds) after reading in ds, the read time dominates:

Without repr(ds):
  Reading ... done in 0.47 s.
  Writing ... done in 140.89 s.

With repr(ds):
  Reading ... done in 141.75 s.
  Writing ... done in 2.45 s.

BTW how do I get the revision to install? Can I use easy_install? I would like to test with the revisions that have been added.

suever commented 9 years ago

From Suever@gmail.com on November 16, 2012 11:02:35

That is the behavior that would be expected.

The slow down results from the conversion of a RawDataElement to a DS DataElement. When the dataset is initially read in, the data elements remain as RawDataElements until they are accessed.

They are accessed in three scenarios:

1) The data element is accessed directly (dataset.PatientsName)
2) The data element value is accessed indirectly (in your case repr(), which requires the value of every data element)
3) The dataset is written to a file (each element is converted from raw, then written)

In your case, you are converting from raw when you call repr(ds), therefore the conversion is unnecessary when writing to file. Normally, if you just read in the file and then wrote it back to the file, the conversion would happen inside the write method.
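The "pay the cost exactly once" behaviour can be illustrated with a toy cached value (illustrative only, not pydicom code): the first consumer, whether repr() or a write, triggers the parse, and every later consumer reuses the cached result.

```python
# Toy illustration of convert-once caching; Element is a hypothetical
# stand-in for a pydicom data element, not the real class.
class Element:
    def __init__(self, raw):
        self.raw = raw
        self._value = None
        self.converted = 0  # how many times the expensive parse ran

    @property
    def value(self):
        if self._value is None:
            self.converted += 1           # expensive parse happens here...
            self._value = float(self.raw)
        return self._value                # ...and only once; then cached

e = Element("13.75")
repr(e.value)        # first consumer (like repr(ds)) pays the parse cost
e.value              # later consumer (like a write) reuses the cache
print(e.converted)   # -> 1
```

This is why in the timings above the slow phase simply moves from the write to the repr(ds): the total conversion work is the same, it is just charged to whichever step touches the values first.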


In order to install a specific revision from the development repository, you will need to install mercurial and run the following:

hg clone https://code.google.com/p/pydicom
hg checkout <revision>
python setup.py install

NOTE: The repository is used for development, so bugs may be introduced between releases. If you are using this in any situation where you require stable code, please use the release versions.

suever commented 9 years ago

From reese.ha...@gmail.com on November 16, 2012 11:15:18

Is there a way to speed this up besides going back to the previous release version? (I have made all the Beams-to-BeamSequence etc. changes, which I prefer.) I have a loop that is gathering data for plotting that has really hit a speed bump. I have already added the cdecimal replacement. Is there anything else I could try, since this is "mission critical"?

suever commented 9 years ago

From darcymason@gmail.com on November 18, 2012 21:09:40

Is there a way to speed this up besides going back to the previous release version

I've just posted a release candidate for a pydicom 0.9.8 release which includes the faster DS performance (using float values by default, not Decimal). It is not extensively tested yet beyond the usual unit tests and example scripts, but as far as I can tell, it is stable. You could try installing that. Easy_install or pip should pick the new version up.

suever commented 9 years ago

From reese.ha...@gmail.com on November 20, 2012 07:33:07

Thank you. I have tried the release candidate and everything appears to be working smoothly... and speed has increased by about a factor of 10 compared to version 0.9.7. No issues to report so far.

Reese