Open GoogleCodeExporter opened 9 years ago
I've tried and confirmed the issue (in fact the time spread is far worse on my
system).
Some preliminary profiling confirms my suspicion that it is not really the
writing, it is in the converting of 'raw data elements', and in particular ones
with VR of DS (which the example file has a _lot_ of) in pydicom 0.9.7. Python
Decimal is pretty slow compared with float [1]. There is a related (I think)
issue raised on the dicompyler group [2].
I'm going to look into a way to go back to using float rather than Decimal for
DS, and perhaps only check and warn on writing the values, rather than when
setting them.
Another solution, which should be done anyway, is to not convert raw data
elements before writing them. Then only those data elements touched by the user
code would be hit with this speed penalty.
[1] http://stackoverflow.com/questions/195116/python-decimal
[2]
http://groups.google.com/group/dicompyler/browse_thread/thread/217e3d304f55cac2
Original comment by darcymason@gmail.com
on 9 Apr 2012 at 10:13
Changed title to point out connection to Decimal issues.
Original comment by darcymason@gmail.com
on 16 Oct 2012 at 7:13
I recently came across this issue too and found that the issue can be mitigated
somewhat by using the cdecimal[1] library which is a drop in replacement for
the decimal library in the std library. The write operation from Patrice's
test script is still quite slow, but using the cdecimal routine cuts the run
time by a factor of 2 on my (slow) work machine (100s vs 45s).
To use the cdecimal module in place of the decimal module, simply add:
import sys
sys.modules["decimal"] = __import__("cdecimal")
*before* you import pydicom (or any other module that makes use of decimal).
This doesn't eliminate the issue entirely but I thought I'd mention it in case
it is of any use to someone.
[1]http://www.bytereef.org/mpdecimal/index.html
Original comment by randle.taylor
on 17 Oct 2012 at 5:30
DS conversion from raw has been improved by approx a factor of five, in the few
revisions up to revision d78174a8d9de. This is based on the raw convert part of
the profiling tests in raw_convert_test.py (test/performance subdirectory).
The speed improvements included taking DS back to a float-based representation
(by default), although the (pydicom 0.9.7) DS based on Decimal can still be
switched to if desired. It is about 3.5 times slower, but using randle's
cdecimal trick it comes down to about 1.8 times slower than the float
representation in my testing.
I've left the issue open to see if further profiling can still help.
Original comment by darcymason@gmail.com
on 23 Oct 2012 at 3:17
More improvements to revision 78ba350a3eb8. Now approx factor of 10 speed
improvement for raw_convert_test, compared with pydicom 0.9.7 release. Is
approx 15-fold if just directly force raw_convert (using dataset.walk()) rather
than through str(dataset).
There is probably little else that can be done to improve this, except to avoid
the conversion entirely, which may be picked up again later, so I'll leave the
issue open for that.
Original comment by darcymason@gmail.com
on 24 Oct 2012 at 2:00
I have just updated to version 0.9.7 and noticed this issue but I noticed it in
read (I don't write out the data later.) For example, in the test.py code that
is above if you add repr(ds); after reading in ds the times for reading
dominate.
without repr(ds);
Reading ... done in 0.47 s.
Writing ... done in 140.89 s.
with repr(ds);
Reading ... done in 141.75 s.
Writing ... done in 2.45 s.
BTW how do I get the revision to install? Can I use easy_install? I would
like to test with the revisions that have been added.
Original comment by reese.ha...@gmail.com
on 16 Nov 2012 at 6:48
That is the behavior that would be expected.
The slow down results from the conversion of a RawDataElement to a DS
DataElement. When the dataset is initially read in, the data elements remain as
RawDataElements until they are accessed.
They are accessed in three scenarios:
1) The data element is accessed directly (dataset.PatientsName)
2) The data element value is accessed indirectly (in your case it's repr()
which requires the value of all data elements)
3) the dataset is written to a file (converts each element from raw then
writes).
In your case, you are converting from raw when you call repr(ds), therefore the
conversion is unnecessary when writing to file. Normally, if you just read in
the file and then wrote it back to the file, the conversion would happen inside
the write method.
---------------------------
In order to install a specific revision from the development repository, you
will need to install mercurial and run the following:
hg clone https://code.google.com/p/pydicom
hg checkout <revision_number>
python setup.py install
NOTE: The repository is really used for development purposes so there may be
bugs introduced during development so if you are using this in any situation
where you require stable code, please use the release versions.
Original comment by Suever@gmail.com
on 16 Nov 2012 at 7:02
Is there a way to speed this up besides going back to the previous
release version (I have made all the Beams to BeamSequence etc changes
which I prefer). I have a loop that is gathering data for plotting
that has really hit a speed bump. I have already added the cdecimal
replacement. Anything else I could try since this is "mission
critical"?
Original comment by reese.ha...@gmail.com
on 16 Nov 2012 at 7:15
>> Is there a way to speed this up besides going back to the previous release
version
I've just posted a release candidate for a pydicom 0.9.8 release which includes
the faster DS performance (using float values by default, not Decimal). It is
not extensively tested yet beyond the usual unit tests and example scripts, but
as far as I can tell, it is stable. You could try installing that. Easy_install
or pip should pick the new version up.
Original comment by darcymason@gmail.com
on 19 Nov 2012 at 5:09
Thank you. I have tried the release candidate and everything appears
to be working smoothly...and speed is increase by about a factor of 10
compared to the 0.9.7
version. No issues to report so far.
Reese
Original comment by reese.ha...@gmail.com
on 20 Nov 2012 at 3:33
Original issue reported on code.google.com by
patricem...@gmail.com
on 9 Apr 2012 at 8:23Attachments: