Open suever opened 9 years ago
From darcymason@gmail.com on December 03, 2010 17:24:27
I'm guessing this is because the check for stopping is done by a callback function. Function calls are costly in Python, so doing one in the innermost loop of reading the file could add a lot of time.
It might be reasonable to special-case the check for the pixel data element directly in the inner loop and avoid the callback. It might require some change to the calling signature of read_file(), or to some of the intermediate methods it calls.
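To illustrate the cost being discussed, here is a small stand-alone micro-benchmark (not pydicom code; all names are illustrative) comparing a per-element callback against an inlined tag comparison in a tight reading loop:

```python
import timeit

PIXEL_DATA = 0x7FE00010  # (7FE0,0010) PixelData

def stop_when(tag):
    # Callback style: one Python function call per element read.
    return tag == PIXEL_DATA

def read_with_callback(tags, stop_when):
    out = []
    for tag in tags:
        if stop_when(tag):
            break
        out.append(tag)
    return out

def read_with_inline_check(tags):
    out = []
    for tag in tags:
        if tag == PIXEL_DATA:  # special-cased comparison, no function call
            break
        out.append(tag)
    return out

tags = list(range(100_000)) + [PIXEL_DATA]
t_cb = timeit.timeit(lambda: read_with_callback(tags, stop_when), number=10)
t_in = timeit.timeit(lambda: read_with_inline_check(tags), number=10)
print(f"callback: {t_cb:.3f}s  inline: {t_in:.3f}s")
```

The inline version typically comes out measurably faster, which is the motivation for moving the stop check out of a callback.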
Status: Accepted
Labels: -Type-Defect -Difficulty-Medium Type-Enhancement Difficulty-Easy
From njv...@gmail.com on December 06, 2010 08:45:09
I'm curious: what other use case is there for stopping?
From darcymason@gmail.com on December 06, 2010 19:40:29
I'm curious: what other use case is there for stopping?
Well, if it were indeed faster, then it could make sense for any kind of filtering operation on a large set of files -- read just enough to get patient name/id to group ones for the same patient, or similarly by exam id for a certain patient, or a certain referring physician. A small list of key information (patient name, id, study comments) could be offered in a GUI for a user to choose which file(s) to view.
From njv...@gmail.com on December 07, 2010 09:05:52
Makes sense.
What if you had a couple of lists, say, stop_before_tags and stop_after_all_tags -- as you read tags, you check for the tag in those two lists.
If the tag is in stop_before_tags, it stops before reading further. If it's in stop_after_all_tags, it'd remove the tag from the list, read the value, and if the list is now empty, stop reading further.
That way, you could do
stop_before_tags = [0x7FE00010] # pixel data
or
stop_after_all_tags = [0x00100010, 0x00324000] # patient's name, study comments
Of course, I'm not exactly sure what should happen if both are specified.
Also, I don't know enough about the structure of dicoms to know this, but if I stop before reading pixel data, would I risk missing other tags I might be interested in? Would it make more sense to, instead, skip reading certain tags?
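A minimal sketch of the two-list stopping rule proposed above, using a plain generator of (tag, value) pairs as a stand-in for the actual reading loop (none of these names are pydicom's):

```python
def read_elements(elements, stop_before_tags=(), stop_after_all_tags=()):
    """Yield (tag, value) pairs, applying the two stopping rules."""
    stop_before = set(stop_before_tags)
    stop_after = set(stop_after_all_tags)  # copied so we can remove from it
    for tag, value in elements:
        if tag in stop_before:
            return                  # stop before reading this element
        yield tag, value
        if tag in stop_after:
            stop_after.discard(tag)
            if not stop_after:
                return              # everything requested has been read

elements = [
    (0x00100010, "DOE^JOHN"),       # patient's name
    (0x00324000, "some comments"),  # study comments
    (0x7FE00010, b"...pixels..."),  # pixel data
]
# Stop before pixel data:
print(list(read_elements(elements, stop_before_tags=[0x7FE00010])))
# Stop once name and study comments have both been read:
print(list(read_elements(elements,
                         stop_after_all_tags=[0x00100010, 0x00324000])))
```

Both calls yield only the first two elements, so either rule can be checked independently, which matches the "check each individually" resolution below.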
From darcymason@gmail.com on December 07, 2010 16:04:26
What if you had a couple of lists, say, stop_before_tags and stop_after_all_tags -- as you read tags, you check for the tag in those two lists.
Yes, I like that idea; it is reasonably straightforward and could be done entirely inside the reading loop, so it would cost very little time.
A couple of minor refinements or clarifications of the idea:
Of course, I'm not exactly sure what should happen if both are specified.
Each could just be checked individually. If either reaches the "stop" condition, the reading is stopped.
...if I stop before reading pixel data, would I risk missing other tags I might be interested in? Would it make more sense to, instead, skip reading certain tags?
There is very little that can come after pixels (see tags in the DICOM dictionary > 7FE00010), and it is very unlikely anyone would want those items. Skipping has been tried: it saves no time at all compared with reading, but it does save memory, of course. The defer_size argument handles that.
From njv...@gmail.com on December 07, 2010 16:11:22
- The user should also be able to pass in values as DICOM names (keywords) rather than numeric tags. Then there is a one-time check at the beginning -- if not already a tag number, the keyword is looked up and converted to the group element tag number.
Definitely.
Also, it'd probably be sane to store the tag codes, after lookup, in a set instead of a list -- no worries about dupes, and O(1) lookups.
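A sketch of that one-time normalization step, with a tiny stand-in keyword table (the real lookup would use the full DICOM dictionary; these names are illustrative only):

```python
# Illustrative subset of the DICOM dictionary, not the real table.
KEYWORD_TO_TAG = {
    "PatientName":   0x00100010,
    "StudyComments": 0x00324000,
    "PixelData":     0x7FE00010,
}

def normalize_tags(items):
    """Convert a mix of keywords and numeric tags into a set of tags.

    Done once, up front, so the inner reading loop only ever does
    O(1) set-membership tests on integers.
    """
    tags = set()
    for item in items:
        if isinstance(item, int):
            tags.add(item)
        else:
            # Raises KeyError for an unknown keyword, which surfaces
            # bad input before any file reading starts.
            tags.add(KEYWORD_TO_TAG[item])
    return tags

print(normalize_tags(["PatientName", 0x00324000]))
```

Using a set also deduplicates for free, as noted above.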
Thanks!
From njv...@gmail.com on December 03, 2010 11:13:32
On my DICOMs (2D slices from a GE MRI scanner), I've found that telling pydicom to skip the pixel data is actually slower than reading it. This is true both when repeatedly re-reading one file with timeit and when running time_test.py:
test_full_read()
601937 function calls (601137 primitive calls) in 1.750 CPU seconds

ncalls       tottime  percall  cumtime  percall  filename:lineno(function)
102100         0.906    0.000    0.906    0.000  {method 'read' of 'file' objects}
34200/33600    0.211    0.000    0.656    0.000  filereader.py:150(data_element_generator)
35100          0.141    0.000    0.171    0.000  tag.py:8(Tag)
400/200        0.120    0.000    1.032    0.005  filereader.py:259(read_dataset)
33800          0.086    0.000    0.086    0.000  {built-in method __new__ of type object at 0x100126080}
test_partial()
829791 function calls (828991 primitive calls) in 2.355 CPU seconds

ncalls       tottime  percall  cumtime  percall  filename:lineno(function)
101900         1.048    0.000    1.048    0.000  {method 'read' of 'file' objects}
67800          0.275    0.000    0.327    0.000  tag.py:8(Tag)
34100/33500    0.228    0.000    0.958    0.000  filereader.py:150(data_element_generator)
1              0.162    0.162    2.348    2.348  time_test.py:69(test_partial)
67200          0.134    0.000    0.502    0.000  tag.py:56(__eq__)
Or, in a Python shell:
I wouldn't be surprised to not save much time in reading these, but to have it be significantly slower? That's odd.
I'm using the version easy_install grabbed for me on December 2, 2010.
I can't post any dicoms to Google, but I could send you a few.
Cheers, -Nate
Original issue: http://code.google.com/p/pydicom/issues/detail?id=94