pymedphys / pymedphys

A community effort to develop an open standard library for Medical Physics in Python. Building quality transparent software together via peer review and open source distribution. Open code is better science.
https://docs.pymedphys.com
Apache License 2.0

TRF to RT DICOM plan #1046

Closed SimonBiggs closed 3 years ago

SimonBiggs commented 4 years ago

The key API for TRF -> RT DICOM plan is through the pymedphys.Delivery object:

https://github.com/pymedphys/pymedphys/blob/299070597f8251e08b7bf181380f6a5a87ef2726/pymedphys/__init__.py#L5

There is a method pymedphys.Delivery.from_logfile (which I should promptly rename to from_trf to avoid ambiguity). It is defined here:

https://github.com/pymedphys/pymedphys/blob/299070597f8251e08b7bf181380f6a5a87ef2726/pymedphys/_trf/decode/delivery.py#L31-L60

Importantly, Delivery itself is built on top of collections.namedtuple and consists of five parameters ["monitor_units", "gantry", "collimator", "mlc", "jaw"]:

https://github.com/pymedphys/pymedphys/blob/299070597f8251e08b7bf181380f6a5a87ef2726/pymedphys/_base/delivery.py#L32-L34
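For illustration, a namedtuple base with those five fields can be sketched as follows (a simplified stand-in with made-up values, not the actual pymedphys definition):

```python
import collections

# Simplified stand-in for the pymedphys Delivery base class.
DeliveryBase = collections.namedtuple(
    "DeliveryBase", ["monitor_units", "gantry", "collimator", "mlc", "jaw"]
)

delivery = DeliveryBase(
    monitor_units=[0.0, 50.0, 100.0],  # cumulative MU per control point
    gantry=[180.0, 90.0, 0.0],
    collimator=[0.0, 0.0, 0.0],
    mlc=[],  # leaf positions per control point (elided here)
    jaw=[],  # jaw positions per control point (elided here)
)

# A namedtuple gives both attribute and positional access:
print(delivery.monitor_units[-1])  # 100.0
print(delivery[1][0])              # 180.0 (index 1 is "gantry")
```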

Once a TRF file has been converted into a Delivery object, it can then proceed to be converted into a DICOM RT plan file. One requirement is that the original DICOM RT plan file is provided as a template. This is achieved using the Delivery.to_dicom method, which is defined here:

https://github.com/pymedphys/pymedphys/blob/299070597f8251e08b7bf181380f6a5a87ef2726/pymedphys/_dicom/delivery/core.py#L79-L108

Therefore, the process of converting a TRF file to a DICOM plan file looks a bit like this:

import pydicom
import pymedphys

# downloads the trf dcm testing data from
# https://zenodo.org/record/3870436/files/delivery_test_data.zip?download=1
delivery_test_paths = pymedphys.zip_data_paths("delivery_test_data.zip")

test_paths_to_use = [
    path for path in delivery_test_paths 
    if path.parent.name == 'original'
]

trf_path = [path for path in test_paths_to_use if path.suffix == '.trf'][0]
dcm_path = [path for path in test_paths_to_use if path.suffix == '.dcm'][0]

original_dcm_plan = pydicom.dcmread(dcm_path, force=True)

delivery = pymedphys.Delivery.from_logfile(trf_path)
new_dcm_plan = delivery.to_dicom(dicom_template=original_dcm_plan)

To see the "round trip test" that utilises the data downloaded above, see:

https://github.com/pymedphys/pymedphys/blob/299070597f8251e08b7bf181380f6a5a87ef2726/pymedphys/tests/delivery/test_deliverydata_trf_dicom_round_trip.py#L113-L135

So essentially, to support a wider range of DICOM RT plan types, what is required is to actually work with the Delivery object and improve the current to_dicom method. The TRF --> Delivery pathway is currently quite robust. The Delivery --> RT Plan DICOM pathway feels like a giant ball of sticky tape ... :(.

I can follow up this post with a deep dive into the Delivery --> RT Plan DICOM implementation if that helps also.

SimonBiggs commented 4 years ago

Also, I'd like to add that the large surface area and the many implicit APIs of the Delivery object have bothered me. I am open to many suggestions regarding it. Part of me wants to make it an object that is only used internally within PyMedPhys and then expose a sort of "wrapper object" where we can tightly constrain the API.

Anyway. I just wanted to flag that Delivery is certainly not set in stone. Even internally, it being a namedtuple doesn't feel so appropriate either. I feel like frozenmap as discussed over at https://github.com/pymedphys/pymedphys/issues/1011#issuecomment-675156988 would be a far more appropriate base class.

SimonBiggs commented 4 years ago

I had also been quite partial in the past to the following library:

http://xarray.pydata.org/en/stable/

One thing that really bugs me about the current Delivery object is that the axes definitions of the MLC and Jaw arrays are not immediately obvious. I really like how in xarray the axes labels are closely tied to the array contents.

SimonBiggs commented 4 years ago

I'd even be open to refactoring the intermediary object away from being this Delivery thing, and instead having DICOM RT plan be the intermediary. And then just have a range of functions that can convert to and from DICOM RT plan, and do stuff when passed a DICOM RT plan.

To be honest, that fixes almost all of my concerns. No longer are we exposing an object with a large API; instead, just a set of functions that act upon a pydicom.Dataset that is an RT plan. Users can just convert to and from DICOM RT plan for all their needs.

The biggest downside is that a DICOM RT plan is not allowed to be used as an input to a function that is wrapped in functools.lru_cache... that can be a pain, as that tool is really useful for making repeated function calls effectively free. But maybe that's a price worth paying...
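That constraint comes from lru_cache hashing its arguments to build the cache key; a small stdlib-only demonstration (a plain dict stands in here for a mutable, unhashable object such as a pydicom.Dataset):

```python
import functools

@functools.lru_cache(maxsize=None)
def expensive_lookup(key):
    # Stand-in for an expensive computation; results are memoised.
    return key * 2

# Hashable arguments work fine and are cached:
print(expensive_lookup(21))  # 42

# But an unhashable argument raises a TypeError before the
# function body even runs, because lru_cache cannot build a cache key:
try:
    expensive_lookup({"BeamSequence": []})
except TypeError as exc:
    print(f"lru_cache rejected the argument: {exc}")
```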

sjswerdloff commented 4 years ago

What, in theory, does Delivery offer compared to DICOM RT Plan? Does it allow for more information? A richer data model (relationships between internal objects)? Ease of interaction (not just a "dataclass")? If not, then I'd encourage moving away from a 3rd model (there is the TRF data model, and there is the DICOM model, and the DICOM model is the accepted standard... so unless there is something that Delivery offers that DICOM does not... it's just more to deal with).

Having said that, there is 2nd Gen DICOM-RT (radiations and radiation records), so if having an intermediate model simplifies having a different target (2nd gen rather than RT Plan), that might be part of its value proposition.

Note that I said DICOM model and not explicitly RT Plan, because the other object that more closely maps to TRF, conceptually, is RT Beams Treatment Record. Using RT Beams Treatment Record would create the additional requirement of having an RT Beams Treatment Record -> RT Plan conversion (what I refer to as "Plan as Delivered"). But that in itself is quite useful for systems that already provide an RT Beams Treatment Record rather than a proprietary/non-standard record (lower case), such as an IHE-RO Treatment Delivery Workflow profile compliant system (there are also vendors who provide a DICOM RT (Ion) Beams Treatment Record without the interface being IHE-RO TDW).

Caching is a performance optimisation. Premature optimisation is not a good thing. So unless there is already a clear case to be made that the performance is, or would be, problematic, being able to use lru_cache is not sufficient to motivate a design decision. One should be able to wait until it's clear where a performance issue lies, focus explicitly on that, and ensure its inputs are in a form amenable to working with lru_cache.

xarray looks promising, and being able to use it inside Delivery to make operating on the Delivery object (hierarchy) easier might be motivation to have Delivery as a 3rd data model (ease of interaction).

SimonBiggs commented 4 years ago

Caching is a performance optimisation. Premature optimisation is not a good thing.

Yup :), I'll scrub it from my head and we can deal with that if need be. Thanks for snapping me out of it :).

xarray looks promising, and being able to use it inside Delivery to make operating on the Delivery object (hierarchy) easier might be motivation to have Delivery as a 3rd data model (ease of interaction).

Yup. I thought about it a bit more over night and this morning. That ease of interaction point I think is key. Having a nice xarray Dataset that interacts nicely and by default with tools such as matplotlib/pandas/numpy is worth the cost of having an internal intermediary.

Also, there are quite a range of input/output targets in this plan and delivery space. Currently the following inputs/outputs are being targeted:

https://github.com/pymedphys/pymedphys/blob/2cf5dacc50fd841c9a9ab07dd97a06fcb748b80e/pymedphys/_delivery.py#L10-L18

Currently the design is a namedtuple with the ["monitor_units", "gantry", "collimator", "mlc", "jaw"] layout described above.

Using the terminology as defined by xarray (http://xarray.pydata.org/en/stable/terminology.html), the whole delivery object can be replaced by a standardised way of defining an xarray Dataset:

- Within the Dataset, the Cumulative MU as defined above would be a Dimension coordinate (a dimension label). One difference from the cumulative MU definition above is that, in the xarray case, a new requirement would be that MU must be strictly increasing (consecutive MU values must not be the same).
- Gantry angle, collimator, (and potentially table in the future) would be DataArrays whose dimension label is the MU Dimension coordinate.
- Potentially a jaw could be generalised into a 1-pair MLC. The dimensions of the leaf pairs and banks would be generalised to Pair Index and Bank Index. This would be a three dimensional DataArray defining the current travel position of each leaf/jaw, its dimensions being Cumulative MU, [id] Bank Index, and [id] Pair Index. The [id] label needs to be unique for each beam limiting device.
- To extract more information out of each Jaw/MLC bank's indices, there can be two optional defining features that give more info: a leaf width DataArray, which uses a given MLC/Jaw's [id] Pair Index as its Dimension coordinate and provides an array of leaf widths; and a collimator bank position, which provides the displacement from the CAX/CRA to the leaf centre and also uses that same [id] Pair Index as its Dimension coordinate. (Not so sold on the hierarchy here...)

[EDIT] Actually, instead of using Cumulative MU as the dimension, use a separate Dimension coordinate throughout that can optionally be either a time step or a control point index. That frees up Cumulative MU to not need to be strictly increasing, and it's a far more sensible "dimension". Then Cumulative MU becomes a DataArray.

Also, it would be quite helpful to have a create_delivery_xarray_dataset() function, which can be passed these items in their raw format and will create this specified xarray.Dataset. [/EDIT]
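A stdlib-only sketch of the validation such a helper might perform (the function name and the dict-of-arrays structure are illustrative stand-ins for a real xarray.Dataset, whose dims/coords would carry the same labels):

```python
def create_delivery_dataset(control_point, cumulative_mu, gantry, collimator):
    """Assemble a Dataset-like dict keyed on a 'control_point' dimension.

    In the real proposal this would build an xarray.Dataset; a plain
    dict is used here to illustrate the schema and the validation rules.
    """
    n = len(control_point)
    # The dimension coordinate must be strictly increasing...
    if any(b <= a for a, b in zip(control_point, control_point[1:])):
        raise ValueError("control_point must be strictly increasing")
    # ...but cumulative MU only needs to be non-decreasing, which
    # permits step-and-shoot deliveries with repeated MU values.
    if any(b < a for a, b in zip(cumulative_mu, cumulative_mu[1:])):
        raise ValueError("cumulative_mu must be non-decreasing")
    # Every data variable shares the control_point dimension.
    for name, values in [("cumulative_mu", cumulative_mu),
                         ("gantry", gantry), ("collimator", collimator)]:
        if len(values) != n:
            raise ValueError(f"{name} must have length {n}")
    return {
        "dims": {"control_point": n},
        "coords": {"control_point": control_point},
        "data_vars": {"cumulative_mu": cumulative_mu,
                      "gantry": gantry, "collimator": collimator},
    }

ds = create_delivery_dataset(
    control_point=[0, 1, 2, 3],
    cumulative_mu=[0.0, 50.0, 50.0, 100.0],  # a repeated MU value is fine now
    gantry=[180.0, 180.0, 90.0, 90.0],
    collimator=[0.0, 0.0, 0.0, 0.0],
)
print(ds["dims"])  # {'control_point': 4}
```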

By having an intermediary, users are asked to first convert an input to the intermediary, and then they are able to convert to any of the output formats. This uses the same justification as the place for a language server:

https://code.visualstudio.com/api/language-extensions/language-server-extension-guide#why-language-server

However, by exposing the intermediary, we expose an object which we may want to be free to adjust in the future. Having the object itself be part of our API significantly undermines our internal flexibility.

One idea around that is to expose a convert function that internally uses our intermediary object but does not return the object itself, just its result. That then frees us to go to town playing around with the object, adjusting it to suit our needs. Maybe one day, if we are relatively sure that we would like to stabilise the object itself, on that day (and only then) we could expose it for other libraries to use.

I envisage a convert function to work a bit like how pandoc's API works:

https://pandoc.org/demos.html

Can't think of a tidy API just yet though (which isn't a vote of confidence for the concept).

sjswerdloff commented 4 years ago

That frees up Cumulative MU to not need to be strictly increasing, and it's a far more sensible "dimension".

That's critical. Step and Shoot has Cumulative MU that looks like [0, f, f, g, g, final]. Common in conventional RT, ubiquitous in Particle/Ion/Proton therapy.
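The shape of that step-and-shoot array is easiest to see by differencing it: segments where the cumulative MU repeats deliver zero MU (the beam is off while the MLC moves between shots). A quick stdlib illustration, with the f/g placeholders replaced by made-up numbers:

```python
cumulative_mu = [0.0, 40.0, 40.0, 75.0, 75.0, 100.0]  # [0, f, f, g, g, final]

# MU delivered between consecutive control points:
mu_per_segment = [b - a for a, b in zip(cumulative_mu, cumulative_mu[1:])]
print(mu_per_segment)  # [40.0, 0.0, 35.0, 0.0, 25.0]

# Cumulative MU is non-decreasing but NOT strictly increasing,
# so it cannot serve as a dimension coordinate on its own:
strictly_increasing = all(b > a for a, b in zip(cumulative_mu, cumulative_mu[1:]))
print(strictly_increasing)  # False
```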

The pandoc approach looks useful. It precludes operating on the intermediate object outside of conversion. So if you need to filter something, you need to either filter the input or filter the output, or always apply the filter as part of conversion (i.e. it's part of the library, not accessible to an application programmer/user). That's not a problem, it's a consequence. A common filter is "data cleaning".

The API that comes to my mind is delivery_convert --input_type MyInputType --output_type MyOutputType -i my_input -o my_output, where the -i and -o values default to stdin and stdout (because I like *nix style pipelines, and you can do some fun stuff with co-processing in ksh and bash). Specification of the input type and output type is mandatory and has to match the known set of filters, although one could have separate configuration for plugins.
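A minimal argparse sketch of that command-line shape (delivery_convert itself is hypothetical; only the argument parsing is shown, with "-" as a stand-in convention for stdin/stdout):

```python
import argparse

parser = argparse.ArgumentParser(prog="delivery_convert")
# Input and output types are mandatory and would be checked
# against the known set of converters.
parser.add_argument("--input_type", required=True, help="e.g. trf, icom, mosaiq")
parser.add_argument("--output_type", required=True, help="e.g. dicom, csv")
# -i/-o default to "-" (stdin/stdout) to support *nix style pipelines.
parser.add_argument("-i", "--input", default="-", help="input path, '-' for stdin")
parser.add_argument("-o", "--output", default="-", help="output path, '-' for stdout")

# Simulate: delivery_convert --input_type trf --output_type dicom -i plan.trf
args = parser.parse_args(
    ["--input_type", "trf", "--output_type", "dicom", "-i", "plan.trf"]
)
print(args.input_type, args.output_type, args.input, args.output)
# trf dicom plan.trf -
```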

SimonBiggs commented 4 years ago

That's critical. Step and Shoot has Cumulative MU that looks like [0, f, f, g, g, final].

Yeah, I did know that regarding step and shoot, and I did think about it... not sure why I disregarded it.

delivery_convert --input_type MyInputType --output_type MyOutputType -i my_input -o my_output

Yup, I do like that.

What about a Python API? When I make an API I like to see if I can find one already in the wild. I can't think of one that implements this pattern though...

sjswerdloff commented 4 years ago

The first question should be "Is there a pattern describing the behaviour or action intended" (pattern in the GoF / PoSA sense). Straight up, this is translation, so the following is worth looking at.... http://wiki.c2.com/?TranslatorPattern And this is usually discussed as an alternative to Visitor. I have used Visitor for translation quite successfully. Many times. But your input objects have to accept a visitor (if they don't already, it might be a bit of work).

Stepping back a bit: this is creational. Factory comes to mind, probably Abstract Factory. Builder is an option if there are internal items that are built up separately and assembled (beams, devices, fraction groups, patient demographics). But it could also be Command, with execute() being an implementation of "populate_from()" followed by "export_to()", or the creation of the Command object could be based on the input, with execute() being export(). And it could be a Factory that creates the Command object (whose class depends on the intended output type).

As an aside, the "strategy" approach now used in anonymisation is more or less a visitor because we are passing it in, and not subclassing.

Once your design approach is clear, the API to it should be straightforward.

SimonBiggs commented 4 years ago

I guess what I mean is: in the wild in Python, the need for "conversion" between data representations always seems to have one kind of API. People tend to expect it. I want people to be able to lean on their intuition when they're using pymedphys. If there is a given task they are doing, I want the syntax to be as similar as possible to other APIs out there.

Here is an example of the "conversion" API for pandas:

pandas.DataFrame.from_dict
pandas.DataFrame.from_records
pandas.DataFrame.to_parquet
pandas.DataFrame.to_pickle
pandas.DataFrame.to_csv
pandas.DataFrame.to_hdf
pandas.DataFrame.to_sql
pandas.DataFrame.to_dict
pandas.DataFrame.to_excel
pandas.DataFrame.to_json
pandas.DataFrame.to_html
pandas.DataFrame.to_feather
pandas.DataFrame.to_latex
pandas.DataFrame.to_stata
pandas.DataFrame.to_gbq
pandas.DataFrame.to_records
pandas.DataFrame.to_string
pandas.DataFrame.to_clipboard
pandas.DataFrame.to_markdown

For xarray:

xarray.Dataset.from_dataframe
xarray.Dataset.from_dict
xarray.Dataset.to_netcdf
xarray.Dataset.to_zarr
xarray.Dataset.to_array
xarray.Dataset.to_dataframe
xarray.Dataset.to_dask_dataframe
xarray.Dataset.to_dict

So that's what informed the current Delivery API, which looks like the following:

pymedphys.Delivery.from_dicom
pymedphys.Delivery.from_icom
pymedphys.Delivery.from_logfile
pymedphys.Delivery.from_monaco
pymedphys.Delivery.from_mosaiq
pymedphys.Delivery.to_dicom
pymedphys.Delivery.mudensity()

But, there is an assumption there. In the DataFrame and Dataset cases above, that object is something that is useful to the user. Delivery here isn't that useful to the user currently. It is quite opaque. People rarely want to make a Delivery object; they actually just care about the bits around it.

And, at the end of the day, the API of the internal function/class that we find most useful does not need to match the API of the exposed function/class. In fact, separating the two has the huge benefit of allowing our internal API to be flexible with a large surface area, and our external API to be rigid with a small surface area. One way this can be achieved is via composition (https://en.wikipedia.org/wiki/Composition_over_inheritance): wrapping up our internal object, attached to a private attribute within the exposed object. That way the exposed function/class can be a bit of a shell object that doesn't do much except act purely as an API intermediary.
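A stdlib-only sketch of that composition idea (all names here are hypothetical; _InternalDelivery stands in for the flexible internal object, and the exposed shell forwards only the methods we commit to):

```python
class _InternalDelivery:
    """Internal object: large, flexible API that is free to change."""

    def __init__(self, monitor_units):
        self.monitor_units = monitor_units

    def total_mu(self):
        return self.monitor_units[-1]

    def experimental_resample(self):
        # Free to rename or remove later; never part of the public API.
        raise NotImplementedError

class Treatment:
    """Exposed shell: small, stable API, implemented by composition."""

    def __init__(self, monitor_units):
        # The internal object is held on a private attribute, so its
        # surface area is not part of the public API.
        self._delivery = _InternalDelivery(monitor_units)

    def total_mu(self):
        # Only explicitly forwarded methods are public.
        return self._delivery.total_mu()

treatment = Treatment([0.0, 50.0, 100.0])
print(treatment.total_mu())  # 100.0
```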

I do think there is a lot of value in knuckling down an API that is as simple as possible (and no simpler), as well as having it be unified with the API patterns of the rest of the pymedphys library and the greater scientific Python ecosystem. I don't think the API needs to have any dependency on the internal implementation.

Anyway... I think I need to mull on it more. I'm certainly liking the xarray version... and maybe that object might actually be an object in and of itself that is useful to users. Potentially we could have a Treatment object (better than Delivery, because it's not always delivered, and it aligns better with the words "RT Beams Treatment Record" without differentiating whether it is a record or a plan, as it can be either).

The treatment object could have within itself that xarray representation, and then have the following conversion methods:

pymedphys.Treatment.from_xarray_dataset # set the internal xarray Dataset and run structure validation
pymedphys.Treatment.from_dataframe      # use xarray.Dataset.from_dataframe and run structure validation
pymedphys.Treatment.from_dict           # use xarray.Dataset.from_dict and run structure validation
pymedphys.Treatment.from_dicom          # Either 'RT Beams Treatment Record' or 'RT Plan'
pymedphys.Treatment.from_icom
pymedphys.Treatment.from_trf
pymedphys.Treatment.from_csv
pymedphys.Treatment.from_toml
pymedphys.Treatment.from_monaco
pymedphys.Treatment.from_mosaiq
pymedphys.Treatment.to_dicom            # Will be 'RT Beams Treatment Record' if axis is timestep
                                        # 'RT Plan' if it is control point
pymedphys.Treatment.to_xarray_dataset   # just output the internal xarray Dataset
pymedphys.Treatment.to_dataframe        # use xarray.Dataset.to_dataframe
pymedphys.Treatment.to_dict             # use xarray.Dataset.to_dict
pymedphys.Treatment.to_csv
pymedphys.Treatment.to_toml

The following xarray.DataArrays would be exposed as attributes for easy access by users:

pymedphys.Treatment.monitor_units
pymedphys.Treatment.gantry_angles
pymedphys.Treatment.collimator_angles
pymedphys.Treatment.mlc_positions
pymedphys.Treatment.jaw_positions

And the following method for "fluence" calculation:

pymedphys.Treatment.mudensity()

Also, to be able to transition from the current Delivery object internally to a new xarray based Treatment object there could be the following internal transformations:

pymedphys.Treatment._from_delivery
pymedphys.Treatment._to_delivery

SimonBiggs commented 4 years ago

So, I've given up on the idea of the Treatment object, or the Delivery object. The fact that it won't map to ion therapy was a nail in the coffin for me.

I think the only API we should expose regarding converting from one thing to another is:

pymedphys.convert(input_item: Any, input_type: str, output_type: str, **kwargs) -> Any

# where **kwargs can have things like, `dicom_template` and other optional requirements.
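One way such a convert function is often implemented is with a registry of (input_type, output_type) converter functions, pandoc-style. A hypothetical stdlib-only sketch (none of these names exist in pymedphys):

```python
_CONVERTERS = {}

def register(input_type, output_type):
    """Decorator registering a converter for an (input, output) type pair."""
    def decorator(func):
        _CONVERTERS[(input_type, output_type)] = func
        return func
    return decorator

def convert(input_item, input_type, output_type, **kwargs):
    try:
        converter = _CONVERTERS[(input_type, output_type)]
    except KeyError:
        raise ValueError(
            f"No converter registered for {input_type!r} -> {output_type!r}"
        ) from None
    # Any intermediary object lives inside the converter and is never
    # returned to the caller, keeping it out of the public API.
    return converter(input_item, **kwargs)

# Toy converter to exercise the registry:
@register("mu_list", "total_mu")
def _mu_list_to_total(mu_list, **kwargs):
    return mu_list[-1]

print(convert([0.0, 50.0, 100.0], "mu_list", "total_mu"))  # 100.0
```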

That way, you can use whatever you like to make the transformation internally. And at the end of the day it won't be exposed.

I'll deprecate Delivery, create the above convert function to replace it, and after a few versions I'll remove Delivery from the API.

SimonBiggs commented 4 years ago

I'll deprecate Delivery, create the above convert function to replace it, and after a few versions I'll remove Delivery from the API.

Today I made a little demo where I made use of the Delivery API:


-- https://nbviewer.jupyter.org/url/simonbiggs.net/mosaiq-fields.ipynb

And, point of the story... The Delivery object isn't perfect, but it does make me want to retract the following comment:

But, there is an assumption there. In the DataFrame and Dataset cases above, that object is something that is useful to the user. Delivery here isn't that useful to the user currently. It is quite opaque. People rarely want to make a Delivery object; they actually just care about the bits around it.

At the end of the day Delivery is useful. I find it useful. In its current form it really can only support a single pair of jaws, but, that's useful nonetheless. Yes, it might not be able to encompass ion beams. But, that's okay. Maybe its name could be something like MlcTreatment instead...

It still is opaque, and I do believe that the xarray format detailed above would be more self-documenting for the user. I think the path forward should actually be to make a new object, built on top of xarray, that hides within it, by composition, the current Delivery object. I can then deprecate Delivery, with the aim to replace it with a very similar object called MlcTreatment.

sjswerdloff commented 4 years ago

I think the path forward should actually be to make a new object, built on top of xarray, that hides within it, by composition, the current Delivery object. I can then deprecate Delivery, with the aim to replace it with a very similar object called MlcTreatment.

Composite and Facade. In DICOM, RT Plan and RT Ion Plan are different objects. RT Plan supports Brachy as well as Teletherapy, and frankly, IMHO, that doesn't really work out so well. But it did ensure that storage of an object for Brachy would be the same as for Teletherapy, even if the object could not be used by many applications unless they were explicitly aware of Brachy content.

In 2nd Gen DICOM-RT, the beams got broken out as first class objects ("Radiation"), and that should make things easier to work with, where the previous concept of a "plan" will be represented as a set of related objects. At the risk of being obnoxious, the treatment of the beam (now "Radiation") as a first class object had been pretty well worked out before DICOM, even if it wasn't in a public standard (that's what a "field" is...). MlcTreatment sounds like it will map pretty closely to the "C-Arm Photon-Electron Radiation IOD" (Supplement 175: http://dicom.nema.org/Dicom/News/January2019/docs/sups/sup175.pdf ).

But it is still a Composite object (in the DICOM sense), and if and when other related objects (e.g. a Particle therapy beam of some type) need to be represented, it is likely that some subset of MlcTreatment can actually be shared (extract the common base interface, inherit and decorate, e.g. https://refactoring.guru/design-patterns/decorator ). The Radiation objects use compositing in their definition by heavy use of "Macros". But DICOM is not intended to be an object oriented model in the classic sense. It is in fact a means of specifying Data Transfer Objects.

You already "use" Delivery. So you already have an idea of what it is used for. And in that case, spending time on (re?)defining an interface that allows it to be "used for" what it is being used for, is IMHO, the optimal place to put in design effort. Even if that is spent somewhat after the fact in the form of refactoring.

SimonBiggs commented 4 years ago

You already "use" Delivery. So you already have an idea of what it is used for. And in that case, spending time on (re?)defining an interface that allows it to be "used for" what it is being used for, is IMHO, the optimal place to put in design effort. Even if that is spent somewhat after the fact in the form of refactoring.

Yup, I agree. And if I use Composite and Facade in order to minimise its API surface area, that meets my primary goal here, which I realise I have left mostly unspoken.

And that goal is: I'm trying to take PyMedPhys to version 1.0.0. I want to make sure that when that switch is flicked, our API surface area is small but still useful. That way, I am hoping that for about two years we will be able to stay on version 1.x in a Semantic Versioning sense (https://semver.org/). I am hoping to have a clearly defined and documented release expectation similar to https://pandas.pydata.org/pandas-docs/version/1.0.0/development/policies.html#policies-version

...I realise there are a range of items that I have been mulling on in the back of my head regarding what is needed to go to 1.0.0. I shall make a Road Map issue that details the "path to 1.0.0".

SimonBiggs commented 3 years ago

To swing this topic back to its original purpose (TRF -> RT DICOM conversion), and to provide a little reassurance to readers: I don't plan on removing the Delivery object any time soon. It serves a purpose. In the future I may try to find a way to have that purpose be better served, but if I do, I shall likely either make a new object or adjust Delivery in a way that is backwards compatible.

So that future discussion of Delivery itself doesn't distract from this issue, I have spun out a new issue where I present the key pain points I have had personally while using the Delivery object in its current form:

https://github.com/pymedphys/pymedphys/issues/1249

kiyoshiyoda commented 2 years ago

Hello, Let me ask you a very low level question. Can I convert a VMAT TRF to the corresponding DICOM RT file? Kiyoshi

SimonBiggs commented 2 years ago

Hi @kiyoshiyoda,

I have taken note of your question over on the discourse channel: https://pymedphys.discourse.group/t/error-messages-during-pymedphys-installation/240/5?u=simonbiggs

And I have tapped someone on the shoulder who has been doing work with VMAT TRF -> DICOM. I'm hoping they'll jump on there and answer your question. Otherwise I'll chime in, in time.

kiyoshiyoda commented 2 years ago

Dear Simon, I am not sure if this email reply is the best but thanks a lot for your message. I really would like to know the current status. In the meantime, https://github.com/pymedphys/pymedphys/issues/1046 shows that the original DICOM RT is used as a template. I thought this might mean that the code disclosed in the above URL would accept any delivery techniques, but maybe I am wrong. Best regards, Kiyoshi


SimonBiggs commented 2 years ago

I really would like to know the current status

A pull request is being created based on the following: https://github.com/pymedphys/pymedphys/issues/1765

The branch where work is underway is at the following: https://github.com/pymedphys/pymedphys/tree/trf-robustness