Also, I'd like to add that the surface area and the many implicit APIs of the Delivery object have bothered me. It has such a huge surface area. I am open to many suggestions regarding it. Part of me wants to make it an object that is only used internally within PyMedPhys and then expose a sort of "wrapper object" where we can tightly constrain the API.
Anyway, I just wanted to flag that Delivery is certainly not set in stone. Even internally, it being a namedtuple doesn't feel so appropriate. I feel like frozenmap, as discussed over at https://github.com/pymedphys/pymedphys/issues/1011#issuecomment-675156988, would be a far more appropriate base class.
I had also been quite partial in the past to the following library:
http://xarray.pydata.org/en/stable/
One thing that really bugs me about the current Delivery object is that the axes definitions of the MLC and jaw arrays are not immediately obvious. I really like how in xarray the axes labels are closely tied to the array contents.
I'd even be open to refactoring the intermediary object away from being this Delivery thing, and instead have DICOM RT Plan be the intermediary, and then just have a range of functions that can convert to and from DICOM RT Plan, and do stuff when passed a DICOM RT Plan.

To be honest, that fixes almost all of my concerns. No longer are we exposing an object with a large API; instead, just a set of functions that act upon a pydicom.Dataset that is an RT Plan. Users can just convert to and from DICOM RT Plan for all their needs.
The biggest downside is that a DICOM RT Plan is not allowed to be used as an input to a function that is wrapped in functools.lru_cache... that can be a pain, as that tool is really useful for making repeated calls effectively free. But maybe that's a price worth paying...
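A minimal sketch of what that function-based API could look like, and of the lru_cache pain point (the extraction helper here is illustrative, not an existing pymedphys function):

```python
import functools

import pydicom

# A function-based API: operate directly on a pydicom.Dataset that is
# an RT Plan, with no intermediary object exposed.
def gantry_angles_from_rtplan(plan: pydicom.Dataset) -> list:
    """Extract the gantry angle of every control point in an RT Plan."""
    return [
        float(cp.GantryAngle)
        for beam in plan.BeamSequence
        for cp in beam.ControlPointSequence
        if "GantryAngle" in cp
    ]

# The downside mentioned above: pydicom.Dataset is mutable and
# unhashable, so this decorated function raises
# "TypeError: unhashable type" as soon as it is called.
@functools.lru_cache()
def cached_gantry_angles(plan: pydicom.Dataset) -> tuple:
    return tuple(gantry_angles_from_rtplan(plan))
```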
What, in theory, does Delivery offer compared to DICOM RT Plan? Does it allow for more information? A richer data model (relationships between internal objects)? Ease of interaction (not just a "dataclass")? If not, then I'd encourage moving away from a third model: there is the TRF data model, and there is the DICOM model, and the DICOM model is the accepted standard... so unless there is something that Delivery offers that DICOM does not, it's just more to deal with.

Having said that, there is 2nd Gen DICOM-RT (radiations and radiation records), so if having an intermediate model simplifies having a different target (2nd gen rather than RT Plan), that might be part of its value proposition.

Note that I said DICOM model and not explicitly RT Plan, because the other object that more closely maps to TRF, conceptually, is RT Beams Treatment Record. Using RT Beams Treatment Record would create the additional requirement of having an RT Beams Treatment Record -> RT Plan conversion (what I refer to as "Plan as Delivered"). But that in itself is quite useful for systems that already provide an RT Beams Treatment Record rather than a proprietary/non-standard record (lower case), such as an IHE-RO Treatment Delivery Workflow profile compliant system (there are also vendors who provide a DICOM RT (Ion) Beams Treatment Record without the interface being IHE-RO TDW).
Caching is a performance optimisation. Premature optimisation is not a good thing. So unless there is already a clear case that the performance is, or would be, problematic, being able to use lru_cache is not sufficient to motivate a design decision. One should be able to wait until it's clear where a performance issue lies, focus explicitly on that, and ensure its inputs are in a form amenable to working with lru_cache.
xarray looks promising, and being able to use it inside Delivery to make operating on the Delivery object (hierarchy) easier might be motivation to have Delivery as a third data model (ease of interaction).
Caching is a performance optimisation. Premature optimisation is not a good thing.
Yup :), I'll scrub it from my head and we can deal with that if need be. Thanks for snapping me out of it :).
xarray looks promising, and being able to use it inside Delivery to make operating on the Delivery object (hierarchy) easier might be motivation to have Delivery as a third data model (ease of interaction).
Yup. I thought about it a bit more overnight and this morning. That ease of interaction point I think is key. Having a nice xarray Dataset that interacts nicely and by default with tools such as matplotlib/pandas/numpy is worth the cost of having an internal intermediary.
Also, there is quite a range of input/output targets in this plan and delivery space. Currently, the following "inputs/outputs" are being targeted:
Currently the design is a tuple of the following layout:
Using the terminology defined by xarray (http://xarray.pydata.org/en/stable/terminology.html), the whole delivery object can be replaced by a standardised way of defining an xarray Dataset. Within the Dataset, the Cumulative MU as defined above would be a dimension coordinate (a dimension label). One difference from the definition above is that, in the xarray case, a new requirement would be that MU must be strictly increasing (consecutive MU values must not be the same). Gantry angle, collimator (and potentially table in the future) would be DataArrays whose dimension label is the MU dimension coordinate. Potentially a jaw could be generalised into a one-pair MLC. The dimensions of the leaf pairs and banks would be generalised to Pair Index and Bank Index. This would be a three-dimensional DataArray defining the current travel position of each leaf/jaw, its dimensions being Cumulative MU, [id] Bank Index, and [id] Pair Index. The [id] label needs to be unique for each beam limiting device. To extract more information out of each jaw/MLC bank's indices, there can be two optional defining features: a leaf width DataArray, which uses a given MLC/jaw's [id] Pair Index as its dimension coordinate and provides an array of leaf widths; and a collimator bank position, which provides the displacement from the CAX/CRA to the leaf centre and uses that same [id] Pair Index as its dimension coordinate. (Not so sold on the hierarchy here...)
[EDIT]
Actually, instead of using Cumulative MU as the dimension, use a separate dimension coordinate throughout that can optionally be either a time step or a control point index. That frees up Cumulative MU from needing to be strictly increasing, and it's a far more sensible "dimension". Cumulative MU then becomes a DataArray.

Also, it would be quite helpful to have a create_delivery_xarray_dataset() function, which can be passed these items in their raw format and will create this specified xarray.Dataset.
[/EDIT]
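A minimal sketch of what that Dataset layout could look like, assuming a "control_point" dimension coordinate and illustrative [id] labels ("mlc" and "jaw"); the shapes and values are placeholders:

```python
import numpy as np
import xarray as xr

num_control_points = 5
num_leaf_pairs = 80

delivery = xr.Dataset(
    data_vars={
        # Cumulative MU is just a DataArray, so the step-and-shoot
        # pattern [0, f, f, g, g, final] is representable:
        "cumulative_mu": ("control_point", np.array([0.0, 25.0, 25.0, 50.0, 100.0])),
        "gantry": ("control_point", np.linspace(180.0, -180.0, num_control_points)),
        "collimator": ("control_point", np.zeros(num_control_points)),
        "mlc": (
            ("control_point", "mlc_bank_index", "mlc_pair_index"),
            np.zeros((num_control_points, 2, num_leaf_pairs)),
        ),
        # A jaw generalised as a one-pair MLC:
        "jaw": (
            ("control_point", "jaw_bank_index", "jaw_pair_index"),
            np.zeros((num_control_points, 2, 1)),
        ),
        # Optional defining feature, dimensioned by the MLC's pair index:
        "mlc_leaf_width": ("mlc_pair_index", np.full(num_leaf_pairs, 5.0)),
    },
    coords={"control_point": np.arange(num_control_points)},
)
```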
By having an intermediary, users are asked to first convert an input to the intermediary, and then they are able to convert to any of the output formats. Using the same justification as the place for a language server:
However, by exposing the intermediary, we expose an object which we may want to be free to adjust in the future. Having the object itself be part of our API significantly undermines our internal flexibility.
One idea around that is to expose a convert function that internally calls our intermediary object but does not return the object itself, just its result. That then frees us to go to town playing around with the object, adjusting it to suit our needs. Maybe one day, if we are relatively sure that we would like to stabilise the object itself, on that day (and only then) we could expose it for other libraries to use.
I envisage a convert function working a bit like how pandoc's API works:
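For reference, this is roughly the pandoc shape: a single conversion call parameterised by an input format and an output format (shown here via the pypandoc wrapper, which requires a pandoc install):

```python
import pypandoc

# pandoc-style conversion: one entry point taking the content,
# the output format, and the input format.
html = pypandoc.convert_text("# A heading", to="html", format="md")
```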
Can't think of a tidy API just yet though (which isn't a vote of confidence for the concept).
That frees up Cumulative MU from needing to be strictly increasing, and it's a far more sensible "dimension".
That's critical. Step and Shoot has Cumulative MU that looks like [0, f, f, g, g, final]. Common in conventional RT, ubiquitous in Particle/Ion/Proton therapy.
The pandoc approach looks useful. It precludes operating on the intermediate object outside of conversion. So if you need to filter something, you need to either filter the input or filter the output, or always apply the filter as part of conversion (i.e. it's part of the library, not accessible to an application programmer/user). That's not a problem, it's a consequence. A common filter is "data cleaning".
The API that comes to my mind is:

delivery_convert --input_type MyInputType --output_type MyOutputType -i my_input -o my_output

where the -i and -o values default to stdin and stdout (because I like *nix style pipelines, and you can do some fun stuff with co-processing in Ksh and Bash), but specification of the input type and output type is mandatory and has to match the known set of filters, although one could have separate configuration for plugins.
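A minimal argparse sketch of that CLI; the converter registry and the conversion callables are hypothetical:

```python
import argparse
import sys

# Hypothetical registry mapping (input_type, output_type) to a
# conversion callable operating on bytes.
CONVERTERS = {}

def main(argv=None):
    parser = argparse.ArgumentParser(prog="delivery_convert")
    parser.add_argument("--input_type", required=True)
    parser.add_argument("--output_type", required=True)
    # -i and -o default to stdin and stdout, enabling *nix pipelines:
    parser.add_argument("-i", type=argparse.FileType("rb"), default=sys.stdin.buffer)
    parser.add_argument("-o", type=argparse.FileType("wb"), default=sys.stdout.buffer)
    args = parser.parse_args(argv)

    convert = CONVERTERS[(args.input_type, args.output_type)]
    args.o.write(convert(args.i.read()))

if __name__ == "__main__":
    main()
```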
That's critical. Step and Shoot has Cumulative MU that looks like [0, f, f, g, g, final].
Yeah, I did know that regarding step and shoot, and I did think about it... not sure why I disregarded it.
delivery_convert --input_type MyInputType --output_type MyOutputType -i my_input -o my_output
Yup, I do like that.
What about a Python API? When I make an API I like to see if I can find one already in the wild. I can't think of one that implements this pattern though...
The first question should be "Is there a pattern describing the behaviour or action intended?" (pattern in the GoF/PoSA sense). Straight up, this is translation, so the following is worth looking at: http://wiki.c2.com/?TranslatorPattern. It is usually discussed as an alternative to Visitor. I have used Visitor for translation quite successfully, many times. But your input objects have to accept a visitor (if they don't already, it might be a bit of work).
Stepping back a bit: this is creational. Factory comes to mind, probably Abstract Factory. Builder is an option if there are internal items that are built up separately and assembled (beams, devices, fraction groups, patient demographics). But it could also be Command, with execute() being an implementation of "populate_from()" followed by "export_to()"; or the creation of the Command object would be based on the input, and execute() is export(). And it could be a Factory that creates the Command object (whose class depends on the intended output type).
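A rough sketch of that Command-plus-Factory shape; every name here is illustrative only:

```python
from abc import ABC, abstractmethod

class ConversionCommand(ABC):
    """Command whose execute() is populate_from() followed by export_to()."""

    def __init__(self, source):
        self.source = source

    @abstractmethod
    def execute(self):
        ...

class TrfToRtPlanCommand(ConversionCommand):
    def execute(self):
        intermediate = self._populate_from(self.source)
        return self._export_to(intermediate)

    def _populate_from(self, source):
        ...  # decode the TRF content

    def _export_to(self, intermediate):
        ...  # build the DICOM RT Plan from the decoded content

def make_command(source, output_type: str) -> ConversionCommand:
    """Factory: the Command class chosen depends on the intended output."""
    command_classes = {"rtplan": TrfToRtPlanCommand}
    return command_classes[output_type](source)
```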
As an aside, the "strategy" approach now used in anonymisation is more or less a visitor because we are passing it in, and not subclassing.
Once your design approach is clear, the API to it should be straightforward.
I guess what I mean is, in the wild in Python the need for "conversion" between data representations always seems to have one kind of API. People tend to expect it. I want people to be able to lean on their intuition when they're using pymedphys. If there is a given task they are doing, I want the syntax to be as similar as possible to other APIs out there.
Here is an example of the "conversion" API for pandas:
pandas.DataFrame.from_dict
pandas.DataFrame.from_records
pandas.DataFrame.to_parquet
pandas.DataFrame.to_pickle
pandas.DataFrame.to_csv
pandas.DataFrame.to_hdf
pandas.DataFrame.to_sql
pandas.DataFrame.to_dict
pandas.DataFrame.to_excel
pandas.DataFrame.to_json
pandas.DataFrame.to_html
pandas.DataFrame.to_feather
pandas.DataFrame.to_latex
pandas.DataFrame.to_stata
pandas.DataFrame.to_gbq
pandas.DataFrame.to_records
pandas.DataFrame.to_string
pandas.DataFrame.to_clipboard
pandas.DataFrame.to_markdown
For xarray:
xarray.Dataset.from_dataframe
xarray.Dataset.from_dict
xarray.Dataset.to_netcdf
xarray.Dataset.to_zarr
xarray.Dataset.to_array
xarray.Dataset.to_dataframe
xarray.Dataset.to_dask_dataframe
xarray.Dataset.to_dict
So that's what informed the current Delivery API, which looks like the following:
pymedphys.Delivery.from_dicom
pymedphys.Delivery.from_icom
pymedphys.Delivery.from_logfile
pymedphys.Delivery.from_monaco
pymedphys.Delivery.from_mosaiq
pymedphys.Delivery.to_dicom
pymedphys.Delivery.mudensity()
But, there is an assumption there. In the DataFrame and Dataset cases above, that object is something that is useful to the user. Delivery here isn't that useful to the user currently. It is quite opaque. People rarely want to make a Delivery object; they actually just care about the bits around it.
And, at the end of the day, the API of the internal function/class that we find most useful does not need to match the API of the exposed function/class. In fact, separating these two has the huge benefit of allowing our internal API to be flexible with a large surface area, and our external API to be rigid with a small surface area. A way this can be achieved is via composition (https://en.wikipedia.org/wiki/Composition_over_inheritance): wrapping up our internal object, attached to a private parameter within the exposed object. That way the exposed function/class can be a bit of a shell object that doesn't do much except act purely as an API intermediary.
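A small sketch of that composition idea; the class names are illustrative:

```python
class _InternalDelivery:
    """Internal object: large, flexible API, free to change at any time."""

    def __init__(self, monitor_units, gantry, collimator, mlc, jaw):
        self.monitor_units = monitor_units
        self.gantry = gantry
        self.collimator = collimator
        self.mlc = mlc
        self.jaw = jaw

class ExposedDelivery:
    """Exposed shell: a small, rigid API that only delegates."""

    def __init__(self, internal: _InternalDelivery):
        self._internal = internal  # composition, not inheritance

    @classmethod
    def from_trf(cls, filepath) -> "ExposedDelivery":
        ...  # build an _InternalDelivery from the file, then wrap it

    def to_dicom(self, template):
        ...  # delegate to the internal object's implementation
```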
I do think there is a lot of value in knuckling down an API that is as simple as possible (and no simpler), as well as having it be unified with the API patterns of the rest of the pymedphys library and the greater scientific Python ecosystem. I don't think the API needs to have any dependency on the internal implementation.
Anyway... I think I need to mull on it more. I'm certainly liking the xarray version... and maybe that object might actually be an object in and of itself that is useful to users. Potentially we could have a Treatment object (better than Delivery, because it's not always delivered, and it aligns better with the words "RT Beams Treatment Record" without differentiating whether it is a record or a plan, as it can be either).
The treatment object could have within itself that xarray representation, and then have the following conversion methods:
pymedphys.Treatment.from_xarray_dataset # set the internal xarray Dataset and run structure validation
pymedphys.Treatment.from_dataframe # use xarray.Dataset.from_dataframe and run structure validation
pymedphys.Treatment.from_dict # use xarray.Dataset.from_dict and run structure validation
pymedphys.Treatment.from_dicom # Either 'RT Beams Treatment Record' or 'RT Plan'
pymedphys.Treatment.from_icom
pymedphys.Treatment.from_trf
pymedphys.Treatment.from_csv
pymedphys.Treatment.from_toml
pymedphys.Treatment.from_monaco
pymedphys.Treatment.from_mosaiq
pymedphys.Treatment.to_dicom # Will be 'RT Beams Treatment Record' if axis is timestep
# 'RT Plan' if it is control point
pymedphys.Treatment.to_xarray_dataset # just output the internal xarray Dataset
pymedphys.Treatment.to_dataframe # use xarray.Dataset.to_dataframe
pymedphys.Treatment.to_dict # use xarray.Dataset.to_dict
pymedphys.Treatment.to_csv
pymedphys.Treatment.to_toml
The following xarray.DataArrays as parameters for easy access by users:
pymedphys.Treatment.monitor_units
pymedphys.Treatment.gantry_angles
pymedphys.Treatment.collimator_angles
pymedphys.Treatment.mlc_positions
pymedphys.Treatment.jaw_positions
And the following method for "fluence" calculation:
pymedphys.Treatment.mudensity()
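Hypothetical usage of that object, assuming the Treatment API sketched above existed:

```python
import pymedphys  # Treatment as proposed above; it does not exist yet

treatment = pymedphys.Treatment.from_dicom("rtplan.dcm")

# Because the internals are an xarray Dataset, the accessors interact
# nicely with matplotlib/pandas/numpy by default:
treatment.gantry_angles.plot()
summary = treatment.to_dataframe().describe()

mu_density = treatment.mudensity()
```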
Also, to be able to transition internally from the current Delivery object to a new xarray based Treatment object, there could be the following internal transformations:
pymedphys.Treatment._from_delivery
pymedphys.Treatment._to_delivery
So, I've given up on the idea of the Treatment object, or the Delivery object. The fact that it won't map to ion therapy was a nail in the coffin for me.
I think the only API we should expose regarding converting from one thing to another is:
pymedphys.convert(input_item: Any, input_type: str, output_type: str, **kwargs) -> Any
# where **kwargs can have things like, `dicom_template` and other optional requirements.
That way, you can use whatever you like to make the transformation internally. And at the end of the day it won't be exposed.
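A sketch of how that single entry point might dispatch internally; the registry is illustrative:

```python
from typing import Any

# Illustrative registry mapping (input_type, output_type) to a callable.
_CONVERTERS = {}

def convert(input_item: Any, input_type: str, output_type: str, **kwargs) -> Any:
    """Convert input_item between formats without exposing the intermediary."""
    try:
        converter = _CONVERTERS[(input_type, output_type)]
    except KeyError:
        raise ValueError(
            f"No conversion registered for {input_type!r} -> {output_type!r}"
        ) from None
    return converter(input_item, **kwargs)

# e.g. convert("plan.trf", "trf", "dicom", dicom_template=template)
```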
I'll deprecate Delivery, create the above convert function to replace it, and after a few versions I'll remove Delivery from the API.
I'll deprecate Delivery, create the above convert function to replace it, and after a few versions I'll remove Delivery from the API.
Today I made a little demo where I made use of the Delivery API:
-- https://nbviewer.jupyter.org/url/simonbiggs.net/mosaiq-fields.ipynb
And, the point of the story... the Delivery object isn't perfect, but it does make me want to retract the following comment:
But, there is an assumption there. In the DataFrame and Dataset cases above, that object is something that is useful to the user. Delivery here isn't that useful to the user currently. It is quite opaque. People rarely want to make a Delivery object; they actually just care about the bits around it.
At the end of the day, Delivery is useful. I find it useful. In its current form it really can only support a single pair of jaws, but it's useful nonetheless. Yes, it might not be able to encompass ion beams. But that's okay. Maybe its name could be something like MlcTreatment instead...
It still is opaque, and I do believe that the xarray format detailed above would be more self-documenting for the user. I think the path forward should actually be to make a new object, built on top of xarray, that hides within it, by composition, the current Delivery object. And I can then deprecate Delivery, but with the aim to replace it with a very similar object called MlcTreatment.
I think the path forward should actually be to make a new object, built on top of xarray, that hides within it, by composition, the current Delivery object. And I can then deprecate Delivery, but with the aim to replace it with a very similar object called MlcTreatment.
Composite and Facade. In DICOM, RT Plan and RT Ion Plan are different objects. RT Plan supports Brachy as well as Teletherapy, and frankly, IMHO, that doesn't really work out so well. But it did ensure that storage of an object for Brachy would be the same as for Teletherapy, even if the object could not be used by many applications unless they were explicitly aware of Brachy content.
In 2nd Gen DICOM-RT, the beams got broken out as first class objects ("Radiation"), and that should make things easier to work with, where the previous concept of a "plan" will be represented as a set of related objects. At the risk of being obnoxious, the treatment of the beam (now "Radiation") as a first class object had been pretty well worked out before DICOM, even if it wasn't in a public standard (that's what a "field" is...). MlcTreatment sounds like it will map pretty closely to the "C-Arm Photon-Electron Radiation IOD" (Supplement 175: http://dicom.nema.org/Dicom/News/January2019/docs/sups/sup175.pdf). But it is still a Composite object (in the DICOM sense), and if and when other related objects (e.g. a particle therapy beam of some type) need to be represented, it is likely that some subset of MlcTreatment can actually be shared (extract the common base interface, inherit and decorate, e.g. https://refactoring.guru/design-patterns/decorator). The Radiation objects use compositing in their definition through heavy use of "Macros". But DICOM is not intended to be an object oriented model in the classic sense; it is in fact a means of specifying Data Transfer Objects.
You already "use" Delivery. So you already have an idea of what it is used for. And in that case, spending time on (re?)defining an interface that allows it to be "used for" what it is being used for, is IMHO, the optimal place to put in design effort. Even if that is spent somewhat after the fact in the form of refactoring.
You already "use" Delivery. So you already have an idea of what it is used for. And in that case, spending time on (re?)defining an interface that allows it to be "used for" what it is being used for, is IMHO, the optimal place to put in design effort. Even if that is spent somewhat after the fact in the form of refactoring.
Yup, I agree. And if I use Composite and Facade in order to minimise its API surface area, that meets my primary goal here, which I realise I have left mostly unspoken.
And that goal is: I'm trying to take PyMedPhys to version 1.0.0. I want to make sure that when that switch is flicked, our API surface area is small but still useful. That way, I am hoping that for about two years we will be able to stay in version 1.x in a Semantic Versioning sense (https://semver.org/). I am hoping to have a clearly defined and documented release expectation similar to https://pandas.pydata.org/pandas-docs/version/1.0.0/development/policies.html#policies-version
...I realise there are a range of items that I have been mulling on in the back of my head regarding what is needed to go to 1.0.0. I shall make a Road Map issue that details the "path to 1.0.0".
To swing this topic back to its original purpose (TRF -> RT DICOM conversion) and provide a little reassurance to readers: I don't plan on removing the Delivery object any time soon. It serves a purpose; in the future I may try to find a way to have that purpose be better served, but if I do, I shall likely either make a new object or adjust Delivery in a way that is backwards compatible.
So that future discussion of Delivery itself doesn't distract from this issue, I have spun out a new issue where I present the key pain points I have had personally while using the Delivery object in its current form:
Hello, let me ask you a very low-level question. Can I convert a VMAT TRF to the corresponding DICOM RT file? Kiyoshi
Hi @kiyoshiyoda,
I have taken note of your question over on the discourse channel: https://pymedphys.discourse.group/t/error-messages-during-pymedphys-installation/240/5?u=simonbiggs
And I have tapped someone on the shoulder who has been doing work with VMAT TRF -> DICOM. I'm hoping they'll jump on there and answer your question. Otherwise I'll chime in, in time.
Dear Simon, I am not sure if this email reply is the best but thanks a lot for your message. I really would like to know the current status. In the meantime, https://github.com/pymedphys/pymedphys/issues/1046 shows that the original DICOM RT is used as a template. I thought this might mean that the code disclosed in the above URL would accept any delivery techniques, but maybe I am wrong. Best regards, Kiyoshi
I really would like to know the current status
A pull request is being created based on the following: https://github.com/pymedphys/pymedphys/issues/1765
The branch where work is underway is at the following: https://github.com/pymedphys/pymedphys/tree/trf-robustness
The key API for TRF -> RT DICOM plan is through the pymedphys.Delivery object: https://github.com/pymedphys/pymedphys/blob/299070597f8251e08b7bf181380f6a5a87ef2726/pymedphys/__init__.py#L5
There is a method pymedphys.Delivery.from_logfile (which I should promptly rename to from_trf to avoid ambiguity). It is defined here: https://github.com/pymedphys/pymedphys/blob/299070597f8251e08b7bf181380f6a5a87ef2726/pymedphys/_trf/decode/delivery.py#L31-L60
Importantly, Delivery itself is built on top of collections.namedtuple and consists of five parameters, ["monitor_units", "gantry", "collimator", "mlc", "jaw"]: https://github.com/pymedphys/pymedphys/blob/299070597f8251e08b7bf181380f6a5a87ef2726/pymedphys/_base/delivery.py#L32-L34
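So the base is along these lines (a reconstruction from the five parameters listed above, not a verbatim copy of the linked source):

```python
from collections import namedtuple

# The five parameters that make up the Delivery tuple:
DeliveryBase = namedtuple(
    "DeliveryBase", ["monitor_units", "gantry", "collimator", "mlc", "jaw"]
)
```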
Once a TRF file has been converted into a Delivery object, it can then be converted into a DICOM RT plan file. One requirement is that the original DICOM RT plan file is provided as a template. This is achieved using the Delivery.to_dicom method, which is defined here: https://github.com/pymedphys/pymedphys/blob/299070597f8251e08b7bf181380f6a5a87ef2726/pymedphys/_dicom/delivery/core.py#L79-L108
Therefore, the process of converting a TRF file to a DICOM plan file looks a bit like this (a sketch; exact call signatures may differ from the implementation at the linked commit):
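```python
import pydicom
import pymedphys

# Decode the linac log file into a Delivery object:
delivery = pymedphys.Delivery.from_logfile("patient_delivery.trf")

# The original RT Plan is required as a template:
template = pydicom.dcmread("original_rtplan.dcm")

# Build the "as delivered" RT Plan and save it:
created_rtplan = delivery.to_dicom(template)
created_rtplan.save_as("rtplan_from_trf.dcm")
```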
To see the "round trip test" that utilises the data downloaded above, see:
https://github.com/pymedphys/pymedphys/blob/299070597f8251e08b7bf181380f6a5a87ef2726/pymedphys/tests/delivery/test_deliverydata_trf_dicom_round_trip.py#L113-L135
So essentially, to support a wider range of DICOM RT plan types, what is required is to actually work with the Delivery object and improve the current to_dicom method. The TRF --> Delivery pathway is currently quite robust. The Delivery --> RT Plan DICOM pathway feels like a giant ball of sticky tape... :(

I can follow up this post with a deep dive into the Delivery --> RT Plan DICOM implementation if that helps also.