Proposal: specialize tables

rdemaria commented 2 years ago

Methods like rmat, kvec, etc (see https://github.com/hibtc/cpymad/blob/master/src/cpymad/madx.py#L1247) are very useful but specific to Twiss tables. Additional very useful methods could be added to Twiss tables, as well as for aperture or survey or tracking.

I propose to create specialized versions of Table such as TwissTable, ApertureTable, etc.. to return depending on the type and add specific methods there.

coldfix commented 2 years ago

How do we know in advance what kind of table it is if it has a different name and the table is returned from madx.table.NAME? If that requires any kind of column availability checking, I would prefer not to go down that road. But we can add more specialized methods and some documentation that these methods are only available for certain kinds of tables. The presence of unused methods does not impact performance.

rdemaria commented 2 years ago

I would use the table.type field see https://github.com/MethodicalAcceleratorDesign/MAD-X/blob/master/src/mad_table.h#L18 to differentiate, and I would add a method or property to set the type for user-defined table, for instance madx.table.mytwiss.as_aperture_table which could be useful to use aperture related methods that happen to have apertype, aper_1 etc.. columns.

coldfix commented 2 years ago

I would use the table.type field see https://github.com/MethodicalAcceleratorDesign/MAD-X/blob/master/src/mad_table.h#L18 to differentiate, and I would add a method or property to set the type for user-defined table, for instance madx.table.mytwiss.as_aperture_table which could be useful to use aperture related methods that happen to have apertype, aper_1 etc.. columns.

ok. So, I'm still not sure what the benefit of doing subclasses here is. Also consider that in theory users could perform two different operations, download the tables, join them together, save them to a file, and load that file as a table again in MAD-X. Not sure if anybody does that, but in that case the table would be e.g. both a survey table and a twiss table. This kind of possibilities doesn't match the inheritance pattern. Or they could cut out specific columns from a twiss table that would otherwise be needed to call those twiss-specific methods.

So in my opinion there are two options:

the consenting adults approach: leave it as is, one big table class, with specialized methods for different kinds of applications, but document that these are available only under certain conditions, and make the user responsible for calling only methods that can be called (otherwise they just get some kind KeyError or similar which isn't too bad either). We could also add some shorthand to get more information about the table, e.g. Table.info() that prints type, columns, number of rows, etc. IMO, this is the preferred option, because it has zero overhead, no additional complexity, and leaves the most flexibility to the user.
add specialized accessor properties like in pandas where you have DataFrame.str.startswith or DataFrame.dt.days. That makes the namespace look a bit cleaner, but I don't see any tangible benefits, considering that we will have only relatively few methods (or what do you have in mind?), and it makes it a bit more cumbersome to use and is of course an API break.

rdemaria commented 2 years ago

I have in mind a fairly large number of methods. This clear shifts the scope of cpymad from bare communicator to battery-included tool, that, I think, will be beneficial to avoid the community to reinvent wheels and to attract more users and developers.

In this context, I am worried about the namespace pollution in your first proposal and the API breaking in your second proposal. If we accept API breaking, the second one would be my favorite, which gives discoverability and flexibility.

To get a feeling of the methods, you could have a look at some examples here:

Survey: https://github.com/rdemaria/pyoptics/blob/master/pyoptics/optics.py#L827 https://github.com/rdemaria/pyoptics/blob/master/pyoptics/optics.py#L830

Aperture: https://github.com/rdemaria/pyoptics/blob/master/pyoptics/strtable.py#L436

but I have more in mind...

coldfix commented 2 years ago

Currently, cpymad is mostly trying to expose the bare functionality of MAD-X in a pythonic manner, with some utility functions to directly get arrays and matrices from multiple components. On the other hand, the changes you suggest seem to be going into the direction of adding physics calculations on the python side. I agree that it will be helpful for some users to have a tool with more batteries included. On the other hand, this will make it harder to learn and filter what's relevant for those users who are mainly looking for a programming interface to MAD-X itself, not as an additional physics library.

The more I think about this, the more I start to believe that it might be better for you to maintain a separate utility that aims at interactive tasks, at least until the API is stable and mainstream enough to count as part of MAD-X itself. It could e.g. be a relatively lightweight wrapper on top of cpymad that helps with downstream optics analysis (like I was trying to start with madgui, but was doing too much GUI things too little useful functionality). We can merge anything here that is needed to make that task easier and to expose more of the MAD-X C-API, and make performance improvements, and we can also definitely reference this tool in the README and documentation. I can also help if you have any questions regarding python packaging etc. This way, you have free reign to design the interactive API while keeping cpymad lightweight (in terms of API, maintainability, and dependencies).

@fsoubelet You seem to have an opinion on these things. Would you care to join the discussion? Other thoughts welcome as well.

fsoubelet commented 2 years ago

I merely have an opinion as a user.

Currently, cpymad is mostly trying to expose the bare functionality of MAD-X in a pythonic manner, with some utility functions to directly get arrays and matrices from multiple components.

I wholeheartedly agree that cpymad is and should stay no more than just that: a pythonic link to MAD-X's internals and functionality. The possibility to get tables as pandas DataFrames or many internals as dicts is already a one-line access to many powerful APIs in itself. Saving lines of code is, in my opinion, irrelevant.

The more I think about this, the more I start to believe that it might be better for you to maintain a separate utility that aims at interactive tasks, at least until the API is stable and mainstream enough to count as part of MAD-X itself.

I am also of the opinion that the better option is to leave cpymad as is and create a (potentially official?) package for Pythonic functionality that would do the things suggested by Riccardo. I have and maintain myself one of those toolkits and know of several people either using my toolkit or maintaining theirs, so there may be enthusiasm for a common effort.

@rdemaria this could be a good topic for tomorrow's ABP computing meeting, maybe we could gather known users' opinions there first.

rdemaria commented 2 years ago

Excellent idea to discuss in the ABP meeting. @coldfix you are also very welcome to join (I send you the zoom link in case). Let's explore the option of the additional package!

coldfix commented 2 years ago

Ok, so I close this as well I guess.

hibtc / cpymad

Proposal: specialize tables #117