hibtc / cpymad

Cython binding to MAD-X
http://hibtc.github.io/cpymad/

implement show method [WIP] #116

Closed rdemaria closed 2 years ago

rdemaria commented 2 years ago

This implements the following methods on table to allow quick inspections:

Please let me know if you have comments. If ok, I will add tests and update manual...

Examples assuming tw is a twiss table

$ tw.show('mb','dx/sqrt(betx) dy/sqrt(bety)')
name                  dx/sqrt(betx) dy/sqrt(bety)
mbas2.1r1:1              0.00607198    0.00295083
mbxw.a4r1:1               -0.003267   -0.00443431
mbxw.b4r1:1             -0.00354092   -0.00484049
mbxw.c4r1:1             -0.00389078   -0.00523874
mbxw.d4r1:1             -0.00432186    -0.0056271
mbxw.e4r1:1             -0.00483952   -0.00600381
$ tw.show(tw.pattern('mb')&tw.crange(165,180,'betx')&tw.crange('ip3','ip4'),'betx')
name                     betx
mb.c13r3.b1:1         165.983
mb.c17r3.b1:1         165.184
mb.c21r3.b1:1         165.172
mb.c23r3.b1:1         165.012
mb.c25r3.b1:1         165.012
mb.c27r3.b1:1         165.012
mb.c29r3.b1:1         165.012
mb.c31r3.b1:1         165.012
mb.c33r3.b1:1         165.012
mb.a34l4.b1:1         165.012
mb.a32l4.b1:1         165.012
mb.a30l4.b1:1         165.012
mb.a28l4.b1:1         165.012
mb.a26l4.b1:1         165.012
mb.a24l4.b1:1         165.012
mb.a22l4.b1:1         165.012
mb.a18l4.b1:1         165.024
mb.a14l4.b1:1         165.036
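
To make the intent of the examples above concrete, here is a minimal sketch of a `show`-like helper on a toy table. It is not the code from this PR: `MiniTable`, its dict-of-columns layout, and the unaligned output format are assumptions for illustration, and the row selector is either a regular expression on the element name or a boolean mask.

```python
import re
import numpy as np


class MiniTable:
    """Toy stand-in for a twiss table (hypothetical, not cpymad's Table)."""

    def __init__(self, columns):
        # Store every column as a numpy array of equal length.
        self.columns = {k: np.asarray(v) for k, v in columns.items()}

    def pattern(self, regexp):
        """Boolean mask of rows whose name matches the regular expression."""
        rx = re.compile(regexp)
        return np.array([bool(rx.search(n)) for n in self.columns['name']])

    def eval(self, expr):
        """Evaluate a column expression such as 'dx/sqrt(betx)'."""
        namespace = {'sqrt': np.sqrt, **self.columns}
        # Builtins are disabled; only columns and sqrt are visible.
        return eval(expr, {'__builtins__': {}}, namespace)

    def show(self, rows, exprs):
        """Return one text line per selected row, one field per expression."""
        mask = self.pattern(rows) if isinstance(rows, str) else rows
        exprs = exprs.split()
        values = [self.eval(e)[mask] for e in exprs]
        names = self.columns['name'][mask]
        lines = ['name ' + ' '.join(exprs)]
        for i, name in enumerate(names):
            fields = ' '.join(f'{col[i]:.6g}' for col in values)
            lines.append(f'{name} {fields}')
        return '\n'.join(lines)


tw = MiniTable({
    'name': ['mb.a', 'mq.b', 'mb.c'],
    'betx': [4.0, 9.0, 16.0],
    'dx': [2.0, 3.0, 8.0],
})
print(tw.show('mb', 'betx dx/sqrt(betx)'))
```

Because the expression string is split on whitespace, each expression must be written without spaces, as in the examples above.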
rdemaria commented 2 years ago

For the sake of quick inspections, I would also propose some operator overloading:

tw(...) = tw.eval(...)
tw//'regexp' = tw.pattern('regexp')
tw//('a','b') = tw.crange('a','b')
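
A rough sketch of how such overloads could be wired up with `__call__` and `__floordiv__`. The class `QuickTable` is hypothetical, and the `crange` semantics here (an inclusive range between two exactly matching element names) are an assumption, since the thread does not spell them out.

```python
import re
import numpy as np


class QuickTable:
    """Hypothetical toy table used only to illustrate the operator overloads."""

    def __init__(self, columns):
        self.columns = {k: np.asarray(v) for k, v in columns.items()}

    def eval(self, expr):
        """Evaluate a column expression with sqrt available."""
        namespace = {'sqrt': np.sqrt, **self.columns}
        return eval(expr, {'__builtins__': {}}, namespace)

    def pattern(self, regexp):
        """Boolean mask of rows whose name matches the regular expression."""
        rx = re.compile(regexp)
        return np.array([bool(rx.search(n)) for n in self.columns['name']])

    def crange(self, start, stop):
        """Mask of rows from element `start` to `stop`, inclusive (assumed)."""
        names = list(self.columns['name'])
        first, last = names.index(start), names.index(stop)
        mask = np.zeros(len(names), dtype=bool)
        mask[first:last + 1] = True
        return mask

    def __call__(self, expr):
        # tw(...) is shorthand for tw.eval(...)
        return self.eval(expr)

    def __floordiv__(self, other):
        # tw // 'regexp' -> pattern mask; tw // ('a', 'b') -> range mask
        if isinstance(other, str):
            return self.pattern(other)
        return self.crange(*other)


tw = QuickTable({
    'name': ['ip3', 'mb.a', 'mq.b', 'ip4', 'mb.c'],
    'betx': [1.0, 4.0, 9.0, 16.0, 25.0],
})
mask = (tw // 'mb') & (tw // ('ip3', 'ip4'))
print(tw('sqrt(betx)')[mask])
```

Note that `//` binds more tightly than `&`, so the parentheses around each mask are optional, but they make the intent easier to read.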
coldfix commented 2 years ago

What is your opinion about using pandas for this purpose:

df = tw.dframe()
df = df[df.name.str.startswith('mb.')]
df = df.set_index('name')

# Show single expression:
print(df.eval('dx/sqrt(betx)'))

# Show multiple expressions:
print(pd.concat(df.eval('dx/sqrt(betx), dy/sqrt(bety)'), axis=1))
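
For readers who want to try the pandas route without a MAD-X session, here is a self-contained variant. The toy DataFrame stands in for `tw.dframe()` and its values are made up; each expression is evaluated separately and the resulting Series are joined, which is one possible way to show several expressions side by side.

```python
import pandas as pd

# Toy data standing in for tw.dframe(); the numbers are illustrative only.
df = pd.DataFrame({
    'name': ['mb.a', 'mq.b', 'mb.c'],
    'betx': [4.0, 9.0, 16.0],
    'bety': [1.0, 4.0, 9.0],
    'dx': [2.0, 3.0, 8.0],
    'dy': [3.0, 1.0, 6.0],
})

# Select rows by name prefix and index the result by element name.
sel = df[df.name.str.startswith('mb.')].set_index('name')

# Evaluate each expression on its own and label the column with its text.
exprs = ['dx/sqrt(betx)', 'dy/sqrt(bety)']
result = pd.concat([sel.eval(e).rename(e) for e in exprs], axis=1)
print(result)
```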
rdemaria commented 2 years ago

Good question! If you count the number of lines or characters for the same task, you see the reason. During optics analysis or design it is very common to ask questions such as: which element names match a pattern, which kickers are on, which quadrupoles exceed my beta target. If you can do it in one line, it saves a lot of time. Pandas is there for more complicated tasks, or for those who are more familiar with its API. These functions are for the most common interactive tasks, not an attempt to rewrite another pandas!

rdemaria commented 2 years ago

Or do you mean using pandas to implement the same shortcuts?

coldfix commented 2 years ago

Good question! If you count the number of lines or characters for the same task, you see the reason.

Well, I've written down my example code with more lines than would be needed just to make it more readable. You could achieve the same with fewer keystrokes (but not quite as few as in your proposal).

During optics analysis or design it is very common to ask questions such as: which element names match a pattern, which kickers are on, which quadrupoles exceed my beta target. If you can do it in one line, it saves a lot of time. Pandas is there for more complicated tasks, or for those who are more familiar with its API. These functions are for the most common interactive tasks, not an attempt to rewrite another pandas!

I see your point that it might be nice to have more convenient access to common operations. However, note that cpymad has so far been written mainly as a programming interface, not as an interactive tool to help during optics analysis. Interactive use has so far been more the job of the MAD-X language itself. Python inherently has heavier syntax overhead to begin with (which is also why hardly anyone uses it as their shell, for example, even though there are attempts at doing that). Do you know if or how many people are using cpymad interactively for optics analysis? I'm open to shifting the scope of cpymad somewhat.

I'm hesitant to add another mini-language that users will have to learn specifically for cpymad that doesn't directly apply elsewhere — when instead they could simply use pandas syntax that many will already be familiar with and that can help in many other applications as well. I'm for adding more convenient accessors, but I'm not sold on the proposal in the current state. Consider this line

tw.show(tw.pattern('mb') & tw.crange(165,180,'betx') & tw.crange('ip3','ip4'),'betx')

from your example. It looks pretty weird to me for multiple reasons: Why is slicing not done with [] operators? Why does the first crange call take a column name in addition to a numeric range? Why doesn't the second call to crange take a column name? Or was the final 'betx' string misplaced, and should it go inside the preceding parentheses? Why are row and column selection intermingled within the same argument? In fact, I can't really figure out what crange is supposed to do, even after rereading your explanation and example and thinking about it for a couple of minutes.

Or do you mean using pandas to implement the same shortcuts?

That could be an option. It would probably reduce the number of lines needed for implementation by a factor of 5 to 10, provide more flexibility to the user, and ensure that the expressions are compatible with pandas.

Alternatively, we should at least improve the method names. Also, the proposed show method combines orthogonal concerns: representation and slicing. These should be kept separate, by improving repr() on the one hand, and improving slicing on the other hand.

In that case we should make it look more similar to pandas to reduce the learning overhead:

# so, instead of:
tw.show(tw.pattern('mb') & tw.crange('ip3', 'ip4'), 'betx')

# type this:
tw.loc[tw.pattern('mb') & tw.crange('ip3', 'ip4'), 'betx']

# or this:
tw.crange('ip3', 'ip4').pattern('mb').eval('betx')

# or this:
tw.loc['ip3':'ip4'].pattern('mb').eval('betx')

# or this:
tw['betx', tw.pattern('mb') & tw.crange('ip3', 'ip4')]

# or this:
tw[:, tw.pattern('mb') & tw.crange('ip3', 'ip4')].betx

or something similar.
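
As an illustration of the `.loc`-style alternatives listed above, here is a minimal sketch of an indexer that handles only row and column selection and leaves display to the caller. `SliceableTable` and `_LocIndexer` are hypothetical names, not cpymad API, and name slices are treated as inclusive at both ends, following the pandas convention for label slices.

```python
import numpy as np


class _LocIndexer:
    """Hypothetical helper providing tw.loc[rows] and tw.loc[rows, column]."""

    def __init__(self, table):
        self._table = table

    def _rows(self, key):
        # A slice of element names becomes an inclusive positional slice.
        if isinstance(key, slice):
            names = list(self._table.columns['name'])
            start = 0 if key.start is None else names.index(key.start)
            stop = len(names) if key.stop is None else names.index(key.stop) + 1
            return slice(start, stop)
        # Anything else is treated as a boolean row mask.
        return np.asarray(key, dtype=bool)

    def __getitem__(self, key):
        rows, column = key if isinstance(key, tuple) else (key, None)
        rows = self._rows(rows)
        if column is None:
            # Row selection only: return a new table so calls can be chained.
            return SliceableTable(
                {k: v[rows] for k, v in self._table.columns.items()})
        return self._table.columns[column][rows]


class SliceableTable:
    """Toy table whose slicing is independent of how it is displayed."""

    def __init__(self, columns):
        self.columns = {k: np.asarray(v) for k, v in columns.items()}
        self.loc = _LocIndexer(self)


tw = SliceableTable({
    'name': ['ip3', 'mb.a', 'mq.b', 'ip4', 'mb.c'],
    'betx': [1.0, 4.0, 9.0, 16.0, 25.0],
})
print(tw.loc['ip3':'ip4', 'betx'])
```

With slicing isolated like this, formatting can live in `repr()` or a separate function, which is the separation of concerns argued for above.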

And for the following points, which I'm not sure about, it might be easier to just defer completely to pandas:

rdemaria commented 2 years ago

Do you know if or how many people are using cpymad interactively for optics analysis?

Difficult to say, but a handful of people in my corridor, plus other people in other groups and labs. Each one is writing convenience functions to speed things up; see also https://github.com/search?q=cpymad&type=repositories. The MAD-X language cannot grow without significant effort, and it seems better to bend Python to become a shell for MAD-X.

The idea of the API is that you write the entire column header as a string and you get the table. I do so many times a day with my library, and I think the community will appreciate it. People using pandas can already use it as you do.

Maybe we can merge the discussion with the other proposal?

rdemaria commented 2 years ago

Given the discussions in #117, I propose to close the PR!

coldfix commented 2 years ago

Sure!

rdemaria commented 2 years ago

For reference, I implemented some of the ideas and spicy syntax here:

https://github.com/rdemaria/pyoptics/blob/master/pyoptics/madx.py

t.show(t.loc[99.3:100.1:'betx','mb'],'betx dx/bety')
name                     betx   dx/bety
mb.b11l8.b2:1         99.8114  0.022742
mb.b13r8.b2:1         99.4955 0.0267546