implement show method [WIP]

rdemaria commented 2 years ago

This implements the following methods on table to allow quick inspections:

show: return a pretty-printed version of the table, with several options
pattern: return a bool array marking the rows whose name matches a regexp
crange: return a bool array marking the rows whose column value lies between two values, or the rows located between two names
eval: compute a new array from an expression

Please let me know if you have comments. If OK, I will add tests and update the manual...

Examples assuming tw is a twiss table

$ tw.show('mb','dx/sqrt(betx) dy/sqrt(bety)')
name                  dx/sqrt(betx) dy/sqrt(bety)
mbas2.1r1:1              0.00607198    0.00295083
mbxw.a4r1:1               -0.003267   -0.00443431
mbxw.b4r1:1             -0.00354092   -0.00484049
mbxw.c4r1:1             -0.00389078   -0.00523874
mbxw.d4r1:1             -0.00432186    -0.0056271
mbxw.e4r1:1             -0.00483952   -0.00600381

tw.show(tw.pattern('mb')&tw.crange(165,180,'betx')&tw.crange('ip3','ip4'),'betx')
name                     betx
mb.c13r3.b1:1         165.983
mb.c17r3.b1:1         165.184
mb.c21r3.b1:1         165.172
mb.c23r3.b1:1         165.012
mb.c25r3.b1:1         165.012
mb.c27r3.b1:1         165.012
mb.c29r3.b1:1         165.012
mb.c31r3.b1:1         165.012
mb.c33r3.b1:1         165.012
mb.a34l4.b1:1         165.012
mb.a32l4.b1:1         165.012
mb.a30l4.b1:1         165.012
mb.a28l4.b1:1         165.012
mb.a26l4.b1:1         165.012
mb.a24l4.b1:1         165.012
mb.a22l4.b1:1         165.012
mb.a18l4.b1:1         165.024
mb.a14l4.b1:1         165.036
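To make the intended semantics of the four methods concrete, here is a minimal stdlib-only sketch. It is not the code from this PR: MiniTable, its dict-of-lists storage, the row-by-row eval and the stripping of the ":1" suffix in crange are all assumptions made for illustration. The real methods return numpy bool arrays that combine with &; the plain lists used here are combined with `and` instead.

```python
import math
import re


class MiniTable:
    """Toy stand-in for a twiss table: a dict of equal-length columns."""

    def __init__(self, columns):
        self.columns = columns
        self.names = columns["name"]

    def pattern(self, regexp):
        """Boolean mask: True where the row name matches the regexp."""
        rx = re.compile(regexp)
        return [bool(rx.search(name)) for name in self.names]

    def crange(self, lo, hi, column=None):
        """Mask for rows between two values of a column, or between
        two row names when no column is given."""
        if column is None:
            # Assumption: names are matched without the ":1" suffix.
            bare = [n.split(":")[0] for n in self.names]
            first, last = bare.index(lo), bare.index(hi)
            return [first <= i <= last for i in range(len(bare))]
        return [lo <= v <= hi for v in self.columns[column]]

    def eval(self, expr):
        """Evaluate an expression row by row, with columns as variables."""
        code = compile(expr, "<expr>", "eval")
        out = []
        for i in range(len(self.names)):
            scope = {"sqrt": math.sqrt}
            scope.update({k: v[i] for k, v in self.columns.items()})
            out.append(eval(code, {"__builtins__": {}}, scope))
        return out

    def show(self, rows, exprs, digits=6):
        """Return a text table of the selected rows and expressions."""
        if isinstance(rows, str):
            rows = self.pattern(rows)
        exprs = exprs.split()
        data = [self.eval(e) for e in exprs]
        lines = ["name".ljust(16) + "".join(e.rjust(16) for e in exprs)]
        for i, keep in enumerate(rows):
            if keep:
                cells = "".join(f"{col[i]:.{digits}g}".rjust(16) for col in data)
                lines.append(self.names[i].ljust(16) + cells)
        return "\n".join(lines)


tw = MiniTable({
    "name": ["ip3:1", "mb.a:1", "mq.a:1", "mb.b:1", "ip4:1"],
    "betx": [100.0, 170.0, 50.0, 200.0, 100.0],
    "dx": [0.0, 1.3, 0.5, 2.0, 0.0],
})
mask = [a and b and c for a, b, c in zip(
    tw.pattern("mb"), tw.crange(165, 180, "betx"), tw.crange("ip3", "ip4"))]
print(tw.show(mask, "betx dx/sqrt(betx)"))
```

Only mb.a:1 survives all three masks in this made-up data, so show prints a header and a single row.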

rdemaria commented 2 years ago

For the sake of quick inspections, I would also propose some operator overloading:

tw(...)=tw.eval(...)
tw//'regexp' =tw.pattern('regexp')
tw//('a','b') = tw.crange('a','b')
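A rough sketch of how these three shortcuts could be wired up with __call__ and __floordiv__. The Shortcuts class and its simplified eval, pattern and crange are hypothetical placeholders, not the PR's implementation; only the operator dispatch is the point here.

```python
import re


class Shortcuts:
    """Hypothetical table exposing the three proposed shortcuts."""

    def __init__(self, names, columns):
        self.names = names
        self.columns = columns

    def eval(self, expr):
        # Evaluate the expression once per row, with columns as variables.
        keys = list(self.columns)
        rows = zip(*self.columns.values())
        return [eval(expr, {"__builtins__": {}}, dict(zip(keys, row)))
                for row in rows]

    def pattern(self, regexp):
        return [re.search(regexp, n) is not None for n in self.names]

    def crange(self, a, b):
        i, j = self.names.index(a), self.names.index(b)
        return [i <= k <= j for k in range(len(self.names))]

    def __call__(self, expr):
        # tw(...) is shorthand for tw.eval(...)
        return self.eval(expr)

    def __floordiv__(self, other):
        # tw // ('a', 'b') selects a name range, tw // 'regexp' a pattern.
        if isinstance(other, tuple):
            return self.crange(*other)
        return self.pattern(other)


tw = Shortcuts(["ip3", "mb.a", "mq.a", "ip4"], {"betx": [1.0, 4.0, 9.0, 16.0]})
print(tw("betx ** 0.5"))
print(tw // "^mb")
print(tw // ("mb.a", "ip4"))
```

Dispatching on the operand type keeps the typing short, at the cost of giving // two unrelated meanings.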

coldfix commented 2 years ago

What is your opinion about using pandas for this purpose:

df = tw.dframe()
df = df[df.name.str.startswith('mb.')]
df = df.set_index('name')

# Show single expression:
print(df.eval('dx/sqrt(betx)'))

# Show multiple expressions:
print(pd.concat(df.eval('dx/sqrt(betx), dy/sqrt(bety)'), axis=1))

rdemaria commented 2 years ago

Good question! If you count the number of lines or characters for the same task, you see the reason. During optics analysis or design it is very common to ask questions such as: which element names match a pattern, which kickers are on, which quadrupoles exceed my beta target. If you can do it in one line, it saves a lot of time. Pandas is there for more complicated tasks, or for those who are more familiar with its API. These functions are for the most common interactive tasks, not an attempt to rewrite pandas!

rdemaria commented 2 years ago

Or do you mean using pandas to implement the same shortcuts?

coldfix commented 2 years ago

Good question! If you count the number of lines or characters for the same task, you see the reason.

Well, I've written down my example code with more lines than would be needed just to make it more readable. You could achieve the same with fewer keystrokes (but not quite as few as in your proposal).

During optics analysis or design it is very common to ask questions such as: which element names match a pattern, which kickers are on, which quadrupoles exceed my beta target. If you can do it in one line, it saves a lot of time. Pandas is there for more complicated tasks, or for those who are more familiar with its API. These functions are for the most common interactive tasks, not an attempt to rewrite pandas!

I see your point that it might be nice to have more convenient access to common operations. However, note that cpymad has so far been written mainly as a programming interface, not as an interactive tool to help during optics analysis. The latter has so far been more the job of the MAD-X language itself. Python inherently has heavier syntax overhead to begin with (which is also the reason why hardly anyone uses it as their shell, for example, even though there are attempts at doing that). Do you know if or how many people are using cpymad interactively for optics analysis? I'm open to shifting the scope of cpymad somewhat.

I'm hesitant to add another mini-language that users will have to learn specifically for cpymad that doesn't directly apply elsewhere — when instead they could simply use pandas syntax that many will already be familiar with and that can help in many other applications as well. I'm for adding more convenient accessors, but I'm not sold on the proposal in the current state. Consider this line

tw.show(tw.pattern('mb') & tw.crange(165,180,'betx') & tw.crange('ip3','ip4'),'betx')

from your example. It looks pretty weird to me for multiple reasons. Why is slicing not done with [] operators? Why does the first crange call take a column name in addition to a numeric range? Why doesn't the second call to crange take a column name? Or was the final 'betx' string misplaced, and should it go inside the preceding parentheses? Why are row and column selection intermingled within the same argument? In fact, I can't really figure out what crange is supposed to do, even after rereading your explanation and example and thinking about it for a couple of minutes.

Or do you mean using pandas to implement the same shortcuts?

That could be an option. It would probably reduce the number of lines needed for the implementation by a factor of 5 to 10, provide more flexibility to the user, and ensure that the expressions are compatible with pandas.

Alternatively, we should at least improve the method names. Also, the proposed show method combines orthogonal concerns: representation and slicing. These should be kept separate, by improving repr() on the one hand, and improving slicing on the other hand.

In that case we should make it look more similar to pandas to reduce the learning overhead:

# so, instead of:
tw.show(tw.pattern('mb') & tw.crange('ip3', 'ip4'), 'betx')

# type this:
tw.loc[tw.pattern('mb') & tw.crange('ip3', 'ip4'), 'betx']

# or this:
tw.crange('ip3', 'ip4').pattern('mb').eval('betx')

# or this:
tw.loc['ip3':'ip4'].pattern('mb').eval('betx')

# or this:
tw['betx', tw.pattern('mb') & tw.crange('ip3', 'ip4')]

# or this
tw[:, tw.pattern('mb') & tw.crange('ip3', 'ip4')].betx

or something similar.

Note that instead of adding a tw.show(...) method, this relies on making Table.__repr__ more useful, e.g. by returning repr(self.dframe()) or self.dframe().to_string() if pandas is installed, and something more lightweight otherwise. This doesn't introduce a new method to learn, behaves as expected (similar to pandas dataframes), and requires less typing. I think this would probably be a good change either way.
Due to the way that cpymad currently works, the last two examples need the column name as the first slicing argument and would have to return numpy arrays, even though a pandas dataframe would be more convenient for an interactive user. We could think about adding a mode that always returns pandas dataframes instead of numpy arrays.
The eval method would essentially be a 3-liner shorthand for calling table.dframe().set_index('name').eval(...), then concatenating the results and setting appropriate column names in the resulting dataframe, requiring very little additional code on our side.
Method names are still debatable (pattern is awkward; I'd prefer something like query, or just using slicing syntax).
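As a rough illustration of the first note and of the .loc['ip3':'ip4'] idea, the sketch below defers __repr__ to pandas when it can be imported and falls back to a plain-text layout otherwise. The Table and _Loc classes here are hypothetical stand-ins, not cpymad's Table, and the only pandas calls used are the DataFrame constructor and to_string().

```python
class Table:
    """Hypothetical table: row names plus a dict of equal-length columns."""

    def __init__(self, names, columns):
        self.names = list(names)
        self.columns = {k: list(v) for k, v in columns.items()}

    @property
    def loc(self):
        return _Loc(self)

    def __repr__(self):
        try:
            import pandas as pd
        except ImportError:
            # Lightweight fallback when pandas is not installed.
            return self._plain_repr()
        return pd.DataFrame(self.columns, index=self.names).to_string()

    def _plain_repr(self):
        header = "name".ljust(12) + "".join(k.rjust(12) for k in self.columns)
        rows = [
            n.ljust(12) + "".join(f"{v[i]:12g}" for v in self.columns.values())
            for i, n in enumerate(self.names)
        ]
        return "\n".join([header] + rows)


class _Loc:
    """Label-based row slicing; as in pandas, the stop label is included."""

    def __init__(self, table):
        self.table = table

    def __getitem__(self, key):
        # Only slices of row names are handled in this sketch.
        t = self.table
        start = 0 if key.start is None else t.names.index(key.start)
        stop = len(t.names) if key.stop is None else t.names.index(key.stop) + 1
        cols = {k: v[start:stop] for k, v in t.columns.items()}
        return Table(t.names[start:stop], cols)


tw = Table(["ip3", "mb.a", "mq.a", "ip4"], {"betx": [100.0, 170.0, 50.0, 100.0]})
print(tw.loc["mb.a":"ip4"])
```

Because .loc returns another Table, typing the expression at an interactive prompt is enough to display the slice, with no separate show call.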

And for the following points, which I'm not sure about, it might be easier to just defer completely to pandas:

rdemaria commented 2 years ago

Do you know if or how many people are using cpymad interactively for optics analysis?

Difficult to say, but a handful of people in my corridor, plus other people in other groups and labs. Each one writes their own convenience functions to speed things up; see also https://github.com/search?q=cpymad&type=repositories. The MAD-X language cannot grow without a significant effort, and it seems better to bend Python to become a shell for MAD-X.

The idea of the API is that you write the entire column header in a string and you get the table. I do so many times a day with my library, and I think the community will appreciate it. People using pandas can already use it as you do.

A __repr__ in the form of a full table would be too verbose for me. I always try to keep __repr__ to one line to simplify working in terminals. I also often need to change the number of digits depending on the problem, or to quickly save to a file; that is why I liked a show method. It also does not oblige people to use it if they don't want to.
Method chaining is nice, but I did not want to reinvent pandas, which is great as it is.
Any idea using tw[....] is also nice, but I did not want to slow down the simple use cases.
tw.crange('ip3', 'ip4') is horrible, I agree!
How about .loc['ip3': 'ip4'] to replace crange, and maybe .loc["ip3"] to replace "pattern"? "query" is a bit too generic for me.
I prefer eval to use the Python eval, so as not to depend on pandas. The use case is more limited: I needed it for show, and it was simple to expose.
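The combination described in these points (a one-line __repr__, plus a show method with an adjustable number of digits and an optional output file) could look roughly like this. The Table class, the digits and file parameters and the column widths are assumptions for illustration only, not the actual implementation.

```python
import io
import sys


class Table:
    """Hypothetical table holding a name and a dict of columns."""

    def __init__(self, name, columns):
        self.name = name
        self.columns = columns

    def __repr__(self):
        # One line only, so the table stays readable in a terminal session.
        nrows = len(next(iter(self.columns.values())))
        return f"<Table {self.name!r}: {nrows} rows, {len(self.columns)} columns>"

    def show(self, cols, digits=6, file=None):
        """Write the selected columns as text; file defaults to sys.stdout."""
        out = sys.stdout if file is None else file
        cols = cols.split()
        print("".join(c.rjust(14) for c in cols), file=out)
        for row in zip(*(self.columns[c] for c in cols)):
            print("".join(f"{v:.{digits}g}".rjust(14) for v in row), file=out)


tw = Table("twiss", {"betx": [99.8114, 99.4955], "dx": [2.2699, 2.6619]})
print(repr(tw))
tw.show("betx dx", digits=3)

buf = io.StringIO()  # any writable text file works the same way
tw.show("betx", digits=4, file=buf)
```

Keeping the full listing in show leaves repr cheap for large tables, and nobody is obliged to call show.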

Maybe we can merge the discussion with the other proposal?

rdemaria commented 2 years ago

Given the discussions in #117, I propose to close the PR!

coldfix commented 2 years ago

Sure!

rdemaria commented 2 years ago

For reference, I implemented some of the ideas and spicy syntax here:

https://github.com/rdemaria/pyoptics/blob/master/pyoptics/madx.py

t.show(t.loc[99.3:100.1:'betx','mb'],'betx dx/bety')
name                     betx   dx/bety
mb.b11l8.b2:1         99.8114  0.022742
mb.b13r8.b2:1         99.4955 0.0267546

hibtc / cpymad

implement show method [WIP] #116