Closed rdemaria closed 2 years ago
For the sake of quick inspections, I would also propose some operator overloading:
tw(...)=tw.eval(...)
tw//'regexp' =tw.pattern('regexp')
tw//('a','b') = tw.crange('a','b')
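As a rough illustration of what these overloads could look like, here is a minimal sketch. `QuickTable` is a hypothetical stand-in written for this example, not cpymad's `Table` class, and the semantics of `pattern`, `crange` and `eval` are assumptions based on the descriptions in this thread.

```python
import re
import numpy as np


class QuickTable:
    """Hypothetical stand-in for a twiss table (not cpymad's Table class)."""

    def __init__(self, columns):
        self.columns = {key: np.asarray(val) for key, val in columns.items()}

    def pattern(self, regexp):
        # Boolean mask of the rows whose name matches the regular expression.
        rx = re.compile(regexp)
        return np.array([bool(rx.search(name)) for name in self.columns["name"]])

    def crange(self, first, last):
        # Boolean mask of the rows between two element names (inclusive).
        names = list(self.columns["name"])
        mask = np.zeros(len(names), dtype=bool)
        mask[names.index(first):names.index(last) + 1] = True
        return mask

    def eval(self, expr):
        # Evaluate an expression with the columns and a few numpy functions in scope.
        scope = {"sqrt": np.sqrt, "abs": np.abs}
        scope.update(self.columns)
        return eval(expr, {"__builtins__": {}}, scope)

    def __call__(self, expr):
        # tw(...) is shorthand for tw.eval(...)
        return self.eval(expr)

    def __floordiv__(self, arg):
        # tw // 'regexp' -> pattern; tw // ('a', 'b') -> crange
        if isinstance(arg, tuple):
            return self.crange(*arg)
        return self.pattern(arg)


# Made-up values, for illustration only.
tw = QuickTable({
    "name": ["ip3", "mb.a", "mq.b", "ip4"],
    "betx": [1.0, 4.0, 9.0, 16.0],
    "dx": [1.0, 2.0, 3.0, 4.0],
})
print(tw("dx/sqrt(betx)"))
print(tw // r"^mb\.")
print(tw // ("mb.a", "ip4"))
```

Dispatching on the argument type inside `__floordiv__` keeps the shorthand to a single operator, at the cost of making `//` mean two different things.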
What is your opinion about using pandas for this purpose:
df = tw.dframe()
df = df[df.name.str.startswith('mb.')]
df = df.set_index('name')
# Show single expression:
print(df.eval('dx/sqrt(betx)'))
# Show multiple expressions:
print(pd.concat(df.eval('dx/sqrt(betx), dy/sqrt(bety)'), axis=1))
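For anyone who wants to try the pandas route without a MAD-X session, the following self-contained sketch uses a small hand-made dataframe in place of `tw.dframe()`. The column values are invented, and the multi-expression part evaluates each expression separately, which is one possible way to do it rather than the only one.

```python
import pandas as pd

# Toy data standing in for tw.dframe(); the values are made up for illustration.
df = pd.DataFrame({
    "name": ["mb.a", "mq.b", "mb.c"],
    "betx": [4.0, 9.0, 16.0],
    "bety": [1.0, 4.0, 25.0],
    "dx": [2.0, 3.0, 8.0],
    "dy": [1.0, 2.0, 10.0],
})

# Keep only the rows whose name starts with 'mb.' and index them by name.
bends = df[df["name"].str.startswith("mb.")].set_index("name")

# Single expression:
single = bends.eval("dx/sqrt(betx)")
print(single)

# Multiple expressions: evaluate each one separately and label the columns.
exprs = ["dx/sqrt(betx)", "dy/sqrt(bety)"]
multi = pd.concat({e: bends.eval(e) for e in exprs}, axis=1)
print(multi)
```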
Good question! If you count the number of lines or characters for the same task, you see the reason. During optics analysis or design it is very common to ask questions such as: find element names from a pattern, which kickers are on, which quadrupoles exceed my beta target. If you can do it in one line, it saves a lot of time. Pandas is there for more complicated tasks, or for those who are more familiar with its API. These functions are for the most common interactive tasks, not to rewrite another pandas!
Or do you mean using pandas to implement the same shortcuts?
Good question! If you count the number of lines or characters for the same task, you see the reason.
Well, I've written down my example code with more lines than would be needed just to make it more readable. You could achieve the same with fewer keystrokes (but not quite as few as in your proposal).
During optics analysis or design it is very common to ask questions such as: find element names from a pattern, which kickers are on, which quadrupoles exceed my beta target. If you can do it in one line, it saves a lot of time. Pandas is there for more complicated tasks, or for those who are more familiar with its API. These functions are for the most common interactive tasks, not to rewrite another pandas!
I see your point that it might be nice to have more convenient access to common operations. However, note that cpymad has so far been written mainly as a programming interface, not as an interactive tool to help during optics analysis. The latter has so far been more the job of the MAD-X language itself. Python inherently has heavier syntax overhead to begin with (which is also the reason why hardly anyone uses it as their shell, for example, even though there are attempts at doing that). Do you know if or how many people are using cpymad interactively for optics analysis? I'm open to shifting the scope of cpymad somewhat.
I'm hesitant to add another mini-language that users will have to learn specifically for cpymad that doesn't directly apply elsewhere — when instead they could simply use pandas syntax that many will already be familiar with and that can help in many other applications as well. I'm for adding more convenient accessors, but I'm not sold on the proposal in the current state. Consider this line
tw.show(tw.pattern('mb') & tw.crange(165,180,'betx') & tw.crange('ip3','ip4'),'betx')
from your example. It looks pretty weird to me for multiple reasons: Why is slicing not done with the [] operator? Why does the first crange call take a column name in addition to a numeric range? Why doesn't the second call to crange take a column name? Or was the final 'betx' string misplaced, and should it go into the parentheses before? Why are row and column selection intermingled within the same argument? In fact, I can't really figure out what crange is supposed to do, even after rereading your explanation and example and thinking about it for a couple of minutes.
Or do you mean using pandas to implement the same shortcuts?
That could be an option. It would probably reduce the number of lines needed for the implementation by a factor of 5 to 10, provide more flexibility to the user, and ensure that the expressions are compatible with pandas.
Alternatively, we should at least improve the method names. Also, the proposed show method combines orthogonal concerns: representation and slicing. These should be kept separate, by improving repr() on the one hand, and improving slicing on the other hand.
In that case we should make it look more similar to pandas to reduce the learning overhead:
# so, instead of:
tw.show(tw.pattern('mb') & tw.crange('ip3', 'ip4'), 'betx')
# type this:
tw.loc[tw.pattern('mb') & tw.crange('ip3', 'ip4'), 'betx']
# or this:
tw.crange('ip3', 'ip4').pattern('mb').eval('betx')
# or this:
tw.loc['ip3':'ip4'].pattern('mb').eval('betx')
# or this:
tw['betx', tw.pattern('mb') & tw.crange('ip3', 'ip4')]
# or this
tw[:, tw.pattern('mb') & tw.crange('ip3', 'ip4')].betx
or something similar.
- Note that instead of adding a method tw.show(...), this relies on making Table.__repr__ more useful, e.g. by returning repr(self.dframe()) or self.dframe().to_string() if pandas is installed, and something more lightweight otherwise. This doesn't introduce a new method to learn, behaves as expected (similar to pandas dataframes), and requires less typing. I think this would probably be a good change either way.
- Due to the way that cpymad currently works, the last two examples need the column name as the first slicing argument and would have to return numpy arrays, even though a pandas dataframe would be more convenient for an interactive user. We could think about whether to add a mode that always returns pandas dataframes instead of numpy arrays.
- The eval method would essentially be a 3-liner shorthand for calling table.dframe().set_index('name').eval(...), concatenating the results and setting appropriate column names in the resulting dataframe, requiring very little additional code on our side.
- Method names are still debatable (pattern is awkward, I'd prefer something like query, or just using slicing syntax).
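The `__repr__` idea above could be sketched roughly as follows. `ReprTable` is a hypothetical class invented for this example (cpymad's actual `Table` is not shown here), and the fallback format is just one lightweight possibility.

```python
import numpy as np

try:
    import pandas as pd
except ImportError:  # pandas is treated as optional in this sketch
    pd = None


class ReprTable:
    """Hypothetical table wrapper; only the __repr__ strategy matters here."""

    def __init__(self, columns):
        self.columns = {key: np.asarray(val) for key, val in columns.items()}

    def dframe(self):
        # Only called when pandas is available.
        return pd.DataFrame(self.columns)

    def __repr__(self):
        if pd is not None:
            # Reuse pandas' table formatting when it is available.
            return self.dframe().to_string()
        # Lightweight fallback: a header line plus one line per row.
        keys = list(self.columns)
        rows = zip(*(self.columns[k] for k in keys))
        lines = [" ".join(keys)]
        lines += [" ".join(str(v) for v in row) for row in rows]
        return "\n".join(lines)


# Made-up values, for illustration only.
table = ReprTable({"name": ["mb.a", "mq.b"], "betx": [99.8, 50.2]})
print(table)
```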
And the following points, which I'm not sure about, it might be easier to just completely defer to pandas:
Do you know if or how many people are using cpymad interactively for optics analysis?
Difficult to say, but a handful of people in my corridor, plus other people in other groups and labs. Each one is writing convenience functions to speed things up, see also here https://github.com/search?q=cpymad&type=repositories. The MAD-X language cannot grow without a significant effort, and it seems better to bend Python to become a shell for MAD-X.
The idea of the API is that you write the entire column header in a string and you get the table. I do so many times a day with my library, and I think the community will appreciate it. People using pandas can already use it as you do.
- __repr__ in the form of a full table would be too verbose for me. I always try to keep __repr__ to one line to simplify working on terminals. I also often need to change the number of digits depending on the problem, or to quickly save to a file; therefore I liked a show method. This also does not oblige people to use it if they don't want to.
- tw[....] is also nice, but I did not want to slow down the simple use cases.
- tw.crange('ip3', 'ip4') is horrible, I agree! .loc['ip3': 'ip4'] to replace crange and maybe .loc["ip3"] to replace "pattern"? "query" is a bit too generic for me.
- eval uses the Python eval so as not to depend on pandas. The use case is more limited; I needed it for show, so it was simple to expose it.

Maybe we can merge the discussion with the other proposal?
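To make the `show` use case concrete (a configurable number of digits, plain Python `eval`, no pandas), here is a minimal sketch. The `show` function below is a hypothetical helper written for this example; its signature and output format are assumptions, not the code from the PR.

```python
import numpy as np


def show(columns, mask, exprs, digits=6):
    """Hypothetical helper: format 'name' plus space-separated expressions."""
    scope = {"sqrt": np.sqrt, "abs": np.abs}
    scope.update({key: np.asarray(val) for key, val in columns.items()})
    mask = np.asarray(mask, dtype=bool)
    names = scope["name"][mask]
    # Plain Python eval; only the columns and sqrt/abs are visible.
    # Emptying __builtins__ is a convenience here, not a security boundary.
    values = [eval(e, {"__builtins__": {}}, scope)[mask] for e in exprs.split()]
    lines = ["name " + exprs]
    for i, name in enumerate(names):
        cells = [f"{col[i]:.{digits}g}" for col in values]
        lines.append(" ".join([str(name)] + cells))
    return "\n".join(lines)


# Made-up values, for illustration only.
columns = {
    "name": ["mb.a", "mq.b"],
    "betx": [99.8114, 50.0],
    "bety": [4.0, 8.0],
    "dx": [2.0, 1.0],
}
text = show(columns, [True, False], "betx dx/bety", digits=4)
print(text)
```

Because the function returns a string instead of printing, the same output can be written to a file, and `__repr__` can stay short.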
Given the discussions in #117, I propose to close the PR!
Sure!
For reference, I implemented some of the ideas and spicy syntax here:
https://github.com/rdemaria/pyoptics/blob/master/pyoptics/madx.py
t.show(t.loc[99.3:100.1:'betx','mb'],'betx dx/bety')
name           betx     dx/bety
mb.b11l8.b2:1  99.8114  0.022742
mb.b13r8.b2:1  99.4955  0.0267546
This implements the following methods on table to allow quick inspections:
- show: return a pretty-printed version of the table with several options
- pattern: return a bool array for row names matching a regexp
- crange: return a bool array for rows with a column between two values or two names
- eval: compute a new array given an expression

Please let me know if you have comments. If ok, I will add tests and update the manual...
Examples assuming tw is a twiss table
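As a rough sketch of how such boolean masks can be combined, the snippet below works on plain numpy arrays. `name_mask` and `value_mask` are hypothetical helpers named for this example, reflecting one reading of "pattern" and of "crange" with two values; they are not the methods from the PR.

```python
import re
import numpy as np

# Made-up values, for illustration only.
names = np.array(["mq.1", "mb.2", "mb.3", "mq.4"])
betx = np.array([150.0, 170.0, 190.0, 175.0])


def name_mask(names, regexp):
    # True for every row whose name matches the regular expression.
    rx = re.compile(regexp)
    return np.array([bool(rx.search(n)) for n in names])


def value_mask(column, low, high):
    # True for every row whose column value lies in [low, high].
    return (column >= low) & (column <= high)


# Masks are plain bool arrays, so they combine with & and | and index any column.
selected = name_mask(names, "mb") & value_mask(betx, 165, 180)
print(names[selected], betx[selected])
```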