nanograv / PINT

PINT is not TEMPO3 -- Software for high-precision pulsar timing
Other
121 stars 101 forks source link

Manipulating flags is a pain in the neck #1073

Open aarchiba opened 3 years ago

aarchiba commented 3 years ago

When users need to work with the flags on TOAs objects, they generally need to write for loops for even basic operations (set all or some to TOAs to have a particular value, for example). #1070 has some discussion that demonstrates this. Some specific headaches:

aarchiba commented 3 years ago

One suggestion for easier access: augment the indexing operator on TOAs objects so that toas['mjd'] or toas['fish'] produces an array that looks like an appropriate column. This would enable:

toas['fish', toas['mjd']>58000] = 'carp'

or

some_toas = toas[toas['fish']=='carp']
aarchiba commented 3 years ago

One suggestion for data validation: create a FlagsDict class that only accepts valid keys and values, and store that in the astropy table instead of the current plain dicts.

scottransom commented 3 years ago

I really like the idea of moving the flag setting/getting loops into the indexing operation. It is still looping (and I don't really see a good way around that given the way tables work), but at least it is nice syntactic sugar.

aarchiba commented 3 years ago

Now in #1074; open for comments. But you can do toas[10:20, "gui_jump"] = "yes".

We could make all flags into columns in the table. This would allow very uniform handling with some caveats:

aarchiba commented 3 years ago

Just to confuse the issue, if you ever did a select_toas_mask based on a flag, that function added a column to toas.table named after the flag and with its values. No attempt was made to keep this column up to date.

scottransom commented 3 years ago

I'm way behind on reading discussion for this and the other related issues and PRs, but I've seen questions/comments about not keeping columns up to date. I don't think there is any desire for that at all. One of the reasons why we liked using Astropy tables is that we could do temporary calculations and save them in the table in memory -- with no intention of ever writing them out. To me it seems like it should be to the user to keep the tables updated if they are changing things on purpose.

aarchiba commented 3 years ago

My concern is when PINT has the same information stored in two places, and users start changing things, how do we keep those two copies of the information in sync?

My answer to that is, if at all possible, don't duplicate information like that. If duplication is necessary, use python's magic properties and the like to prevent changes; failing that, to keep the versions in sync.

If you're going to quietly cache something you need to solve the cache invalidation problem.

Currently we have a situation where users occasionally need to run .setup() on models after making some kinds of change in order to update various internal structures. It's a pain; I've been bitten by it several times in the last week. But I think we're stuck with it.

Can we please not add to this problem? Specific to this issue, if we're going to make flags into columns, we must replace the flag dictionaries, not duplicate their values.