hyperspy / hyperspy

Multidimensional data analysis
https://hyperspy.org
GNU General Public License v3.0
513 stars 208 forks source link

record_by attributes #974

Closed dnjohnstone closed 8 years ago

dnjohnstone commented 8 years ago

The code base currently includes a "record_by" attribute that can take values "spectrum", "image" or " ", and is used throughout the internal workings to switch between Spectrum, Image and Signal classes. As discussed in #963 this is a legacy 'feature' that inherits from the dogma of electron microscopy packages that allowed acquisition "spectrum-by-spectrum" (EELS map) or "image-by-image" (EFTEM series).

This terminology is not however particularly general and I think it would be preferable if we have more specific "recorded_as" attributes for specialised sub-classes... This might be more self-documenting and take values like "spectrum image", "image series", "tlit series" etc.

Opinions on the best general solution would be gladly received.

francisco-dlp commented 8 years ago

Now that @dnjohnstone is working on #963, it is indeed the right time to address this and related issues such as #976. It should be achievable for 1.0, so I've added it to that milestone.

to266 commented 8 years ago

...Doesn't the attribute only show the number of dimensions in what we call "signal" space? Then the only (albeit a large one) problem is that people might confuse "signal space" (as in the 3 in < 2 | 3 >) and the actual signal.... We could, of course, instead store the number navigation space dimensions - which is much less confusing, however not "intrinsic" to the type of signal.

Actually now that I think of it, I would prefer storing the confusing "signal space dimensions". I guess the confusion would only come because at some point there was the need to separate just "plain data" and our "signal", which is essentially "data + tools wrapper" - hence the term was born and had to be used. Especially to stress the idea that you can access the data directly if you know what you're doing and want speed.

dnjohnstone commented 8 years ago

I agree that the thing that this attribute has come to store is essentially the number of axes/dimensions in the 'signal space'... and I think that this is the right thing to store rather than navigation.

To me the attribute could be more self documenting and reflect the fact that at this point it doesn't really matter how you "recorded" the data it's just a choice about how you 'slice the data' in analysis, or in other words which dimensions you take as signal dimensions.

What I would propose is that in this case the attribute should be "signal_dimensions" potentially allowing any integer value although, to be fair, for the time being it's hard to imagine each individual measurement being made on anything other than a 1D or 2D detector... so perhaps it's a non issue. If that were the case though, it would make me ask - why do we have this attribute at all?

In the end I guess it comes down to whether we think that the average user will gain more from the current terminology, which retains some link to how they probably did the experiment or rather more from being a little irreverent to that and focusing on which axes you're doing your analysis on (i.e. which you set as signal)

I'm really not sure what is 'right' here.

dnjohnstone commented 8 years ago

I also have no idea whether many people use the possibility to set this attribute when they use the code - personally I always set my signal axes directly, but that might just be me...

francisco-dlp commented 8 years ago

The solution to this problem is actually very simple: store the navigate attribute of the axes in the hdf5 file (currently it is not stored as the configuration of the axes is inferred from record_by) and get rid of record_by.

dnjohnstone commented 8 years ago

well that would be fine by me if it doesn't cause any problems... I'm all for getting rid of record_by as it seems like it should be unnecessary but I wasn't sure if I just couldn't see why it was so central to much of the code.

Can anyone come up with an advantage to having record_by at this point before we start removing it?

francisco-dlp commented 8 years ago

Immplemented in #1127