pycroscopy / sidpy

Python utilities for storing, processing, and visualizing spectroscopic and imaging data
https://pycroscopy.github.io/sidpy/
MIT License
11 stars 14 forks source link

sidpy Dimension not allowing size(1) spectral units #83

Closed rajgiriUW closed 3 years ago

rajgiriUW commented 3 years ago

In using Pycroscopy Translators, I saw that the new Dimension throws hard errors for dimensions of size (1) (which is the case if you make a single measurement at each point, e.g. a topography image)

This breaks some old Translators (the IgorIBW one, for example, as well as our lab's). I'm not sure how the new Dimension object has changed and why:

spec_desc = Dimension('arb', 'a.u.', [1])

throws TypeError: values should at least be specified as a positive integer

Is there a different way you are supposed to define spectral Dimensions for an image object now?

ssomnath commented 3 years ago

@rajgiriUW - thanks for catching this. We should definitely have a unit-test to catch this requirement

This is an interesting problem. On one hand, it does not make sense to have empty (single valued) sidpy.Dimension objects. On the other hand, USID is the odd schema that requires a 1-length dimension and the ability to specify that a certain dimension is a Dependent or Incomplete dimension. The decision for pyUSID.Dimension to extend sidpy.Dimension in a minimal manner yet add / change such functionality is what complicates the matter. We will always need two different kinds of Dimension objects because of the unique nature of USID and the need to support datasets without an N-Dimensional form. Therefore, I think that we need the fix in pyUSID.Dimension rather than sidpy.Dimension.

Thoughts?

I can put together a PR on the pyUSID side and we can decide which one to accept moving forward.

ssomnath commented 3 years ago

It turns out that at least the latest pyUSID master does allow dimensions that are unit sized.

I tested 4 separate lines with the latest version of sidpy on pypi and the latest commit of pyUSID on GitHub:master. I am seeing different behavior:

>>> sidpy.Dimension(1, name='fdfdf', units='dd')
fdfdf:  generic (dd) of size (1,)
>>> sidpy.Dimension([1], name='fdfdf', units='dd')
fdfdf:  generic (dd) of size (1,)
>>> usid.Dimension('fdfdf', 'dd', 1)
fdfdf: generic (dd) mode:DimType.DEFAULT : [0.]
>>> usid.Dimension('fdfdf', 'dd', [1])
fdfdf: generic (dd) mode:DimType.DEFAULT : [1.]

The sidpy.Dimension should have failed instead of allowing such behavior. So, as far as sidpy is concerned, @rajgiriUW has exposed a bug - why should the Dimension allow sizes < 2?

rajgiriUW commented 3 years ago

Hmm, I changed the behavior in a separate branch that has a PR, but that should not have been pushed to Master. Just looking at sidpy Dimension those should definitely be failing. I'll have to double check later.

ssomnath commented 3 years ago

I just checked on my end. This is indeed a bug for the 0.0.3 on pypi. I already fixed this on master in this commit. So, the master branches on both sidpy and pyUSID will not show this issue. Depending on how critical we think these issues are, we could consider releasing new versions of the two packages.

rajgiriUW commented 3 years ago

Why couldn't you use SidPy Dimension on a 2D image that is Y x 1 pixels? It still has two dimensions-- i.e., an image that is 128 x 1 pixel is still an image of shape (128, 1), whereas an array (128, ) is not 2D. So I guess I don't get why it has be exactly 2 or greater, rather than 1 or greater. Which is why I suggested changing to 1 in the PR submitted. I agree that value has to not be 0.

Part of my issue is that the error message doesn't make sense as the number 1 is, in fact, a positive integer.

As for the release, right now with Dimensions hard-set at >= 2, it breaks at least the IgorIBW translator. So, I'm not sure it's necessary to push that out.

rajgiriUW commented 3 years ago

The other option is to fix in PyUSID Dimension specifically, if the translators like IgorIBW instead should just call that?

ssomnath commented 3 years ago

ORNL IT decided to force some updates on my machine and killed the browser session where I was writing a response.... Here's the summary of what I was writing out:

Fundamentally, sidpy.Dimension too could have the minimum lowered to 1. Though it doesn't make much scientific sense, it makes computational sense when we need to track dimensions in some processing - I had to use singular dimensions a lot for CNMS' Loop fitting algorithm. That being said, one could work outside the context of the Dataset if need be for such cases. All this being said, I am concerned that this may complicate down-stream operations like visualization, etc. I am also concerned about multiple singular dimensions that make datasets look ugly. We will bring this up in today's hackathon.

Yes, the error message for sidpy.Dimension needs to be clarified.

rajgiriUW commented 3 years ago

Yeah, I guess the issue here is specifically the PyUSID format being used as the other dimension should not necessarily be singular except in the case of PyUSID where the image measures a single value. I get what you're saying, and I think that makes sense. So really it's in the way PyUSID extends Sidpy Dimension, as you said. Let me look at PyUSID and see if there's an easier solution instead of allowing singular dimensions?Otherwise, would it make sense to change the error in Sidpy to a warning? And in the meantime debug the Translators/PyUSID? Then at least the translator works for the time being.

I have a lunchtime meeting during the hackathons past two months for a grant renewal, that's why I haven't joined lately. :(

ssomnath commented 3 years ago

Also, two things on translators:

  1. The CNMS specific ones will continue to be Translator classes and use pyUSID.Dimension for legacy reasons.
  2. Most other translators from proprietary / instrument files will be turned into sidpy.Reader child classes instead and these will use sidpy.Dimension instead. I don't see much of a point in keeping the Translator counterpart for these readers for very long. We already have a writer for NSID and will put together a writer for USID soon. The combination of the Reader (proprietary file -> sidpy.Dataset) and the writer (sidpy.Dataset -> USID / NSID HDF5 formatted files) would effectively do the same job as the sidpy.Translator classes we have today but with more flexibility (user gets to choose what format they want to write their data into).