Open andycasey opened 1 year ago
I can reproduce this issue, although I don't have an explanation:
from sdss_access import SDSSPath
path_dr17 = SDSSPath(release = "dr17")
kwd_dr17 = kwds = {'obj': '2M07591936+1734091', 'apred': 'dr17', 'field': '204+22', 'apstar': 'stars', 'prefix': 'ap', 'reduction': '2M07591936+1734091', 'telescope': 'apo25m'}
path_dr17_full0 = path_dr17.full("apStar", **kwd_dr17)
print(path_dr17_full0)
path_astra = SDSSPath(release = "sdss5")
kwd_astra = {'v_astra':"0.2.6", 'cat_id':1, 'run2d':"v6_0_9", 'apred':"1.0"}
path_astra_full = path_astra.full("mwmStar", **kwd_astra)
print(path_astra_full)
path_dr17_full1 = path_dr17.full("apStar", **kwd_dr17)
print(path_dr17_full1)
print("dr17 paths agree: %r" % (path_dr17_full0 == path_dr17_full1))
Output:
/uufs/chpc.utah.edu/common/home/sdss50/dr17/apogee/spectro/redux/dr17/stars/apo25m/204+22/apStar-dr17-2M07591936+1734091.fits
/uufs/chpc.utah.edu/common/home/sdss50/sdsswork/mwm/spectro/astra/0.2.6/v6_0_9-1.0/spectra/star/00/01/mwmStar-0.2.6-1.fits
/uufs/chpc.utah.edu/common/home/sdss50/sdsswork/mwm/apogee/spectro/redux/dr17/stars/apo25m/204+22/apStar-dr17-2M07591936+1734091.fits
dr17 paths agree: False
@havok2063 can you have a look at this? It looks like one instance is leaking into the other via the value of the tree version, so the first method call to path_dr17.full() correctly uses dr17 for the tree version, whereas the same instance (i.e. path_dr17) flips to using sdsswork in the second call to path_dr17.full().
Hmm, I think the tree instance is in the global namespace of the package but it should be re-generated on each instance of Path
. So I'll take a look as soon as I can.
I can also reproduce the error. I also found a similar error with the Access
class. I think it is an issue with the tree instance in the global namespace. I think moving the tree
instantiation to inside the Path
object will solve it. I'll work on it soon.
from sdss_access import SDSSPath
s=SDSSPath(release='dr17')
s.full('mangacube', drpver='v2_4_3', plate=8485, ifu=1901, wave='LOG')
'/Users/Brian/Work/sdss/sas/dr17/manga/spectro/redux/v2_4_3/8485/stack/manga-8485-1901-LOGCUBE.fits.gz'
s2=SDSSPath(release='sdsswork')
s2.full('mangacube', drpver='v2_4_3', plate=8485, ifu=1901, wave='LOG')
'/Users/Brian/Work/sdss/sas/mangawork/manga/spectro/redux/v2_4_3/8485/stack/manga-8485-1901-LOGCUBE.fits.gz'
s.full('mangacube', drpver='v2_4_3', plate=8485, ifu=1901, wave='LOG')
'/Users/Brian/Work/sdss/sas/mangawork/manga/spectro/redux/v2_4_3/8485/stack/manga-8485-1901-LOGCUBE.fits.gz'
So this is not as trivial as it seems. The way Tree works is to load the environment variable definitions into the user's Python os.environ
, designed to mimic the way it works with the tree modules. This way the environment variables are set once in the global namespace, and can be used throughout other scripts. If you want to change your environment between releases, you need to use the replant_tree
method on SDSSPath
.
kwds = {'obj': '2M07591936+1734091', 'apred': 'dr17', 'field': '204+22', 'apstar': 'stars', 'prefix': 'ap', 'reduction': '2M07591936+1734091', 'telescope': 'apo25m'}
p = SDSSPath("dr17")
p.full("apStar", **kwds)
'/Users/Brian/Work/sdss/sas/dr17/apogee/spectro/redux/dr17/stars/apo25m/204+22/apStar-dr17-2M07591936+1734091.fits'
p.replant_tree('sdss5')
p.full("apStar", **kwds)
'/Users/Brian/Work/sdss/sas/sdsswork/mwm/apogee/spectro/redux/dr17/stars/apo25m/204+22/apStar-dr17-2M07591936+1734091.fits'
p.replant_tree('dr17')
p.full("apStar", **kwds)
'/Users/Brian/Work/sdss/sas/dr17/apogee/spectro/redux/dr17/stars/apo25m/204+22/apStar-dr17-2M07591936+1734091.fits'
I think to redesign this would require a fundamental change to how sdss-tree
works in Python. I recommend we leave the functionality as it is currently.
Thanks for looking into this. I understand, and it would explain the behaviour.
I'm fine for leaving the functionality as-is for a while, but long term I think this is something that should be changed. If we had each SDSSPath
have their own internal Tree
, and if each Tree
expanded variables using some local dictionary of variables when evaluating paths instead of setting environment variables, then it would mean we'd be able to treat Tree
s (and SDSSPath
s) as independent. My intuition is that a single Tree
or SDSSPath
instance should give reference to everything within that instance, not alter system-wide environment variables to evaluate a path.
At the moment it fails silently by giving the user an incorrect path. In many cases it might be obvious that the path is incorrect because no file exists, but I can easily imagine situations where a user is loading data from one data release (e.g., IPL-1) and another data release (e.g., IPL-2). In that situation, both files would exist and the wrong file would be silently accessed by the path. For anything use case where data are being accessed from different releases (e.g., DR17 processing in Astra), the current behaviour means either replanting Tree
on every path evaluation, or reverting to hand-writing paths to make sure they are correct.
I'd be happy to help make changes to sdss-tree
to alter this behaviour because I think I will encounter many of these silent cases. (In fact, I had encountered many of them before but I could never pin down the weirdness for why paths were working sometimes, and missing other times).
What do you think? Do you think we should leave the behaviour as-is indefinitely, or is this something that should be changed (on some unspecified timescale)?
I think what you suggest would actually create a tighter dependency between the tree
and sdss-access
packages, rather than the opposite. SDSSPath
would be tied directly to a python instance of a specific Tree
configuration, rather than using the os
environment altogether. We have to be careful here to not discontinue the original uses of sdss_access
which was designed to be used with modules
managing the environment, in particular at Utah. Changing this behavior would cause a lot of issues in downstream packages. Using the environment variables also still allow people to specify custom locations in their local .bashrc
.
There are use cases we need to retain:
SDSSPath
is used with modules, it uses the environment variables set by the module tree version that is set. The current implementation of sdss_access
already allows switching between data releases. You don't need to replant the tree on every path evaluation, only once or twice, to generate all the paths you need in one place.
So I would suggest we leave the behavior as it is currently. However, I think we could perform a check when a path is generated to check the path matches the tree/release set and raise an exception when a mismatch occurs. We could also implement a flag on instantiation of SDSSPath
that, if set, would do what you want and do a local environment path replacement rather than an os.expandvars
. I think we could add that but make it opt in.
I'm routinely looking up paths for SDSS-5 data and SDSS-4 data.
I use a dictionary to lookup instances of
SDSSPath(...)
so that I don't have to create aSDSSPath
object every time I evaluate a path. It looks something like this:But I am encountering very weird behaviour, which I can only conclude must be because of either a strange state issue, or because
sdss_access
is changing environment variables behind the scenes.Here is a minimum reproducible example:
Here is the output:
Is this the intended behaviour?
I assumed each instance would evaluate paths for that particular release, and that evaluating one path on a different instance would not impact existing instances. Note that this happens even if the second
SDSSPath(...)
is created in a totally unrelated scope, which makes me think there is some state sharing withinsdss_access
, or environment variables being changed.