spacetelescope / jwst

Python library for science observations from the James Webb Space Telescope
https://jwst-pipeline.readthedocs.io/en/latest/

Feature Request: Separate out data models into its own package #3132

Closed pllim closed 2 months ago

pllim commented 5 years ago

I thought I opened an issue about it but I cannot find it.

I need the data models to read JWST ASDF files, but I never want to run the jwst pipeline. When I install jwst just to get the data models, I am not thrilled by how many other dependencies it installs, most of which I am pretty sure I don't need just for reading the data. It would be nice if the data models were their own package. It would probably only pull in gwcs and pyyaml, instead of this plethora of packages.

pllim commented 5 years ago

Another motivation is that I don't want to install jwst natively on Windows because it is not guaranteed to work on Windows.

pllim commented 5 years ago

And in case you are wondering, I tried to install jwst on Windows 10 and failed.

    utest_cdrizzle.c
    c:\...\drizzle\drizzle\src\tests\fct.h(213): fatal error C1083: Cannot open include file: 'unistd.h': No such file or directory
    error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Community\\VC\\Tools\\MSVC\\14.14.26428\\bin\\HostX86\\x64\\cl.exe' failed with exit status 2

    ----------------------------------------
    Command "C:\...\Miniconda3\envs\py37\python.exe -u -c "import setuptools, tokenize;__file__='C:\\...\\AppData\\Local\\Temp\\pip-install-5xl1o1iu\\drizzle\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\...\AppData\Local\Temp\pip-record-rkb03kjc\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in C:\...\AppData\Local\Temp\pip-install-5xl1o1iu\drizzle\

As a result, I am unable to read the ASDF-in-FITS portion of JWST Level 2 data on Windows. Even when @drdavella told me how to load the ASDF part without jwst, it fails with errors about TaggedDict when jwst is not installed.

jhunkeler commented 5 years ago

That might be an easy fix in drizzle...

    /* Platform independent pipe functions. TODO: Look to figure this out in a way
    that follows the ISO C++ conformant naming convention. */
    #if defined(WIN32)
    #    include <io.h>
    #    include <fcntl.h>
    #    define _fct_pipe(_PFDS_) \
            _pipe((_PFDS_), FCT_PIPE_RESERVE_BYTES_DEFAULT, _O_TEXT)
    #    define _fct_dup  _dup
    #    define _fct_dup2 _dup2
    #    define _fct_close _close
    #    define _fct_read  _read
    /* Until I can figure a better way to do this, rely on magic numbers. */
    #    define STDOUT_FILENO 1
    #    define STDERR_FILENO 2
    #else
    #    include <unistd.h>
    #    define _fct_pipe  pipe
    #    define _fct_dup   dup
    #    define _fct_dup2  dup2
    #    define _fct_close close
    #    define _fct_read  read
    #endif /* WIN32 */

We just need to tell it to define WIN32 on Windows systems.
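
One way this could be done from a Python build script is sketched below. This is only an illustration, not drizzle's actual setup code: `platform_define_macros` is a hypothetical helper, and the extension name in the comment is made up. The only real API assumed is the `define_macros` keyword of `setuptools.Extension`, which takes a list of `(name, value)` tuples.

```python
import sys


def platform_define_macros(platform=sys.platform):
    """Return a define_macros list for a C extension (hypothetical helper).

    On Windows, WIN32 is defined so that headers guarded by
    `#if defined(WIN32)` take the Windows branch instead of
    including <unistd.h>.
    """
    macros = []
    if platform.startswith("win"):
        # (name, None) defines the macro without a value, like -DWIN32.
        macros.append(("WIN32", None))
    return macros


# Hypothetical usage in a setup.py:
#   Extension("example._ext", sources=[...],
#             define_macros=platform_define_macros())
```

The platform is passed as a parameter with a default so the helper can be exercised on any OS.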

pllim commented 5 years ago

The bigger question is... Do jwst and all its dependencies want to start supporting Windows and have, say, Appveyor CI? If not, then it is a better investment to break datamodels out, because then only a subset of those dependencies needs to support Windows.

sosey commented 5 years ago

Last I knew, none of the ST software officially supports Windows. Where it's trivial to do so, some packages have, but work in that direction should probably be approved if it's meant to be maintained.

Creating a more generic datamodels package, along with schemas, that other libraries can build upon is a larger cross-mission discussion that's already under way, at the moment with @nden, @perrygreenfield, @drdavella and myself.

With the above, I don't see how that would divorce someone from installing the JWST pipeline. You may not know what dependencies you need to 'just read the data'; the GWCS, astropy, and other custom functions inside the jwst pipeline may be necessary. I think it's reasonable to expect users who want to read and work with JWST data to install the JWST library. That doesn't mean they have to run the pipeline; the library contains tools that allow things other than pure pipeline runs. I would expect the same from any other mission library, and I wouldn't want to install a datamodels package that had to contain all the methods and dependencies to read any mission's dataset.

jdavies-st commented 5 years ago

I agree that pulling out datamodels and stpipe into their own packages (or a combined package) would be desirable, mostly from the point of view of using them for WFIRST. I know @sosey has already done this surgery once as a proof-of-concept.

pllim commented 5 years ago

I only care about reading the data and when the reading part is not OS agnostic, it is not good advertising. My request stands.

sosey commented 5 years ago

Yes, reading the data may require the jwst library, especially if you want the objects that are part of the file instantiated as written. You need the serialization functions that know how to create the objects, and those are in the jwst library. You can read an asdf file that has custom types using the generic asdf open, but that won't give you the custom objects that are supposed to be created. I don't want custom serialization functions in a generic datamodels package.
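
The difference between getting the raw tree and getting the custom object can be pictured with a toy sketch in plain Python. Nothing here is asdf's actual API: `Shift`, `load_node`, the `converters` registry, and the tag string are all hypothetical stand-ins for a mission-specific type and its serialization support.

```python
from dataclasses import dataclass


@dataclass
class Shift:
    """Hypothetical stand-in for a mission-specific transform object."""
    offset: float


def load_node(node, converters):
    """Return a custom object if the node's tag has a known converter,
    otherwise return the raw dictionary unchanged."""
    make = converters.get(node.get("tag"))
    if make is None:
        # No mission library installed: the information is still
        # accessible, but only as plain data.
        return node
    return make(**node["data"])


node = {"tag": "tag:example.org/shift-1.0.0", "data": {"offset": 2.5}}

# Without the mission library, the reader hands back the tagged dict.
raw = load_node(node, converters={})

# With it, the same node becomes the custom object.
built = load_node(node, converters={"tag:example.org/shift-1.0.0": Shift})
```

In this picture, the registry entries are what a mission library would contribute when it is installed.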

perrygreenfield commented 5 years ago

Point 1: Isn't this really a request for ASDF instead of JWST?

Point 2: But to address the issue raised initially, I think it is good to allow ASDF to read ASDF files even if the software that some tags require is not there. So in that sense, it should be possible to do that (Dan, can't that be done now?). Yes, that means that you don't get the custom objects that the tags provide. But all the information is accessible still.

Point 3: While STScI may not support Windows for JWST pipelines, it is essential that we support Windows for ASDF since we are advertising it as a generic format that is cross platform.

Point 4: Separating data models into a separate package may or may not be possible (it is very easy to pull in dependencies), but is that the core issue? The point that Harry raised was that it would be nice to get the GWCS for the data without having JWST installed (some may argue if you have JWST data, you probably want JWST software and it is a reasonable argument). But the current catch is that the GWCS models for JWST are very specialized and don't make sense to put directly into GWCS. Unless the JWST GWCS models use more generic models, it is hard to avoid the JWST software until that happens.

sosey commented 5 years ago

P2: yah, that should already be possible. That's what I meant: you can read the asdf files as they stand now, you just don't necessarily know what to do with the tree members that go into custom objects.

P1 & P3: my starting assumption was that pl's request is to use this to read jwst data that has GWCS objects into ginga, and she wants to use the GWCS for locations.

P4: yup. Though, I would think the GWCS for most missions would be highly specialized. My thought had been, can we have a generic datamodels package with the specs for commonly useful things, like ImageModel, SpecModel, etc, and core schema items, that other libraries can use for building on? JWST pulls in datamodels, adds custom schemas and models; WFIRST pulls in datamodels, adds custom schemas and models....
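
The layering described in P4 might look roughly like the sketch below. It is a minimal illustration under assumed names, not the design of any existing package: the `schema` dictionaries stand in for real schema files, and `DataModel`, `ImageModel`, and `JwstImageModel` are hypothetical classes written only for this example.

```python
class DataModel:
    """Generic base: a schema of required keys plus simple validation."""
    schema = {"meta": dict}

    def __init__(self, tree):
        self.tree = tree
        self.validate()

    def validate(self):
        # Check that every key required by the schema is present
        # and has the expected type.
        for key, kind in self.schema.items():
            if not isinstance(self.tree.get(key), kind):
                raise ValueError(f"{key!r} missing or not a {kind.__name__}")


class ImageModel(DataModel):
    """Would live in the generic package: core items only."""
    schema = {**DataModel.schema, "data": list}


class JwstImageModel(ImageModel):
    """Would live in the mission library: adds its own schema items."""
    schema = {**ImageModel.schema, "dq": list}


generic = ImageModel({"meta": {}, "data": [[1.0]]})
mission = JwstImageModel({"meta": {}, "data": [[1.0]], "dq": [[0]]})
```

A tree that satisfies the generic model but lacks the mission-specific `dq` item would fail validation in `JwstImageModel` while still loading as an `ImageModel`.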

perrygreenfield commented 5 years ago

@sosey Regarding P4, we should talk. I'm trying to think of generic data models for ASDF that are less JWST-focussed right now. There are many similarities but also some differences. These are more focussed on calibrated data than raw data. I'm not sure it is as easy to find generically useful data models for raw data that are good in pipelines. Mainly because each instrument has its own quirks and special information that other telescopes/instruments may not care about. Another is that the data models for JWST have all the FITS baggage of mapping metadata to FITS keywords. Most of that cannot remain, or at least has to be customized elsewhere. I also wonder if GWCS has to be highly specialized for JWST, but that is a different topic.

sosey commented 5 years ago

@perrygreenfield yup, we need to catch up. I think generically useful raw data models are possible, serving as a base template, and the quirks get added as necessary. I'm definitely NOT thinking of the FITS baggage; I'd like to leave that out of the generic datamodels and use pure asdf. FITSy stuff would be a jwst customization. So yah, we should talk 😄

drdavella commented 5 years ago

Yes, reading any ASDF file should be possible regardless of whether the schemas and tags that were used to create the original file are available/installed/used.

However, this doesn't mean that a given ASDF file is particularly useful for many use cases without the associated types/models/etc., at least not without a lot of extra work. I think this is the point that @pllim is making with respect to reading JWST data products with ginga, and I think it's a perfectly reasonable request.

My personal take on this is that there's no reason in principle that all of the models and transforms that are necessary to interpret a JWST data product can't be packaged separately from the pipeline itself. Separating these components could significantly reduce the amount of software that needs to be installed simply to read a JWST data product and make use of the WCS.

However, I understand that a lot of work would be required to make this happen, so it's not necessarily something that can be provided in the immediate term.

drdavella commented 5 years ago

One specific problem that needs to be investigated a bit further is that it seems like it's possible to read a JWST data product with only asdf and gwcs installed, and it's possible to inspect many parts of that file and get useful information. The GWCS pipeline appears to be reconstructed properly. However, some of the GWCS pipeline steps might themselves be models that are defined by JWST, and so when a user tries to access one of these steps, an error occurs.

It's not clear to me whether this is a problem with asdf, or gwcs, or even astropy, or maybe it's just to be expected given the circumstances.

sosey commented 5 years ago

I would think it's to be expected, because many of the JWST gwcs objects have custom transforms that are not generic enough to be added to gwcs, astropy, or asdf.

perrygreenfield commented 5 years ago

To elaborate a tiny bit, some of these transforms currently reside in the JWST package. They probably can be separated out but they would be in a funny little package as a result. They are too specialized to be in GWCS at the moment.

nden commented 5 years ago

The JWST transforms are currently in the jwst.transforms package. With some (possibly not trivial) amount of work they can be moved to modeling.

hbushouse commented 5 years ago

Jira tracking in https://jira.stsci.edu/browse/JP-851

nden commented 3 years ago

Removed the 7.7 milestone. As it currently stands, the jwst pipeline is still needed to open files.

braingram commented 2 months ago

Closing as https://github.com/spacetelescope/stdatamodels/ addresses this issue. It provides a library (that doesn't depend on jwst) that can read and write the data models used in jwst.