spacetelescope / dat_pyinthesky

Notebooks for "notebook-driven development" for the Data Analysis Tools efforts
https://dat-pyinthesky.readthedocs.io/en/latest/

New nb for JWST/MIRI/MRS point source spectral extraction from detector #162

Closed · YannisArgyriou closed 2 years ago

YannisArgyriou commented 2 years ago

New notebook delivered as part of the analysis of an M star spectrum with the MRS. The notebook extracts a point-source spectrum from the 2D detector image plane instead of from the 3D IFU spectral cubes. This makes it possible to introduce several percent-level corrections, as well as to extract the flux optimally using the variance and PSF fraction of each individual pixel.
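
For context, the optimal weighting referred to above is the standard Horne-style scheme. A minimal sketch for a single wavelength bin, assuming arrays of pixel values, per-pixel PSF fractions, and variances (names and normalization are illustrative, not the notebook's actual code):

```python
import numpy as np

def optimal_extract_bin(data, psf_frac, variance):
    """Horne-style optimal extraction for one wavelength bin (a sketch).

    data     -- observed pixel values contributing to this bin
    psf_frac -- fraction of the point-source PSF landing in each pixel
    variance -- per-pixel variance of `data`

    Weighting each pixel by psf_frac / variance minimizes the variance
    of the summed flux estimate.
    """
    w = psf_frac / variance
    flux = np.sum(w * data) / np.sum(w * psf_frac)
    var = 1.0 / np.sum(psf_frac**2 / variance)  # variance of the estimate
    return flux, var
```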

Including here @drlaw1558, @orifox, @astroolivine.

Request for Box access to upload related files. Please use email: argyriou.yannis@gmail.com

review-notebook-app[bot] commented 2 years ago

Check out this pull request on ReviewNB.

See visual diffs & provide feedback on Jupyter Notebooks.



ojustino commented 2 years ago

Hi @orifox,

Thank you for submitting these changes to @YannisArgyriou's notebook. Please read on for the technical review.

Before you begin

The technical review helps ensure that contributed notebooks a) run from top to bottom, b) follow the PEP8 standards for Python code readability, and c) conform to the Institute's style guide for Jupyter Notebooks.

I've pushed the review as a new commit in this pull request. To view and edit the commit locally, follow these steps:

git checkout THIS-BRANCH
git fetch YANNIS-REMOTE THIS-BRANCH
git merge YANNIS-REMOTE/THIS-BRANCH

_(THIS-BRANCH is the name you gave this branch on your local machine. YANNIS-REMOTE is the name you gave Yannis' remote on your local machine. If you don't know your name for the remote, run git remote -v and choose the one whose URL ends in YannisArgyriou/dat_pyinthesky.git.)_

From here you can work on your branch as normal. If you have trouble with this step, please let me know before continuing.


Instructions

After updating your local copy of this branch, please open your notebook and address any warnings or errors you find.

If you see cells with output like this, it means some of your code doesn't follow the PEP8 standards of code readability:

[screenshot: example flake8 warning printed beneath a notebook cell]

(In the example above, INFO - 3:3: E111 means that the code on line 3, column 3, triggered warning E111. The violation is briefly described at the end of the message.)
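
As a concrete, made-up illustration, a three-space indent is enough to trigger E111, since PEP8 expects indentation in multiples of four spaces:

```python
# flake8 reports "E111 indentation is not a multiple of four" here:
if True:
   x = 1  # indented by three spaces

# The fixed version passes:
if True:
    x = 1  # indented by four spaces
```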

You can test that your edits satisfy the standard by installing flake8 on the command line with:

pip install flake8==3.7.3 pycodestyle_magic

Then, restart the notebook and run the following cells under the imports:

[screenshot: the two reviewer cells that load and enable the linter]
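
For reference, the two reviewer cells in the screenshot typically look like the following (a sketch based on how pycodestyle_magic is normally enabled; the cells in your commit may differ):

```python
# Reviewer cell 1: load the linting extension (requires the
# flake8 and pycodestyle_magic packages installed above)
%load_ext pycodestyle_magic

# Reviewer cell 2: lint every subsequently executed cell with flake8
%flake8_on
```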

After that, edit and re-run cells with warnings until you've fixed all of them. Please remember to delete these reviewer cells before pushing your changes back to this pull request.

If you have questions or feedback on specific cells, click through from the earlier "review-notebook-app" bot message in this thread. There, you can comment on specific cells and view what's changed in the new commit. I may also write comments there. Anything posted there will also be reflected in this pull request's conversation thread.

The three-point review (*action required*)

  1. ⚠️ Notebook execution: I was not able to run the notebook from top to bottom.

    • I was only able to run the pipeline while connected to the STScI VPN, which isn't tenable for most users. I will leave more detailed comments on the error in the cell where it took place.
    • Even after connecting to the VPN, I was unable to run the pipeline until I updated the requirements files. Pinning photutils to 1.1.0 and (in pre-requirements.txt) numpy to 1.18.5 held other packages back from their most recent versions. I condensed requirements.txt to a set of packages whose own requirements should cover the previous list without (I believe) harming the notebook.
    • The source of all the errors you should see in the latter portion of the notebook is the unsuccessful import from PointSourceDetectorBasedExtractionFuncs, which happens because the Box folder's .py files aren't downloading correctly (a quick way to check this is sketched at the end of this review). See the cell-specific comment for more information; once the download is fixed, I believe the other NameErrors should resolve themselves.
  2. ⚠️ Code style: There are a good number of PEP8 violations.

    • Please follow the advice in the "Instructions" section above to fix them. I placed the reviewer cells after the imports in order to block jwst's usual wave of printouts.
  3. ✅/⚠️ Notebook style

    • Please remember to clear all cell outputs before committing changes.
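
On the PointSourceDetectorBasedExtractionFuncs point from item 1: one quick way to confirm whether the Box download is the culprit is to check that the downloaded file is actually Python source rather than an HTML page. A rough sketch (the HTML-page failure mode is an assumption on my part, not something confirmed in the notebook):

```python
from pathlib import Path

# Hypothetical check: Box links sometimes return an HTML landing page
# instead of the raw file, which then breaks the import below it.
src = Path("PointSourceDetectorBasedExtractionFuncs.py")
head = src.read_text(errors="replace").lstrip()[:100]
if head.startswith(("<!DOCTYPE", "<html", "<?xml")):
    raise RuntimeError(f"{src.name} looks like HTML; re-download the raw file from Box.")
```
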
ojustino commented 2 years ago

Hi @orifox, did you try running your edited version of this notebook from scratch? When I do, some cells (like those that run the pipeline) reference paths that don't exist after I download the data files.

I suspect you didn't clear out files from your previous runs. If so, try deleting all files in jdat_notebooks/MRS_Mstar_analysis except the .ipynb files, files ending in requirements.txt, and the two .py files you added. Then run the notebook and see if it still works for you.
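
If it helps, here is one way to script that cleanup (a sketch that just applies the keep-rules above to the directory named in this thread; double-check the printed file list before running):

```python
from pathlib import Path

nbdir = Path("jdat_notebooks/MRS_Mstar_analysis")
for f in nbdir.iterdir():
    # Keep notebooks, all .py files, and anything ending in requirements.txt
    keep = (f.suffix in {".ipynb", ".py"}
            or f.name.endswith("requirements.txt"))
    if f.is_file() and not keep:
        print("deleting", f)
        f.unlink()  # remove files left over from previous runs
```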

I also un-resolved a conversation thread about a missing import.

ojustino commented 2 years ago

@orifox:

The three-point review (*action required*)

  1. ⚠️ Notebook execution: I was not able to run the notebook from top to bottom without adjustments.

    • The cell you added that sets environment variables a) does not get the pipeline working for me off the VPN, and b) introduces a new error with the pipeline while on the VPN that does not happen when I skip the cell. The only working configuration for me is on the VPN and without that cell, so there is more troubleshooting to do if the off-VPN case is important. I'll post the error below the review.
  2. ✅ Code style: All PEP8 errors were addressed.

  3. ✅/⚠️ Notebook style

    • In the future, please remember to clear all cell outputs before committing changes. Saving output from large notebooks like this one inflates the repository's size unless extra steps are taken pre-merge.
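
For what it's worth, outputs can also be stripped programmatically before committing. A minimal sketch with nbformat (the filename is hypothetical):

```python
import nbformat

path = "MRS_Mstar_analysis.ipynb"  # hypothetical notebook filename
nb = nbformat.read(path, as_version=4)
for cell in nb.cells:
    if cell.cell_type == "code":
        cell.outputs = []          # drop saved figures and printouts
        cell.execution_count = None
nbformat.write(nb, path)
```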

It looks like CRDS_SERVER_URL is being overridden with 'https://crds-serverless-mode.stsci.edu/' somewhere along the way, causing the error in the following traceback:

(expand to view error)

```
---------------------------------------------------------------------------
ServiceError                              Traceback (most recent call last)
File ~/opt/anaconda3/envs/datpy-miri-point5/lib/python3.8/site-packages/crds/client/api.py:387, in _get_server_info()
    386 config_uri = f"JSON RPC service at '{get_crds_server()}'"
--> 387 info = S.get_server_info()
    388 log.verbose("Connected to server at", srepr(get_crds_server()))

File ~/opt/anaconda3/envs/datpy-miri-point5/lib/python3.8/site-packages/crds/client/proxy.py:156, in ServiceCallBinding.__call__(self, *args, **kwargs)
    155 def __call__(self, *args, **kwargs):
--> 156     jsonrpc = self._call(*args, **kwargs)
    157     if jsonrpc["error"]:

File ~/opt/anaconda3/envs/datpy-miri-point5/lib/python3.8/site-packages/crds/client/proxy.py:121, in ServiceCallBinding._call(self, *args, **kwargs)
    120 if "serverless" in url or "server-less" in url:
--> 121     raise exceptions.ServiceError("Configured for server-less mode. Skipping JSON RPC " + repr(self.__service_name))
    123 if log.get_verbose() <= 50:

ServiceError: Configured for server-less mode. Skipping JSON RPC 'get_server_info'

The above exception was the direct cause of the following exception:

CrdsNetworkError                          Traceback (most recent call last)
Input In [18], in <cell line: 20>()
     17 pipe1short.refpix.skip = True
     18 pipe1short.output_file = baseshort + datestringshort
---> 20 pipe1short.run(shortfile)
     22 pipe1long = Detector1Pipeline()
     24 for longfile in alllongfiles:

File ~/opt/anaconda3/envs/datpy-miri-point5/lib/python3.8/site-packages/stpipe/step.py:428, in Step.run(self, *args)
    426 else:
    427     if self.prefetch_references:
--> 428         self.prefetch(*args)
    429 try:
    430     step_result = self.process(*args)

File ~/opt/anaconda3/envs/datpy-miri-point5/lib/python3.8/site-packages/stpipe/step.py:545, in Step.prefetch(self, *args)
    543 # prefetch truly occurs at the Pipeline (or subclass) level.
    544 if len(args) and len(self.reference_file_types) and not self.skip:
--> 545     self._precache_references(args[0])

File ~/opt/anaconda3/envs/datpy-miri-point5/lib/python3.8/site-packages/stpipe/pipeline.py:243, in Pipeline._precache_references(self, input_file)
    240 try:
    241     with self.open_model(input_file, asn_n_members=1,
    242                          asn_exptypes=["science"]) as model:
--> 243         self._precache_references_opened(model)
    244 except (ValueError, TypeError, IOError):
    245     self.log.info(
    246         'First argument {0} does not appear to be a '
    247         'model'.format(input_file))

File ~/opt/anaconda3/envs/datpy-miri-point5/lib/python3.8/site-packages/stpipe/pipeline.py:266, in Pipeline._precache_references_opened(self, model_or_container)
    263         self._precache_references_opened(contained_model)
    264 else:
    265     # precache a single model object
--> 266     self._precache_references_impl(model_or_container)

File ~/opt/anaconda3/envs/datpy-miri-point5/lib/python3.8/site-packages/stpipe/pipeline.py:290, in Pipeline._precache_references_impl(self, model)
    286 fetch_types = sorted(set(self.reference_file_types) - set(ovr_refs.keys()))
    288 self.log.info("Prefetching reference files for dataset: " + repr(model.meta.filename) +
    289               " reftypes = " + repr(fetch_types))
--> 290 crds_refs = crds_client.get_multiple_reference_paths(model.get_crds_parameters(), fetch_types, model.crds_observatory)
    292 ref_path_map = dict(list(crds_refs.items()) + list(ovr_refs.items()))
    294 for (reftype, refpath) in sorted(ref_path_map.items()):

File ~/opt/anaconda3/envs/datpy-miri-point5/lib/python3.8/site-packages/stpipe/crds_client.py:55, in get_multiple_reference_paths(parameters, reference_file_types, observatory)
     52     raise TypeError("First argument must be a dict of parameters")
     54 log.set_log_time(True)
---> 55 refpaths = _get_refpaths(parameters, tuple(reference_file_types), observatory)
     56 return refpaths

File ~/opt/anaconda3/envs/datpy-miri-point5/lib/python3.8/site-packages/stpipe/crds_client.py:68, in _get_refpaths(data_dict, reference_file_types, observatory)
     66     return {}
     67 with crds_cache_locking.get_cache_lock():
---> 68     bestrefs = crds.getreferences(
     69         data_dict, reftypes=reference_file_types, observatory=observatory)
     70 refpaths = {filetype: filepath if "N/A" not in filepath.upper() else "N/A"
     71             for (filetype, filepath) in bestrefs.items()}
     72 return refpaths

File ~/opt/anaconda3/envs/datpy-miri-point5/lib/python3.8/site-packages/crds/core/heavy_client.py:127, in getreferences(parameters, reftypes, context, ignore_cache, observatory, fast)
    122 final_context, bestrefs = _initial_recommendations("getreferences",
    123     parameters, reftypes, context, ignore_cache, observatory, fast)
    125 # Attempt to cache the recommended references, which unlike dump_mappings
    126 # should work without network access if files are already cached.
--> 127 best_refs_paths = api.cache_references(
    128     final_context, bestrefs, ignore_cache=ignore_cache)
    130 return best_refs_paths

File ~/opt/anaconda3/envs/datpy-miri-point5/lib/python3.8/site-packages/crds/client/api.py:853, in cache_references(pipeline_context, bestrefs, ignore_cache)
    851     localrefs = {name: get_flex_uri(name) for name in wanted}
    852 else:
--> 853     localrefs = FileCacher(pipeline_context, ignore_cache, raise_exceptions=False).get_local_files(wanted)[0]
    855 refs = _squash_unicode_in_bestrefs(bestrefs, localrefs)
    857 return refs

File ~/opt/anaconda3/envs/datpy-miri-point5/lib/python3.8/site-packages/crds/client/api.py:594, in FileCacher.get_local_files(self, names)
    592     localpaths[name] = localpath
    593 if downloads:
--> 594     n_bytes = self.download_files(downloads, localpaths)
    595 else:
    596     log.verbose("Skipping download for cached files", sorted(names), verbosity=60)

File ~/opt/anaconda3/envs/datpy-miri-point5/lib/python3.8/site-packages/crds/client/api.py:620, in FileCacher.download_files(self, downloads, localpaths)
    618 def download_files(self, downloads, localpaths):
    619     """Serial file-by-file download."""
--> 620     download_metadata = get_download_metadata()
    621     self.info_map = {}
    622     for filename in downloads:

File ~/opt/anaconda3/envs/datpy-miri-point5/lib/python3.8/site-packages/crds/core/utils.py:305, in CachedFunction.__call__(self, *args, **keys)
    301 def __call__(self, *args, **keys):
    302     """Compute or fetch func(*args, **keys). Add the result to the cache.
    303     return func(*args, **keys)
    304     """
--> 305     key, result = self._readonly(*args, **keys)
    306     self.cache[key] = result
    307     return result

File ~/opt/anaconda3/envs/datpy-miri-point5/lib/python3.8/site-packages/crds/core/utils.py:292, in CachedFunction._readonly(self, *args, **keys)
    290 else:
    291     log.verbose("Uncached call", self.uncached.__name__, repr(key), verbosity=80)
--> 292 return key, self.uncached(*args, **keys)

File ~/opt/anaconda3/envs/datpy-miri-point5/lib/python3.8/site-packages/crds/client/api.py:367, in get_download_metadata()
    364 @utils.cached
    365 def get_download_metadata():
    366     """Defer and cache decoding of download_metadata field of server info."""
--> 367     info = get_server_info()
    368     return proxy.crds_decode(info["download_metadata"])

File ~/opt/anaconda3/envs/datpy-miri-point5/lib/python3.8/site-packages/crds/core/utils.py:305, in CachedFunction.__call__(self, *args, **keys)
    [... same CachedFunction.__call__ and _readonly frames as above ...]

File ~/opt/anaconda3/envs/datpy-miri-point5/lib/python3.8/site-packages/crds/client/api.py:337, in get_server_info()
    322 @utils.cached
    323 def get_server_info():
    324     """Return a dictionary of critical parameters about the server such as:
    325
    326     operational_context - the context in use in the operational pipeline
    (...)
    335     what context, software, and network mode should be used for processing.
    336     """
--> 337     info = _get_server_info()
    338     info["server"] = get_crds_server()
    339     # The original CRDS info struct features both "checked" and "unchecked"
    340     # versions of the download URLs where the unchecked version is a simple
    341     # static file which has been used exclusively for performance reasons.
    (...)
    345     # clients... while older clients will continue to work with the simplified
    346     # server.

File ~/opt/anaconda3/envs/datpy-miri-point5/lib/python3.8/site-packages/crds/client/api.py:392, in _get_server_info()
    390     info["connected"] = True
    391 except Exception as exc:
--> 392     raise CrdsNetworkError(
    393         f"Failed downloading cache config from: {config_uri}:",
    394         srepr(exc)) from exc
    395 return info

CrdsNetworkError: Failed downloading cache config from: JSON RPC service at 'https://crds-serverless-mode.stsci.edu/': "Configured for server-less mode. Skipping JSON RPC 'get_server_info'"
```

ojustino commented 2 years ago

I was able to run the notebook to completion by setting environment variables related to CRDS before importing the crds package (h/t to the JWST Help Desk). The package sets CRDS_PATH, CRDS_SERVER_URL, and other variables to predetermined default values if they don't already exist before the import.

In my case, trying to reassign any of these environment variables post-import did not work, but this was not the case for @orifox and some others. I don't (yet?) know the source of the inconsistency.
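
Concretely, the workaround looks something like the sketch below. The URL and cache path are common defaults, not values taken from the notebook:

```python
import os

# Must happen before the first import of crds (or of jwst, which imports it):
# crds captures these environment variables at import time.
os.environ["CRDS_SERVER_URL"] = "https://jwst-crds.stsci.edu"
os.environ["CRDS_PATH"] = os.path.expanduser("~/crds_cache")

import crds  # noqa: E402 -- deliberately imported after the setup above
```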

At any rate, this fix resolves the last action item from my technical review.

ojustino commented 2 years ago

@orifox gave clearance to merge the notebook. It's unfortunate to lose the commit history, but I will squash the commits beforehand to reduce bloat in the repository.