NMMA API Dockerfile (#3)

Theodlz commented 1 year ago

This PR adds a first version of a basic NMMA API-based service, which would be built and pushed to DockerHub (and potentially connected to a CI later on for deployment) as part of NMMA's GitHub Actions.

This is built on top of https://github.com/Theodlz/nmma-standalone-api-service, which was written a few months back. Some things need to be fixed before we have a working version:

The Me2017 model throws an error ('Dynesty' object has no attribute '_log_likelihood_eval_time')
When running locally and calling it from a local SkyPortal, it can't push back the results, as the container doesn't have access to the local SkyPortal's network (port not exposed and bound).
Even with the port exposed, it looks like I need to bind the port (specified in the command to start a container from image) for it to be accessible.

Otherwise, building the image and pushing it to DockerHub works. This GitHub action will build images for both amd64 and arm64/v8 (Apple silicon) using QEMU. By the way, this is the reason why it takes longer to build than locally, as GitHub needs to emulate the arm64 machine.

Theodlz commented 1 year ago

@mcoughlin I believe this is what you had in mind right?

Theodlz commented 1 year ago

Ah and also right now it triggers on push, just to make it easier to develop. Once we're happy with it, we might wanna change to only on main

mcoughlin commented 1 year ago

@Theodlz exactly right. Maybe @tylerbarna can take a look?

bfhealy commented 1 year ago

Hi @Theodlz, this looks very useful! Is there a way I can test this locally despite the limitations of running on a local SkyPortal instance?

Theodlz commented 1 year ago

I suggest running the API as is, not in docker. You can use the conda requirements file to create an env with conda, or I can send you a classic requirements.txt if you prefer virtualenv.

bfhealy commented 1 year ago

Thanks, I'll try it with conda.

bfhealy commented 1 year ago

@Theodlz, I have the API service running locally, and I'm able to make GET requests to health and analysis/nmma_analysis and obtain the expected response. I'm not quite familiar enough with SkyPortal analysis services to test the POST request for analysis/nmma_analysis using my local SkyPortal. Does the demo data include any sources for which this analysis can be run?
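
For anyone repeating this check, a minimal sketch using only the Python standard library. The port comes from the service URL quoted later in this thread; the exact `health` path and the helper names `endpoint` and `get_status` are assumptions for illustration, not part of the service.

```python
from urllib.parse import urljoin
from urllib.request import urlopen

# Base URL taken from the service URL quoted in this thread; adjust for your setup.
BASE_URL = "http://localhost:6901/"


def endpoint(path, base=BASE_URL):
    """Absolute URL for a path relative to the service root."""
    return urljoin(base, path)


def get_status(url, timeout=5.0):
    """HTTP status code of a GET request; raises URLError if unreachable."""
    with urlopen(url, timeout=timeout) as response:
        return response.status


# Hypothetical paths, following the two endpoints named in the comment above.
health_url = endpoint("health")
analysis_url = endpoint("analysis/nmma_analysis")
print(health_url)
print(analysis_url)
```

With the service running, `get_status(health_url)` should return the status code of the health endpoint; nothing here sends a request until that function is called.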

Theodlz commented 1 year ago

Hi Brian,

Let me send you what you can add to the db_demo.yaml to have NMMA added to your skyportal

Theodlz commented 1 year ago

@bfhealy you can put this:

- name: "NMMA_Analysis"
  display_name: "NMMA analysis"
  description: "Use NMMA to fit fast transient light curves"
  version: "1.0"
  contact_name: "Michael Coughlin"
  url: "http://localhost:6901/analysis/nmma_analysis"
  authentication_type: "header_token"
  _authinfo: '{"header_token": {"Authorization": "Bearer MY_TOKEN"}}'
  analysis_type: "lightcurve_fitting"
  input_data_types: ["photometry", "redshift"]
  optional_analysis_parameters: '{"source": ["Me2017", "Piro2021", "nugent-hyper", "TrPi2018"], "fix_z": ["True", "False"]}'
  group_ids:
    - =program_A
    - =program_B

under the analysis_services: key in db_demo.yaml, and run make load_demo_data with your skyportal already up and running.
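
Note that `_authinfo` and `optional_analysis_parameters` in the entry above are JSON documents stored as strings. A minimal sketch of checking that they parse before loading the demo data, using only the Python standard library; the variable names are illustrative and not part of SkyPortal.

```python
import json

# Values copied from the config entry above; MY_TOKEN is a placeholder.
authinfo = '{"header_token": {"Authorization": "Bearer MY_TOKEN"}}'
optional_params = (
    '{"source": ["Me2017", "Piro2021", "nugent-hyper", "TrPi2018"], '
    '"fix_z": ["True", "False"]}'
)

# json.loads raises json.JSONDecodeError on malformed input.
headers = json.loads(authinfo)["header_token"]
params = json.loads(optional_params)

print(headers)
print(params["source"])
```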

bfhealy commented 1 year ago

Thanks, I'll give this a try!

mcoughlin commented 1 year ago

@Theodlz @bfhealy instead of Dynesty, can we use pymultinest? I wonder if we need to pin a certain version of Dynesty, do you know @tsunhopang?

tsunhopang commented 1 year ago

@mcoughlin yes the version of dynesty has to align with the version of bilby used. For the current nmma, we should be using dynesty>=2.0.0
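
A quick way to check the installed version against that minimum, sketched with the standard library only. The comparison below is a simplified numeric check, not a full PEP 440 implementation, and the helper names are made up for this example.

```python
from importlib import metadata


def release_tuple(version):
    """Leading numeric components of a version string, e.g. '2.1.0' -> (2, 1, 0)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum="2.0.0"):
    """True if the installed release is at least the minimum release."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '2.0' and '2.0.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


try:
    installed = metadata.version("dynesty")
    print("dynesty", installed, "ok" if meets_minimum(installed) else "too old")
except metadata.PackageNotFoundError:
    print("dynesty is not installed in this environment")
```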

mcoughlin commented 1 year ago

@bfhealy @Theodlz maybe spend a minute debugging, but I think the default should be pymultinest.

bfhealy commented 1 year ago

I'm getting a different error when I change the sampler to pymultinest:

[11:14:12 nmma] Traceback (most recent call last):
  File "/Users/bhealy/nmma/api/app.py", line 197, in run_nmma_model
    main(args=args)
  File "/Users/bhealy/nmma/nmma/em/analysis.py", line 608, in main
    result = bilby.run_sampler(
  File "/Users/bhealy/miniforge3/envs/nmma_api2/lib/python3.9/site-packages/bilby/core/sampler/__init__.py", line 234, in run_sampler
    result = sampler.run_sampler()
  File "/Users/bhealy/miniforge3/envs/nmma_api2/lib/python3.9/site-packages/bilby/core/sampler/base_sampler.py", line 96, in wrapped
    output = method(self, *args, **kwargs)
  File "/Users/bhealy/miniforge3/envs/nmma_api2/lib/python3.9/site-packages/bilby/core/sampler/pymultinest.py", line 156, in run_sampler
    out = pymultinest.solve(
  File "/Users/bhealy/miniforge3/envs/nmma_api2/lib/python3.9/site-packages/pymultinest/solve.py", line 71, in solve
    run(**kwargs)
  File "/Users/bhealy/miniforge3/envs/nmma_api2/lib/python3.9/site-packages/pymultinest/run.py", line 237, in run
    prev_handler = signal.signal(signal.SIGINT, interrupt_handler)
  File "/Users/bhealy/miniforge3/envs/nmma_api2/lib/python3.9/signal.py", line 56, in signal
    handler = _signal.signal(_enum_to_int(signalnum), _enum_to_int(handler))
ValueError: signal only works in main thread of the main interpreter
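
The `ValueError` above is standard CPython behaviour: `signal.signal` may only be called from the main thread. A minimal reproduction that involves neither bilby nor pymultinest is sketched below. It assumes, which the traceback suggests but does not prove, that the API runs the model in a worker thread.

```python
import signal
import threading

errors = []


def install_handler():
    # Same kind of call as pymultinest/run.py in the traceback: register a SIGINT handler.
    try:
        signal.signal(signal.SIGINT, signal.default_int_handler)
    except ValueError as exc:
        errors.append(str(exc))


# Run the registration in a worker thread instead of the main thread.
worker = threading.Thread(target=install_handler)
worker.start()
worker.join()

print(errors)
```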

mcoughlin commented 1 year ago

@bfhealy @Theodlz ah I thought they had fixed that. It might be worth raising an issue on bilby to check. Can you reproduce the Dynesty error just running nmma as usual?

bfhealy commented 1 year ago

@mcoughlin @Theodlz Yes, I'm able to reproduce the same Dynesty error with a generic light_curve_analysis call.

mcoughlin commented 1 year ago

@bfhealy sounds like it would be good to open an issue then. This is with the latest Dynesty version? Any requirements we aren't meeting of theirs?

bfhealy commented 1 year ago

@mcoughlin Yep, latest Dynesty version and all its requirements met. I'm wondering if this might be a bilby issue given the full error output below:

Traceback (most recent call last):
  File "/Users/bhealy/miniforge3/envs/nmma_api2/bin/light_curve_analysis", line 33, in <module>
    sys.exit(load_entry_point('nmma==0.0.8', 'console_scripts', 'light_curve_analysis')())
  File "/Users/bhealy/nmma/nmma/em/analysis.py", line 608, in main
    result = bilby.run_sampler(
  File "/Users/bhealy/miniforge3/envs/nmma_api2/lib/python3.9/site-packages/bilby/core/sampler/__init__.py", line 190, in run_sampler
    sampler = sampler_class(
  File "/Users/bhealy/miniforge3/envs/nmma_api2/lib/python3.9/site-packages/bilby/core/sampler/dynesty.py", line 234, in __init__
    int(check_point_delta_t / self._log_likelihood_eval_time / 10), 10
AttributeError: 'Dynesty' object has no attribute '_log_likelihood_eval_time'

mcoughlin commented 1 year ago

@bfhealy Can you open this on bilby then? Also check in with them on pymultinest?

bfhealy commented 1 year ago

@mcoughlin Will do!

bfhealy commented 1 year ago

@mcoughlin @Theodlz After checking in with bilby and merging the latest nmma changes, I've gotten past the errors above. This did require that I install pymultinest from source rather than via pip. I now receive the following errors when running NMMA Analysis using my local skyportal:

If the sampler is pymultinest:

Traceback (most recent call last):
  File "/Users/bhealy/nmma/api/app.py", line 197, in run_nmma_model
    main(args=args)
  File "/Users/bhealy/nmma/nmma/em/analysis.py", line 673, in main
    result = bilby.run_sampler(
  File "/Users/bhealy/miniforge3/envs/nmma_api2/lib/python3.9/site-packages/bilby/core/sampler/__init__.py", line 234, in run_sampler
    result = sampler.run_sampler()
  File "/Users/bhealy/miniforge3/envs/nmma_api2/lib/python3.9/site-packages/bilby/core/sampler/base_sampler.py", line 96, in wrapped
    output = method(self, *args, **kwargs)
  File "/Users/bhealy/miniforge3/envs/nmma_api2/lib/python3.9/site-packages/bilby/core/sampler/pymultinest.py", line 178, in run_sampler
    self.result.nested_samples = self._nested_samples
  File "/Users/bhealy/miniforge3/envs/nmma_api2/lib/python3.9/site-packages/bilby/core/sampler/pymultinest.py", line 201, in _nested_samples
    np.vstack([dead_points, live_points]).copy(),
  File "<__array_function__ internals>", line 180, in vstack
  File "/Users/bhealy/miniforge3/envs/nmma_api2/lib/python3.9/site-packages/numpy/core/shape_base.py", line 282, in vstack
    return _nx.concatenate(arrs, 0)
  File "<__array_function__ internals>", line 180, in concatenate
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 0 and the array at index 1 has size 10

For dynesty:

Traceback (most recent call last):
  File "/Users/bhealy/nmma/api/app.py", line 197, in run_nmma_model
    main(args=args)
  File "/Users/bhealy/nmma/nmma/em/analysis.py", line 673, in main
    result = bilby.run_sampler(
  File "/Users/bhealy/miniforge3/envs/nmma_api2/lib/python3.9/site-packages/bilby/core/sampler/__init__.py", line 234, in run_sampler
    result = sampler.run_sampler()
  File "/Users/bhealy/miniforge3/envs/nmma_api2/lib/python3.9/site-packages/bilby/core/sampler/base_sampler.py", line 96, in wrapped
    output = method(self, *args, **kwargs)
  File "/Users/bhealy/miniforge3/envs/nmma_api2/lib/python3.9/site-packages/bilby/core/sampler/dynesty.py", line 517, in run_sampler
    out = self._run_external_sampler_with_checkpointing()
  File "/Users/bhealy/miniforge3/envs/nmma_api2/lib/python3.9/site-packages/bilby/core/sampler/dynesty.py", line 652, in _run_external_sampler_with_checkpointing
    self.sampler.run_nested(**sampler_kwargs)
  File "/Users/bhealy/miniforge3/envs/nmma_api2/lib/python3.9/site-packages/dynesty/sampler.py", line 1044, in run_nested
    for i, results in enumerate(self.add_live_points()):
  File "/Users/bhealy/miniforge3/envs/nmma_api2/lib/python3.9/site-packages/dynesty/sampler.py", line 455, in add_live_points
    raise ValueError("The remaining live points have already "
ValueError: The remaining live points have already been added to the list of samples!

Neither of these errors is thrown if I run nmma without the API service.

mcoughlin commented 1 year ago

@bfhealy and the inputs are otherwise identical? It's just running main from the API rather than from the executable?

Theodlz commented 1 year ago

@bfhealy this API service was first designed a while back (6 months ago maybe?), but still worked in January. It is possible that some changes made to nmma broke it. I'll have another look this afternoon.

bfhealy commented 1 year ago

Thanks @Theodlz - @mcoughlin, for the executable I'm currently using an injection, while the API is running on a light curve from the skyportal demo data. I found additional log output from the API call that may be useful:

Starting MultiNest
 generating live points
 live points generated, starting sampling
Acceptance Rate:                        1.000000
Replacements:                                 32
Total Samples:                                32
Nested Sampling ln(Z):            **************
16:10 bilby INFO    : Overwriting /var/folders/8_/ky643qs168ngjmhrpwcq1fdm0000gn/T/tmpr36n5iik/pm_ZTF21aaqjmps_nugent-hyper/ with /var/folders/8_/ky643qs168ngjmhrpwcq1fdm0000gn/T/tmpf220hkcy/
 ln(ev)=   2.2898349882893854E-016 +/-                       NaN
 Total Likelihood Evaluations:           32
 Sampling finished. Exiting MultiNest
  analysing data from /var/folders/8_/ky643qs168ngjmhrpwcq1fdm0000gn/T/tmpf220hkcy/.txt
16:10 bilby INFO    : Overwriting /var/folders/8_/ky643qs168ngjmhrpwcq1fdm0000gn/T/tmpr36n5iik/pm_ZTF21aaqjmps_nugent-hyper/ with /var/folders/8_/ky643qs168ngjmhrpwcq1fdm0000gn/T/tmpf220hkcy/
/Users/bhealy/miniforge3/envs/nmma_api2/lib/python3.9/site-packages/bilby/core/sampler/pymultinest.py:193: UserWarning: genfromtxt: Empty input file: "/var/folders/8_/ky643qs168ngjmhrpwcq1fdm0000gn/T/tmpr36n5iik/pm_ZTF21aaqjmps_nugent-hyper//ev.dat"
  dead_points = np.genfromtxt(dir_ + "/ev.dat")
2023-06-20 16:10:00 nmma: Exception while running the model: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 0 and the array at index 1 has size 7

mcoughlin commented 1 year ago

@bfhealy Looks to me like it never really sampled anything. Could we try upping the live points and lowering the evidence criteria? I suspect we made the sampling parameters too aggressive to actually function properly in that script.

bfhealy commented 1 year ago

@mcoughlin You're right - when I tried a different demo source with more light curve points, the sampling ran and the process completed successfully. With the original source I increased the live points and adjusted the evidence tolerance but still get the same error. Perhaps it's because that light curve has upper limits mixed in with the detections?

mcoughlin commented 1 year ago

@bfhealy could be... but the code should still sample all the same. Feels like a bug in NMMA if we struggle that much with limits.

tsunhopang commented 1 year ago

@bfhealy sorry for jumping in, but what kind of event/injection is being analyzed here? And what number of live points and evidence tolerance have you tested?

The output you just showed indicates that the posterior is basically the prior (ln(ev) ~ 0), which could also mean the injected light curve is fully below the detection limit.

tsunhopang commented 1 year ago

More accurately, all the log-likelihood values are zero.
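
To illustrate the point: if the log-likelihood is zero for every parameter value, the evidence is the integral of the prior alone, so Z = 1 and ln Z = 0, and the posterior equals the prior. A toy Monte Carlo check in plain Python, with a made-up uniform prior (nothing here is NMMA or bilby code):

```python
import math
import random


def log_evidence(log_likelihood, prior_samples):
    """Monte Carlo estimate of ln Z = ln E_prior[L], computed via log-sum-exp."""
    logs = [log_likelihood(theta) for theta in prior_samples]
    peak = max(logs)
    total = sum(math.exp(value - peak) for value in logs)
    return peak + math.log(total / len(logs))


random.seed(0)
# Hypothetical one-dimensional uniform prior on [-5, 5].
samples = [random.uniform(-5.0, 5.0) for _ in range(1000)]

# A likelihood that ignores the parameters: ln L = 0 everywhere.
flat = log_evidence(lambda theta: 0.0, samples)
print(flat)
```

This matches the `ln(ev)` of roughly 2e-16 in the MultiNest log above, which is zero to floating-point precision.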

mcoughlin commented 1 year ago

Although it is the case that we should have some kind of catch rather than throwing the existing exception.
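
One possible shape for such a catch, sketched in plain Python: check that at least one detection is present before starting the sampler, and fail with a clear message otherwise. The photometry layout (dicts with `mjd` and `mag`, where `mag` is `None` for an upper limit) and the function name are assumptions made for the illustration, not the format used in app.py.

```python
def require_detections(photometry):
    """Return the detections, or raise a descriptive error if there are none."""
    detections = [point for point in photometry if point["mag"] is not None]
    if not detections:
        raise ValueError(
            "photometry contains no detections (only upper limits); nothing to fit"
        )
    return detections


# Hypothetical input consisting of upper limits only.
limits_only = [
    {"mjd": 59300.0, "mag": None, "limiting_mag": 20.5},
    {"mjd": 59302.0, "mag": None, "limiting_mag": 20.7},
]

try:
    require_detections(limits_only)
    message = None
except ValueError as exc:
    message = str(exc)

print(message)
```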

bfhealy commented 1 year ago

@mcoughlin @tsunhopang I was able to get the analysis to run by deleting the upper limit points from the photometry. The source is ZTF21aaqjmps in the skyportal demo data (photometry below) - it's an SN II, but I thought it was the most relevant object in the demo data to test with the API.

mcoughlin commented 1 year ago

@bfhealy @tsunhopang can you debug what goes wrong in the presence of limits? We don't want a situation where we struggle with those...

tsunhopang commented 1 year ago

@bfhealy could you share both commands you used, the one that ran and the one that failed?

bfhealy commented 1 year ago

@tsunhopang Both commands were run via a call to nmma.em.analysis.main in this PR's app.py code. This call was initiated using SkyPortal by setting up an NMMA analysis service following Theo's directions above: https://github.com/nuclear-multimessenger-astronomy/nmma/pull/99#issuecomment-1591495322

In app.py, I did modify some parameters such that nlive = 512, interpolation_type = 'tensorflow' and sampler = 'pymultinest'.

tsunhopang commented 1 year ago

@bfhealy and which light curve model run is having trouble? (I assume that all the models in the source list are being used one by one.) Moreover, I see that the trigger time is set to the minimum value of the data timestamps, with a tmin of 0.01 and a tmax of 7. Given the light curve you just showed, I think that would leave us with only a handful of data points?

bfhealy commented 1 year ago

@tsunhopang I was getting the same error for each model in the list. You're right about the 7-day baseline limiting the number of points. I did some more experimenting with the original photometry (plotted below) and found that:

The app defines t0 using the first photometric point (even if it's a limit), rather than the first detection
Given the above and the default 7-day baseline, the photometry passed to the app contained no detections (only limits)
When I expand the baseline to include detections, the sampling runs, even if the photometry contains limits

So it looks like there is not a larger issue with photometric limits, but t0 might be better defined as the time of the first detection.
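
A minimal sketch of the difference, assuming photometry is a list of dicts with `mjd` and `mag` (with `mag` set to `None` for an upper limit) and that tmin and tmax are offsets in days from t0. The layout, the toy data, and the function names are hypothetical; the real app.py may organize this differently.

```python
def first_detection_time(photometry):
    """Time of the earliest detection, ignoring upper limits."""
    times = [p["mjd"] for p in photometry if p["mag"] is not None]
    if not times:
        raise ValueError("photometry contains no detections")
    return min(times)


def select_window(photometry, t0, tmin=0.01, tmax=7.0):
    """Keep points with t0 + tmin <= mjd <= t0 + tmax."""
    return [p for p in photometry if t0 + tmin <= p["mjd"] <= t0 + tmax]


def count_detections(points):
    return sum(1 for p in points if p["mag"] is not None)


photometry = [
    {"mjd": 100.0, "mag": None},   # upper limit
    {"mjd": 103.0, "mag": None},   # upper limit
    {"mjd": 110.0, "mag": 19.2},   # first detection
    {"mjd": 112.0, "mag": 18.9},
    {"mjd": 114.0, "mag": None},   # upper limit after the detections
]

# t0 from the first point of any kind: the 7-day window holds only limits.
old_t0 = min(p["mjd"] for p in photometry)
old_detections = count_detections(select_window(photometry, old_t0))

# t0 from the first detection: the window now contains a detection.
# Note: with tmin > 0 the point at exactly t0 falls outside the window.
new_t0 = first_detection_time(photometry)
new_detections = count_detections(select_window(photometry, new_t0))

print(old_t0, old_detections)
print(new_t0, new_detections)
```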

mcoughlin commented 1 year ago

@bfhealy Yeah making that change sounds reasonable. And we should make the API configurable to change the start and end times.

bfhealy commented 1 year ago

@mcoughlin @Theodlz I pushed a commit that sets nlive = 512, interpolation_type=tensorflow, and sampler=pymultinest. The commit also defines t0 corresponding to the first detection and allows tmin, tmax, and dt to be changed by the user. This will require the following update to the skyportal NMMA_Analysis config:

optional_analysis_parameters: '{"source": ["Me2017", "Piro2021", "nugent-hyper", "TrPi2018", "Bu2022Ye"], "fix_z": ["True", "False"], "tmin": {"type": "number", "default": 0.01}, "tmax": {"type": "number", "default": 7}, "dt": {"type": "number", "default": 0.1}}'
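
A sketch of how a service might resolve these parameters, assuming the `default` values in the config are applied when a request omits a key. The `resolve_parameters` helper is hypothetical, and whether SkyPortal or app.py fills in defaults this way is not stated in this thread.

```python
import json

# The tmin/tmax/dt part of optional_analysis_parameters from the config above.
schema = json.loads(
    '{"tmin": {"type": "number", "default": 0.01}, '
    '"tmax": {"type": "number", "default": 7}, '
    '"dt": {"type": "number", "default": 0.1}}'
)


def resolve_parameters(schema, submitted):
    """Merge user-submitted values over the schema defaults, as floats."""
    resolved = {}
    for name, spec in schema.items():
        resolved[name] = float(submitted.get(name, spec["default"]))
    if resolved["tmin"] >= resolved["tmax"]:
        raise ValueError("tmin must be smaller than tmax")
    return resolved


# Example request that overrides only tmax (submitted as a string).
resolved = resolve_parameters(schema, {"tmax": "14"})
print(resolved)
```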

Theodlz commented 1 year ago

Closing, replaced by #145
