pmelchior / skymapper

Mapping astronomical survey data on the sky, handsomely
MIT License
43 stars 15 forks source link

focus() before healpix() blows up memory #32

Open monodera opened 1 week ago

monodera commented 1 week ago

The density map generated with skymapper does not seem to follow the resolution set by the nside parameter with healpy.

I run the following code with nside=32 (resolution of 110 arcmin) and nside=1024 (resolution of 3.4 arcmin), the resolution (or pixel size) of the output density map.

import healpy as hp
import numpy as np
import skymapper as skm

np.random.seed(42)

def getCatalog(size=10000, survey=None):
    # dummy catalog: uniform on sphere
    # Marsaglia (1972)
    xyz = np.random.normal(size=(size, 3))
    r = np.sqrt((xyz**2).sum(axis=1))
    dec = np.arccos(xyz[:, 2] / r) / skm.DEG2RAD - 90
    ra = 180 - np.arctan2(xyz[:, 0], xyz[:, 1]) / skm.DEG2RAD

    # survey selection
    if survey is not None:
        inside = survey.contains(ra, dec)
        ra = ra[inside]
        dec = dec[inside]
    return ra, dec

class TestSurvey(skm.survey.Survey):
    def contains(self, ra, dec):
        return ((dec > 65) & (dec < 68)) & ((ra > 265) & (ra < 275))

if __name__ == "__main__":

    # load RA/Dec from catalog
    size = 300000
    survey = TestSurvey()
    ra, dec = getCatalog(size, survey=survey)

    # 1) construct a projection, here Albers
    # define the optimal projection for set of coordinates
    # by minimizing the variation in distortion
    crit = skm.stdDistortion
    proj = skm.Albers.optimize(ra, dec, crit=crit)

    # 2) construct map: will hold figure and projection
    # the outline of the sphere can be styled with kwargs for matplotlib Polygon
    map = skm.Map(proj)

    # 3) add graticules, separated by 15 deg
    # the lines can be styled with kwargs for matplotlib Line2D
    # additional arguments for formatting the graticule labels
    sep = 1
    map.grid(sep=sep)

    # 4) add data to the map, e.g.
    # make density plot
    nside = 32
    # nside = 1024

    print(f"nside = {nside}")
    # print(f"{hp.nside2npix(nside)} pixels")
    print(f"{hp.nside2resol(nside, arcmin=True)=} arcmin")
    # print(f"{hp.nside2resol(nside, arcmin=True)/60} deg")

    mappable = map.density(ra, dec, nside=nside, vmin=0, vmax=0.005)
    cb = map.colorbar(mappable, cb_label="$n_g$ [arcmin$^{-2}$]")

    # add scatter plot
    map.scatter(ra, dec, s=10**2, edgecolor="k", facecolor="None")

    # focus on relevant region
    map.focus(ra, dec)

    map.title(f"{nside=}")

    map.savefig(f"readme_example_{nside}.png", bbox_inches="tight", dpi=300)

readme_example_32 readme_example_1024

I'd expect that the pixel size is that defined by nside in Healpix (hp.nside2resol()), but it seems not.

My python environment is the following:

pmelchior commented 1 week ago

Thanks for reporting. I think nside is treated correctly, but we don't get the see the expected result. Here's why I think this happens:

  1. map.density only plots the non-empty cells: https://github.com/pmelchior/skymapper/blob/9c124f9f00a9251a93a4cee597c55be90dd8476c/skymapper/map.py#L1290 The reason is to avoid plotting densities in the area a survey has not covered, but there will be many more empty cells at high nside. The average galaxy density in your example is pretty small still.
  2. As of v0.4 (or thereabouts) map.density and map.healpix don't plot healpix polygons as matplotlib artists, instead these functions now render the polygons into an pixelized image of the map. Several reasons for the change, suffice it to say that healpy uses the same approach. This is why, after zooming in, the shape of the cells appear rectangular and not like healpix cells.

I suspect the typical cell at nside=1024 is smaller than 1 pixel at the resolution of the screen (or matplotlib backend). If many cells are empty and thus not plotted, it's therefore likely that most of the map appears empty. I confirmed this with nside=128, which starts to get half-empty: Figure_1 This is not a bug, it's just too low a density for nside=1024 to be meaningful. The only way to make this behavior more clear is by not using a mask, in which case all cells with zero density will be shown: Figure_2

I'm not a fan of it, but it makes the problem more transparent, and some people may prefer this option. So, I'm pushing a fix that will allow this with the option map.density(..., mask_empty=False).

monodera commented 1 week ago

Thanks for the prompt reply. I think that my example was not good to highlight the issue. Here is an example file with about 140k objects in a relatively small sky area (several sq. degrees), so empty cells are quite rare within a survey area.

randomized_targets.csv

This is a map for nside=128: random_sample_128

The following is that for nside=1024: random_sample_1024 with data points: random_sample_1024_scatter

With this dataset, I still think that the pixels are not correctly shown. I'm not sure if the planned change in the 0.4+ version will be the fix for this.

For comparison, the following map is generated by TOPCAT for nside=1024 and the pixel size is about 3 arcmin as expected. random_example_topcatmap

pmelchior commented 1 week ago

Yeah, but you see the effect of item 2: the map is pixelized. So, what you see are not the healpix cells but the resolution limit of your backend before you zoom in. Can you run map.focus(ra, dec) before calling map.density. This should use the full resolution of the backend for the narrower area instead of the entire sky.

monodera commented 1 week ago

Thanks for clarification. I understood what you mean.

When I called map.focus(ra, dec) before creating the density map, the memory consumption became insanely high (>30GB) even with nside=32 and for some reason, matplotlib appeared to get stuck at map.desity().

pmelchior commented 1 week ago

OK, that needs to get fixed! Let me see what's going on.