Hi, and welcome to yt! Thanks for opening your first issue. We have an issue template that helps us gather the relevant information to diagnose and fix the issue.
Hi! I've read through this in detail, and I believe that your exploration of the problem is spot on. I have one question: why is the value 0.30000 for the old weighting -- do you have a sense of whether that number falls out of the factor difference?
For reference I've also LaTeX'd the first few equations up so I could read them more easily:
$$A(x) \approx \sum_{j=1}^{N} A_j (m_j / \rho_j) W(|\vec{x}−\vec{x_j}|, h_j)$$
$$ W(|\vec{x}-\vec{x_j}|,h_j) \equiv U(|\vec{x}-\vec{x_j}|/h_j)/h_j^3$$
$$N = \int n(\vec{x}(l))\, dl \approx \int dl \sum_{j=1}^{N} (m_j/\rho_j)\, n_j\, W(|\vec{x}-\vec{x_j}|, h_j) = \sum_{j=1}^{N} n_j \left(m_j/(\rho_j h_j^3)\right)\int U(|\vec{x}-\vec{x_j}|/h_j)\, dl$$
The test you've applied seems precisely right to me, and given the convergence you're seeing I think your choice of parameters is good.
Was it ever identified if this same weighting was present in the `ProjectionPlot` infrastructure? Or if this discussion/bug is limited to just the `ray` data object? The `ProjectionPlot` infrastructure will also sample SPH kernels for each pixel, acting effectively as a `ray` object.
I believe this is up for discussion on Slack right now
I have identified one potential way to test the existing weighting (impact parameter / smoothing length) versus the proposed weighting (impact parameter / smoothing length)**2 in the `ray` object.
The potential bug that you have identified only affects particle-based codes, not grid-based codes. So we could take a grid-based snapshot and calculate a ray on a specific trajectory through that volume to measure some field (say, density) along that `ray` as a sort of "gold standard". I wrote some code a few years ago which can "convert" a grid-based dataset into a particle-based dataset using Monte Carlo sampling: https://github.com/yt-project/yt/pull/2187 . We could use that to convert the grid-based dataset to a particle-based dataset, then send a `ray` down the same trajectory through the particle-based dataset. Then we could inspect the density field along that line of sight to see how well it reproduces the corresponding grid-based `ray`. We could repeat this for the proposed weighting prescription (impact parameter / smoothing length)**2 to see if it more accurately matches the grid-based `ray` than the current weighting prescription (impact parameter / smoothing length). It might provide more feedback on this discussion in a practical sense.
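To sketch what I mean in yt terms (the dataset name is just a placeholder, and the converter call from that PR is hypothetical -- I don't remember its exact interface):

```python
import numpy as np
import yt

# Grid-based "gold standard": a ray on a fixed trajectory through a grid dataset.
ds_grid = yt.load("IsolatedGalaxy/galaxy0030/galaxy0030")  # placeholder dataset
start, end = ds_grid.domain_left_edge, ds_grid.domain_right_edge

def ray_column(ds, start, end):
    # Sum of density * path length over every element the ray intersects.
    ray = ds.ray(start, end)
    total_length = np.sqrt((((end - start).to("cm")) ** 2).sum())
    return (ray["gas", "density"] * ray["dts"] * total_length).sum()

N_grid = ray_column(ds_grid, start, end)

# ds_sph would come from the Monte Carlo grid-to-particle converter in PR #2187
# (hypothetical call; the real interface may differ):
# ds_sph = monte_carlo_particle_ds(ds_grid, n_particles=10**7)
# N_sph = ray_column(ds_sph, start, end)
# Compare N_sph / N_grid for the current and the squared weighting.
```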
> I believe this is up for discussion on Slack right now
Yes, it is up for discussion on Slack, but I wasn't sure where this discussion was supposed to be happening, here or on Slack. It seems like this may be a better place for it since it has more permanence than our Slack discussions.
I've brought up a possible issue with the `ProjectionPlot` pixelization routine on the development Slack channel, but that's to do with the handling of small SPH particles (smoothing length < pixel size). In short, I think that by effectively 'stretching' small SPH particles, this routine adds more mass (line density) to the pixel sightlines than it should.
The backend function `pixelize_sph_kernel_projection` does call `interpolate_array` with (impact parameter / smoothing length)^2 as the argument, so I suppose we'll have to change something in the code either way; the `Ray` and `ProjectionPlot` approaches can't both be right.
Thanks for the LaTeXing, by the way, @matthewturk. I've fixed the rest as well; I wasn't aware we could use LaTeX math here.
The small-particle issue for the SPH `ProjectionPlot`s is a separate one, but I can open a different pull request for it. I feel like I might need to think that one through a bit more first.
I like @chummels' test idea. One other option I was wondering about in this context is whether there are functions/methods/datasets for testing very simple setups for this kind of thing, e.g., a dataset with one, or at least $\lesssim 10$, SPH particles, where we can just numpy-integrate the kernel for each particle and compare the outcomes. Such simple datasets, where we either know the outcome analytically or can use a slow but very straightforward numerical calculation, seem like a good basis for testing the backends in the context of the full analysis pipeline.
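For instance, something along these lines (assuming the `fake_sph_grid_ds` helper in `yt.testing` exists in the form I remember and is SPH-like enough for this purpose):

```python
import numpy as np
from yt.testing import fake_sph_grid_ds  # small, regular SPH particle setup (assumed helper)

ds = fake_sph_grid_ds()
start, end = ds.domain_left_edge, ds.domain_right_edge
ray = ds.ray(start, end)

# Path lengths yt assigns to each particle the ray intersects:
total_length = np.sqrt(((end - start) ** 2).sum())
dl = ray["dts"] * total_length

# With only a handful of particles, the same line integrals can be computed
# directly with numpy (a kernel quadrature per particle) or analytically,
# and compared against dl and against column-like sums such as this one:
print((ray["gas", "density"] * dl).sum())
```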
The volume integral of 0.300 is only for the cubic kernel; other kernels are off by other factors, so I suspect there isn't anything very deep going on there.
@nastasha-w so there are a few simple test setups in yt already; take a look at these to see if they are super relevant:

- `yt/visualization/volume_rendering/test_off_axis_SPH.py`
- `yt/data_objects/tests/test_sph_data_objects.py`
- `yt/data_objects/tests/test_octree.py`
- `yt/frontends/stream/tests/test_stream_particles.py` (maybe)

I think `test_sph_data_objects.py`, which compares against analytic results for stuff like slices and regions, is the most relevant. There's one test for rays there, but it looks like it just counts how many particles contribute, rather than checking the final values.
I believe I exchanged emails with one or more people who conducted comparisons of different smoothing kernels and closure applications, which I will try to dig up. (Unfortunately the rest of my day is teaching, so I may have to come back to this at another time.)
Thanks! I've been looking into those. One issue has already come up: in yt's `ray.py`, there is this bit of code:
```python
def _generate_container_field(self, field):
    # What should we do with `ParticleDataset`?
    if isinstance(self.ds, SPHDataset):
        return self._generate_container_field_sph(field)
    else:
        return self._generate_container_field_grid(field)
```
I think that question is coming up here; I can set the `Ray` `t` and `dts` fields manually using

```python
ray0._generate_container_field_sph("t")
ray0._generate_container_field_sph("dts")
```

but trying `ray0["t"]` gives an error:
```
In [72]: ray0['dts']
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[72], line 1
----> 1 ray0['dts']
File ~/code/yt/yt/data_objects/data_containers.py:232, in YTDataContainer.__getitem__(self, key)
230 if f not in self.field_data and key not in self.field_data:
231 if f in self._container_fields:
--> 232 self.field_data[f] = self.ds.arr(self._generate_container_field(f))
233 return self.field_data[f]
234 else:
File ~/code/yt/yt/data_objects/selection_objects/ray.py:193, in YTRay._generate_container_field(self, field)
191 return self._generate_container_field_sph(field)
192 else:
--> 193 return self._generate_container_field_grid(field)
File ~/code/yt/yt/data_objects/selection_objects/ray.py:199, in YTRay._generate_container_field_grid(self, field)
197 self.index._identify_base_chunk(self)
198 if field == "dts":
--> 199 return self._current_chunk.dtcoords
200 elif field == "t":
201 return self._current_chunk.tcoords
File ~/code/yt/yt/geometry/geometry_handler.py:272, in cacheable_property.<locals>.cacheable_func(self)
270 return getattr(self, n)
271 if self.data_size is None:
--> 272 tr = self._accumulate_values(n[1:])
273 else:
274 tr = func(self)
File ~/code/yt/yt/geometry/geometry_handler.py:307, in YTDataChunk._accumulate_values(self, method)
305 arrs = []
306 for obj in self._fast_index or self.objs:
--> 307 f = getattr(obj, mname)
308 arrs.append(f(self.dobj))
309 if method == "dtcoords":
AttributeError: 'ParticleContainer' object has no attribute 'select_dtcoords'
```
The comment mentions the `ParticleDataset` class, which is the base class for some SPH datasets, but it also looks like it's used for some halo catalogues; I don't know whether these are catalogues for SPH(-like) simulations, or whether individual halos are stored as 'particles' there. On the other hand, `ParticleContainer` seems to be some indexing-related class that doesn't appear to be subclassed by anything; it gets called somewhere deep in the `Ray` handler for grid datasets, in a chunking operation. I don't think it's even meant to be applied to particle data.

Anyway, checking `if isinstance(self.ds, ParticleDataset)` instead of `if isinstance(self.ds, SPHDataset)` in `Ray` objects would seem to fix the problem, but I am worried about whether this could cause issues down the road if there are `ParticleDataset` objects that are not SPH-like. In that case, we'd probably want to raise some sort of error for those subclasses, or find some tell-tale attribute to check, e.g., whether the dataset has a smoothing length. (I hope nobody has tried to wrangle halo catalogues into yt by setting the smoothing length equal to the halo radius.)
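Concretely, the kind of guard I'm thinking of would look something like this (the error type and message are just placeholders):

```python
def _generate_container_field(self, field):
    # Route SPH datasets to the SPH path; refuse non-SPH particle datasets
    # explicitly instead of falling through to the grid chunking machinery.
    if isinstance(self.ds, SPHDataset):
        return self._generate_container_field_sph(field)
    elif isinstance(self.ds, ParticleDataset):
        raise NotImplementedError(
            "Ray container fields are not implemented for non-SPH "
            "particle datasets (e.g., halo catalogues)."
        )
    else:
        return self._generate_container_field_grid(field)
```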
@nastasha-w did we ever carry out the test that @chummels suggested in https://github.com/yt-project/yt/issues/4781#issuecomment-1906872232?
Regardless, I have read through this issue and your derivation and I believe that you are correct. I think it would be good to verify this empirically.
@chummels @matthewturk any other comments here? these issues have been lingering for too long and we need to resolve them.
@jzuhone I did not carry out that test, no. In hindsight, I think the 'scatter' SPH gridding algorithm had an issue in it as well (#4788, #4939), so that might complicate this approach. I ended up going with a test (added to the test suite) where I calculate the kernel integrals with a separate function using numpy and compare that against the `yt` calculation, for a simple test dataset with one or two particles.
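The numpy side of that comparison is essentially a brute-force kernel line integral, roughly like this (assuming the standard cubic spline normalization with compact support at $r/h = 1$, which is my understanding of yt's convention):

```python
import numpy as np

def cubic_u(q):
    # Dimensionless M4 cubic spline, normalized so its 3D volume integral is 1,
    # with compact support q <= 1 (assumed to match yt's convention).
    c = 8.0 / np.pi
    out = np.zeros_like(q)
    inner = q <= 0.5
    outer = (q > 0.5) & (q <= 1.0)
    out[inner] = c * (1.0 - 6.0 * q[inner] ** 2 + 6.0 * q[inner] ** 3)
    out[outer] = c * 2.0 * (1.0 - q[outer]) ** 3
    return out

def line_integral(q_impact, nsamp=4001):
    # integral of U(sqrt(q_impact**2 + s**2)) ds along the full sightline
    s = np.linspace(-1.0, 1.0, nsamp)
    return np.trapz(cubic_u(np.sqrt(q_impact ** 2 + s ** 2)), s)

# dl for one particle would then be line_integral(b / h) * m / (rho * h**2).
```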
I believe that I and the NCSA intern who was working with me at the time are the original authors of the code you've worked through. Thanks so much for taking the time to fix and expand this code. My hope was always that someone who is more of an expert in SPH would find it useful and improve it.
@nastasha-w did https://github.com/yt-project/yt/pull/4783 fully address this issue? can we close this issue out? I think yes, but wanted to double check.
@chrishavlin yes!
Closed by #4783
Bug report
Bug summary
There is a possible bug in the calculation of the path lengths $dl$ in `Ray` objects for SPH and other non-grid data. I think the `Ray` object is inputting (impact parameter / smoothing length) into a Cython function where it should be inputting (impact parameter / smoothing length)^2. This would mean the lengths $dl$, and derived quantities like surface densities or column densities, are systematically underestimated.
Bug reasoning
As far as I can tell, the under-the-hood code to calculate a resolution element's projection onto a pencil beam (the calculation of $dl$ in a `Ray` object) comes from what I believe is the SPLASH paper. From that paper, the basic equation (eq. 3) to start from is that in an SPH simulation, a field $A(\vec{x})$ is sampled by SPH particles such that

$$A(\vec{x}) \approx \sum_{j=1}^{N} A_j (m_j / \rho_j) W(|\vec{x}-\vec{x_j}|, h_j)$$
where x is the position (vector), j denotes the particle index, W is the kernel function, and h is the smoothing length. In practice, the kernel only depends on $|x - x_j| / h_j$, and we can write a dimensionless kernel $U(r)$, where $r$ is the normalized distance to the particle center, such that (eq. 8) $W(|x−x_j|, h_j) = U(|x−x_j| / h_j) / h_j^3$ (for 3-dimensional data).
To calculate a column density $N$, we want to integrate a density field $n(x)$ along some line of sight $l$. This means

$$N = \int n(\vec{x}(l))\, dl \approx \int dl \sum_{j=1}^{N} (m_j/\rho_j)\, n_j\, W(|\vec{x}-\vec{x_j}|, h_j)$$
where $\vec{x}(l)$ is the path through the volume. For the straight-line paths of the ray object, $\vec{x}(l) = \vec{x}_0 + l \hat{x}$, where $\hat{x}$ is a unit vector describing the direction of the path. Examining the integral for one particle, we can set the zero point $\vec{x}_0$ where the path is closest to $\vec{x}_j$, at impact parameter $b$, and set $\vec{x}_j$ as the center of the coordinate system. Then

$$\int W(|\vec{x}(l)-\vec{x}_j|, h_j)\, dl = \frac{1}{h_j^3}\int U\left(\sqrt{b^2 + l^2}\,/\,h_j\right) dl = \frac{1}{h_j^2}\int U\left(\sqrt{(b/h_j)^2 + r^2}\right) dr$$
where $r = l / h_j$. We then get

$$N \approx \sum_{j=1}^{N} n_j \frac{m_j}{\rho_j h_j^2}\int U\left(\sqrt{(b_j/h_j)^2 + r^2}\right) dr$$

with $b_j$ the impact parameter of the path with respect to particle $j$.
This integral $\int U\left( \sqrt{(b/h)^2 + r^2} \right) dr$ depends only on the normalized impact parameter $b/h$, so in yt (https://github.com/yt-project/yt/blob/main/yt/utilities/lib/pixelization_routines.pyx, line 1024 onward), the class `SPHKernelInterpolationTable` is created, which calculates a grid of these integrals for a given kernel and then uses linear interpolation to quickly evaluate them for many SPH particles. This class seems to interpolate between squared, normalized impact parameters `q2`: in its trapezoidal integral calculation, no squares or square roots seem to be taken when storing the `q2` values and their integral results in a table, or when the input value of `interpolate_array(self, np.float64_t[:] q2_vals)` is compared to the table values. The `pixelize_sph_kernel_projection` function, also in https://github.com/yt-project/yt/blob/main/yt/utilities/lib/pixelization_routines.pyx, does input (impact parameter / smoothing length)^2 into `interpolate_array` in its projection routine. However, in the yt Ray object (https://github.com/yt-project/yt/blob/main/yt/data_objects/selection_objects/ray.py), the `_generate_container_field_sph` function for the field `dts` ($dl$ / total length of the Ray) calls `interpolate_array` as `dl = itab.interpolate_array(b / hsml) * mass / dens / hsml**2`. These $dl$ values, as used in, e.g., Trident to calculate column densities from particle ion densities, are supposed to be the factors that determine each particle's contribution to the total, so that

$$N = \sum_{j=1}^{N} n_j \, dl_j$$

It is therefore a pretty serious problem if these factors are not calculated right.
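If I've followed the derivation above correctly, the path length that should be assigned to particle $j$, at impact parameter $b_j$, is

$$dl_j = \frac{m_j}{\rho_j h_j^2} \int U\left(\sqrt{(b_j/h_j)^2 + r^2}\right) dr,$$

which is what the `* mass / dens / hsml**2` factors in the call above provide, leaving only the dimensionless integral to `interpolate_array`.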
Code test
To check whether these integrals make sense, I did a simple test making use of the volume integral of the kernel. Going back to the initial definition, and doing a volume integral of the density:

$$\int \rho(\vec{x})\, dV \approx \int dV \sum_{j=1}^{N} \rho_j (m_j/\rho_j) W(|\vec{x}-\vec{x_j}|, h_j) = \sum_{j=1}^{N} m_j \int \frac{U(|\vec{x}-\vec{x_j}|/h_j)}{h_j^3}\, dV$$

where I've changed coordinates to $r' = r / h_j$. Simplifying,

$$\int \rho(\vec{x})\, dV \approx \sum_{j=1}^{N} m_j \int U(r')\, dV'$$

As long as each particle's kernel is fully contained in the volume, its mass contribution to the total should simply be $m_j$. Integrating over the whole volume of a simulation, then

$$\int \rho\, dV = \sum_{j=1}^{N} m_j \int U(r')\, dV' = \sum_{j=1}^{N} m_j$$

In other words, $\int U(r)\, dV = 1$. We can use this to test the $dl$ kernel calculation by separating the volume integral into a line integral along parallel lines of sight and a surface integral perpendicular to these:

$$\int U\, dV = \int d^2 r_{\perp} \int U\left(\sqrt{r_{\perp}^2 + l^2}\right) dl = 1$$

where $r_{\perp}$ is the impact parameter. I've tested inputting these normalized impact parameters into the table interpolator, and their squared values, to calculate the inner integral, and then done the outer integral over these with a simple sum to test the volume-integral constraint.
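A sketch of this kind of check (not necessarily my exact script; it assumes `SPHKernelInterpolationTable` can be imported directly from the compiled module, that its constructor takes just the kernel name, and that "cubic" is a valid kernel name):

```python
import numpy as np
# Assumed interface: interpolate_array() expects *squared* normalized impact parameters.
from yt.utilities.lib.pixelization_routines import SPHKernelInterpolationTable

itab = SPHKernelInterpolationTable("cubic")

# Normalized impact parameters covering the kernel support (taken to be <= 1).
redges = np.linspace(0.0, 1.0, 2000)
rcen = 0.5 * (redges[:-1] + redges[1:])
dr = np.diff(redges)

# Inner (line-of-sight) integrals, with the squared and unsquared inputs:
inner_sq = itab.interpolate_array(rcen ** 2)
inner_lin = itab.interpolate_array(rcen)

# Outer integral over the plane perpendicular to the sightlines,
# i.e. the full volume integral of U, which should come out to 1:
print(np.sum(2.0 * np.pi * rcen * dr * inner_sq))   # expect ~1
print(np.sum(2.0 * np.pi * rcen * dr * inner_lin))  # ~0.3 for the cubic kernel, per the numbers above
```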
Outcome
These volume integrals seem to favor the squared input, as did my examination of the code. I have also checked that the outcome is the same if I use `redges = np.linspace(0., 2., 2000)`, so the issue is not that the normalized kernel has a larger support than I thought.

Proposed fix
The fix, if I am right, would be very simple: just replace `dl = itab.interpolate_array(b / hsml) * mass / dens / hsml**2` by `dl = itab.interpolate_array((b / hsml)**2) * mass / dens / hsml**2`. However, I wanted to make sure I got this right before messing with the code.

Version Information
I installed both python and yt from miniconda, using the default channel.