Hi, and welcome to yt! Thanks for opening your first issue. We have an issue template that helps us gather the relevant information to diagnose and fix the issue.
Hi! I've read through this in detail, and I believe that your exploration of the problem is spot on. I have one question: why is the value 0.30000 for the old weighting -- do you have a sense of whether that number falls out of the factor difference?
For reference I've also LaTeX'd the first few equations up so I could read them more easily:
$$A(x) \approx \sum_{j=1}^{N} A_j (m_j / \rho_j) W(|\vec{x}−\vec{x_j}|, h_j)$$
$$ W(|\vec{x}-\vec{x_j}|,h_j) \equiv U(|\vec{x}-\vec{x_j}|/h_j)/h_j^3$$
$$N = \int n(\vec{x}(l))\, dl \approx \int dl \sum_{j=1}^{N} (m_j/\rho_j)\, n_j\, W(|\vec{x}-\vec{x_j}|, h_j) = \sum_{j=1}^{N} n_j \left(m_j/(\rho_j h_j^3)\right)\int U(|\vec{x}-\vec{x_j}|/h_j)\, dl$$
The test you've applied seems precisely right to me, and given the convergence you're seeing I think your choice of parameters is good.
Was it ever identified if this same weighting was present in the `ProjectionPlot` infrastructure? Or if this discussion/bug is limited to just the `ray` data object? The `ProjectionPlot` infrastructure will also sample SPH kernels for each pixel, acting effectively as a `ray` object.
I believe this is up for discussion on Slack right now
I have identified one potential way to test the existing weighting (impact parameter / smoothing length) versus the proposed weighting (impact parameter / smoothing length)**2 in the `ray` object.
The potential bug that you have identified only affects particle-based codes, not grid-based codes. So we could take a grid-based snapshot and calculate a ray on a specific trajectory through that volume to measure some field (say, density) along that `ray` as a sort of "gold standard". I wrote some code a few years ago which can "convert" a grid-based dataset into a particle-based dataset using Monte Carlo sampling: https://github.com/yt-project/yt/pull/2187 . We could use that to convert the grid-based dataset to a particle-based dataset, then send a `ray` down the same trajectory through the particle-based dataset. Then we could inspect the density field along that line of sight to see how well it reproduces the corresponding grid-based `ray`. We could repeat this for the proposed weighting prescription (impact parameter / smoothing length)**2 to see if it more accurately matches the grid-based `ray` than the current weighting prescription (impact parameter / smoothing length). It might provide more feedback on this discussion in a practical sense.
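To sketch what I mean in yt terms (the dataset name is just a placeholder, and the converter call from that PR is hypothetical -- I don't remember its exact interface):

```python
import numpy as np
import yt

# Grid-based "gold standard": a ray on a fixed trajectory through a grid dataset.
ds_grid = yt.load("IsolatedGalaxy/galaxy0030/galaxy0030")  # placeholder dataset
start, end = ds_grid.domain_left_edge, ds_grid.domain_right_edge

def ray_column(ds, start, end):
    # Sum of density * path length over every element the ray intersects.
    ray = ds.ray(start, end)
    total_length = np.sqrt((((end - start).to("cm")) ** 2).sum())
    return (ray["gas", "density"] * ray["dts"] * total_length).sum()

N_grid = ray_column(ds_grid, start, end)

# ds_sph would come from the Monte Carlo grid-to-particle converter in PR #2187
# (hypothetical call; the real interface may differ):
# ds_sph = monte_carlo_particle_ds(ds_grid, n_particles=10**7)
# N_sph = ray_column(ds_sph, start, end)
# Compare N_sph / N_grid for the current and the squared weighting.
```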
> I believe this is up for discussion on Slack right now
Yes, it is up for discussion on Slack, but I wasn't sure where this discussion was supposed to be happening, here or on Slack. It seems like this may be a better place for it since it has more permanence than our Slack discussions.
I've brought up a possible issue with the `ProjectionPlot` pixelization routine on the development Slack channel, but that's to do with the handling of small SPH particles (smoothing length < pixel size). In short, I think that by effectively 'stretching' small SPH particles, this routine adds more mass (line density) to the pixel sightlines than it should.
The backend function `pixelize_sph_kernel_projection` does call `interpolate_array` with (impact parameter / smoothing length)^2 as the argument, so I suppose we'll have to change something in the code either way; the `Ray` and `ProjectionPlot` approaches can't both be right.
Thanks for the LaTeXing, by the way, @matthewturk. I've fixed the rest as well; I wasn't aware we could use LaTeX math here.
The small-particle issue for the SPH `ProjectionPlot`s is a separate one, but I can open a different pull request for it. I feel like I might need to think that one through a bit more first.
I like @chummels' test idea. One other option I was wondering about in this context is whether there are functions/methods/datasets for testing very simple setups for this kind of thing, e.g., a dataset with one, or at least $\lesssim 10$, SPH particles, where we can just numpy-integrate the kernel for each particle and compare the outcomes. Such simple datasets, where we either know the outcome analytically or can use a slow but very straightforward numerical calculation, seem like a good basis for testing the backends in the context of the full analysis pipeline.
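For instance, something along these lines (assuming the `fake_sph_grid_ds` helper in `yt.testing` exists in the form I remember and is SPH-like enough for this purpose):

```python
import numpy as np
from yt.testing import fake_sph_grid_ds  # small, regular SPH particle setup (assumed helper)

ds = fake_sph_grid_ds()
start, end = ds.domain_left_edge, ds.domain_right_edge
ray = ds.ray(start, end)

# Path lengths yt assigns to each particle the ray intersects:
total_length = np.sqrt(((end - start) ** 2).sum())
dl = ray["dts"] * total_length

# With only a handful of particles, the same line integrals can be computed
# directly with numpy (a kernel quadrature per particle) or analytically,
# and compared against dl and against column-like sums such as this one:
print((ray["gas", "density"] * dl).sum())
```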
The volume integral of 0.300 is only for the cubic kernel; other kernels are off by other factors, so I suspect there isn't anything very deep going on there.
@nastasha-w so there are a few simple test setups in yt already; take a look at these to see if they are super relevant:

- `yt/visualization/volume_rendering/test_off_axis_SPH.py`
- `yt/data_objects/tests/test_sph_data_objects.py`
- `yt/data_objects/tests/test_octree.py`
- `yt/frontends/stream/tests/test_stream_particles.py` (maybe)

I think `test_sph_data_objects.py`, which compares against analytic results for stuff like slices and regions, is the most relevant. There's one test for rays there, but it looks like it just counts how many particles contribute, rather than checking the final values.
I believe I exchanged emails with one or more people who conducted comparisons of different smoothing kernels and closure applications, which I will try to dig up. (Unfortunately the rest of my day is teaching, so I may have to come back to this at another time.)
Thanks! I've been looking into those. One issue has already come up: in yt's `ray.py`, there is this bit of code:
```python
def _generate_container_field(self, field):
    # What should we do with `ParticleDataset`?
    if isinstance(self.ds, SPHDataset):
        return self._generate_container_field_sph(field)
    else:
        return self._generate_container_field_grid(field)
```
I think that question is coming up here; I can set the `Ray` `t` and `dts` fields manually using

```python
ray0._generate_container_field_sph("t")
ray0._generate_container_field_sph("dts")
```

but trying `ray0["t"]` gives an error:
```
In [72]: ray0['dts']
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[72], line 1
----> 1 ray0['dts']
File ~/code/yt/yt/data_objects/data_containers.py:232, in YTDataContainer.__getitem__(self, key)
230 if f not in self.field_data and key not in self.field_data:
231 if f in self._container_fields:
--> 232 self.field_data[f] = self.ds.arr(self._generate_container_field(f))
233 return self.field_data[f]
234 else:
File ~/code/yt/yt/data_objects/selection_objects/ray.py:193, in YTRay._generate_container_field(self, field)
191 return self._generate_container_field_sph(field)
192 else:
--> 193 return self._generate_container_field_grid(field)
File ~/code/yt/yt/data_objects/selection_objects/ray.py:199, in YTRay._generate_container_field_grid(self, field)
197 self.index._identify_base_chunk(self)
198 if field == "dts":
--> 199 return self._current_chunk.dtcoords
200 elif field == "t":
201 return self._current_chunk.tcoords
File ~/code/yt/yt/geometry/geometry_handler.py:272, in cacheable_property.<locals>.cacheable_func(self)
270 return getattr(self, n)
271 if self.data_size is None:
--> 272 tr = self._accumulate_values(n[1:])
273 else:
274 tr = func(self)
File ~/code/yt/yt/geometry/geometry_handler.py:307, in YTDataChunk._accumulate_values(self, method)
305 arrs = []
306 for obj in self._fast_index or self.objs:
--> 307 f = getattr(obj, mname)
308 arrs.append(f(self.dobj))
309 if method == "dtcoords":
AttributeError: 'ParticleContainer' object has no attribute 'select_dtcoords'
```
The comment mentions the `ParticleDataset` class, which is the base class for some SPH datasets, but it also looks like it's used for some halo catalogues; I don't know whether these are catalogues for SPH(-like) simulations, or whether individual halos are stored as 'particles' there. On the other hand, `ParticleContainer` seems to be some indexing-related class that doesn't appear to be subclassed by anything; it gets called somewhere deep in the `Ray` handler for grid datasets, in a chunking operation. I don't think it's even meant to be applied to particle data.

Anyway, checking `if isinstance(self.ds, ParticleDataset)` instead of `if isinstance(self.ds, SPHDataset)` in `Ray` objects would seem to fix the problem, but I am worried about whether this could cause issues down the road if there are `ParticleDataset` objects that are not SPH-like. In that case, we'd probably want to raise some sort of error for those subclasses, or find some tell-tale attribute to check, e.g., whether the dataset has a smoothing length. (I hope nobody has tried to wrangle halo catalogues into yt by setting the smoothing length equal to the halo radius.)
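Concretely, the kind of guard I'm thinking of would look something like this (the error type and message are just placeholders):

```python
def _generate_container_field(self, field):
    # Route SPH datasets to the SPH path; refuse non-SPH particle datasets
    # explicitly instead of falling through to the grid chunking machinery.
    if isinstance(self.ds, SPHDataset):
        return self._generate_container_field_sph(field)
    elif isinstance(self.ds, ParticleDataset):
        raise NotImplementedError(
            "Ray container fields are not implemented for non-SPH "
            "particle datasets (e.g., halo catalogues)."
        )
    else:
        return self._generate_container_field_grid(field)
```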
@nastasha-w did we ever carry out the test that @chummels suggested in https://github.com/yt-project/yt/issues/4781#issuecomment-1906872232?
Regardless, I have read through this issue and your derivation and I believe that you are correct. I think it would be good to verify this empirically.
@chummels @matthewturk any other comments here? these issues have been lingering for too long and we need to resolve them.
@jzuhone I did not carry out that test, no. In hindsight, I think the 'scatter' SPH gridding algorithm had an issue in it as well (#4788, #4939), so that might complicate this approach. I ended up going with a test (added to the test suite) where I calculate the kernel integrals with a separate function using numpy and compare that against the `yt` calculation, for a simple test dataset with one or two particles.
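The numpy side of that comparison is essentially a brute-force kernel line integral, roughly like this (assuming the standard cubic spline normalization with compact support at $r/h = 1$, which is my understanding of yt's convention):

```python
import numpy as np

def cubic_u(q):
    # Dimensionless M4 cubic spline, normalized so its 3D volume integral is 1,
    # with compact support q <= 1 (assumed to match yt's convention).
    c = 8.0 / np.pi
    out = np.zeros_like(q)
    inner = q <= 0.5
    outer = (q > 0.5) & (q <= 1.0)
    out[inner] = c * (1.0 - 6.0 * q[inner] ** 2 + 6.0 * q[inner] ** 3)
    out[outer] = c * 2.0 * (1.0 - q[outer]) ** 3
    return out

def line_integral(q_impact, nsamp=4001):
    # integral of U(sqrt(q_impact**2 + s**2)) ds along the full sightline
    s = np.linspace(-1.0, 1.0, nsamp)
    return np.trapz(cubic_u(np.sqrt(q_impact ** 2 + s ** 2)), s)

# dl for one particle would then be line_integral(b / h) * m / (rho * h**2).
```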
I believe that I and the NCSA intern who was working with me at the time are the original authors of the code you've worked through. Thanks so much for taking the time to fix and expand this code. My hope was always that someone who is more of an expert in SPH would find it useful and improve it.
@nastasha-w did https://github.com/yt-project/yt/pull/4783 fully address this issue? can we close this issue out? I think yes, but wanted to double check.
@chrishavlin yes!
Closed by #4783
Bug report
Bug summary
There is a possible bug in the calculation of the path lengths $dl$ in `Ray` objects for SPH and other non-grid data. I think the `Ray` object is inputting (impact parameter / smoothing length) into a Cython function where it should be inputting (impact parameter / smoothing length)^2. This would mean the lengths $dl$, and derived quantities like surface densities or column densities, are systematically underestimated.
Bug reasoning
As far as I can tell, the under-the-hood code to calculate a resolution element's projection onto a pencil beam (the calculation of $dl$ in a `Ray` object) comes from what I believe is the SPLASH paper. From that paper, the basic equation (eq. 3) to start from is that in an SPH simulation, a field $A(\vec{x})$ is sampled by SPH particles such that

$$A(\vec{x}) \approx \sum_{j=1}^{N} A_j (m_j / \rho_j) W(|\vec{x}-\vec{x_j}|, h_j)$$
where x is the position (vector), j denotes the particle index, W is the kernel function, and h is the smoothing length. In practice, the kernel only depends on $|x - x_j| / h_j$, and we can write a dimensionless kernel $U(r)$, where $r$ is the normalized distance to the particle center, such that (eq. 8) $W(|x−x_j|, h_j) = U(|x−x_j| / h_j) / h_j^3$ (for 3-dimensional data).
To calculate a column density $N$, we want to integrate a density field $n(x)$ along some line of sight $l$. This means

$$N = \int n(\vec{x}(l))\, dl \approx \int dl \sum_{j=1}^{N} (m_j/\rho_j)\, n_j\, W(|\vec{x}-\vec{x_j}|, h_j)$$
where $\vec{x}(l)$ is the path through the volume. For the straight-line paths of the ray object, $\vec{x}(l) = \vec{x}_0 + l \hat{x}$, where $\hat{x}$ is a unit vector describing the direction of the path. Examining the integral for one particle, we can set the zero point $\vec{x}_0$ where the path is closest to $\vec{x}_j$, at impact parameter $b$, and set $\vec{x}_j$ as the center of the coordinate system. Then

$$\int W(|\vec{x}(l)-\vec{x}_j|, h_j)\, dl = \frac{1}{h_j^3}\int U\left(\sqrt{b^2 + l^2}\,/\,h_j\right) dl = \frac{1}{h_j^2}\int U\left(\sqrt{(b/h_j)^2 + r^2}\right) dr$$
where $r = l / h_j$. We then get

$$N \approx \sum_{j=1}^{N} n_j \frac{m_j}{\rho_j h_j^2}\int U\left(\sqrt{(b_j/h_j)^2 + r^2}\right) dr$$

with $b_j$ the impact parameter of the path with respect to particle $j$.
This integral $\int U\left( \sqrt{(b/h)^2 + r^2} \right) dr$ depends only on the normalized impact parameter $b/h$, so in yt (https://github.com/yt-project/yt/blob/main/yt/utilities/lib/pixelization_routines.pyx, line 1024 onward), the class `SPHKernelInterpolationTable` is created, which calculates a grid of these integrals for a given kernel and then uses linear interpolation to quickly evaluate them for many SPH particles. This class seems to interpolate between squared, normalized impact parameters `q2`: in its trapezoidal integral calculation, no squares or square roots seem to be taken when storing the `q2` values and their integral results in a table, or when the input value of `interpolate_array(self, np.float64_t[:] q2_vals)` is compared to the table values. The `pixelize_sph_kernel_projection` function, also in https://github.com/yt-project/yt/blob/main/yt/utilities/lib/pixelization_routines.pyx, does input (impact parameter / smoothing length)^2 into `interpolate_array` in its projection routine. However, in the yt Ray object (https://github.com/yt-project/yt/blob/main/yt/data_objects/selection_objects/ray.py), the `_generate_container_field_sph` function for the field `dts` ($dl$ / total length of the Ray) calls `interpolate_array` as `dl = itab.interpolate_array(b / hsml) * mass / dens / hsml**2`. These $dl$ values, as used in, e.g., Trident to calculate column densities from particle ion densities, are supposed to be the factors that determine each particle's contribution to the total, so that

$$N = \sum_{j=1}^{N} n_j \, dl_j$$

It is therefore a pretty serious problem if these factors are not calculated right.
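If I've followed the derivation above correctly, the path length that should be assigned to particle $j$, at impact parameter $b_j$, is

$$dl_j = \frac{m_j}{\rho_j h_j^2} \int U\left(\sqrt{(b_j/h_j)^2 + r^2}\right) dr,$$

which is what the `* mass / dens / hsml**2` factors in the call above provide, leaving only the dimensionless integral to `interpolate_array`.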
Code test
To check whether these integrals make sense, I did a simple test making use of the volume integral of the kernel. Going back to the initial definition, and doing a volume integral of the density:

$$\int \rho(\vec{x})\, dV \approx \int dV \sum_{j=1}^{N} \rho_j (m_j/\rho_j) W(|\vec{x}-\vec{x_j}|, h_j) = \sum_{j=1}^{N} m_j \int \frac{U(|\vec{x}-\vec{x_j}|/h_j)}{h_j^3}\, dV$$

where I've changed coordinates to $r' = r / h_j$. Simplifying,

$$\int \rho(\vec{x})\, dV \approx \sum_{j=1}^{N} m_j \int U(r')\, dV'$$

As long as each particle's kernel is fully contained in the volume, its mass contribution to the total should simply be $m_j$. Integrating over the whole volume of a simulation, then

$$\int \rho\, dV = \sum_{j=1}^{N} m_j \int U(r')\, dV' = \sum_{j=1}^{N} m_j$$

In other words, $\int U(r)\, dV = 1$. We can use this to test the $dl$ kernel calculation by separating the volume integral into a line integral along parallel lines of sight and a surface integral perpendicular to these:

$$\int U\, dV = \int d^2 r_{\perp} \int U\left(\sqrt{r_{\perp}^2 + l^2}\right) dl = 1$$

where $r_{\perp}$ is the impact parameter. I've tested inputting these normalized impact parameters into the table interpolator, and their squared values, to calculate the inner integral, and then done the outer integral over these with a simple sum to test the volume-integral constraint.
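A sketch of this kind of check (not necessarily my exact script; it assumes `SPHKernelInterpolationTable` can be imported directly from the compiled module, that its constructor takes just the kernel name, and that "cubic" is a valid kernel name):

```python
import numpy as np
# Assumed interface: interpolate_array() expects *squared* normalized impact parameters.
from yt.utilities.lib.pixelization_routines import SPHKernelInterpolationTable

itab = SPHKernelInterpolationTable("cubic")

# Normalized impact parameters covering the kernel support (taken to be <= 1).
redges = np.linspace(0.0, 1.0, 2000)
rcen = 0.5 * (redges[:-1] + redges[1:])
dr = np.diff(redges)

# Inner (line-of-sight) integrals, with the squared and unsquared inputs:
inner_sq = itab.interpolate_array(rcen ** 2)
inner_lin = itab.interpolate_array(rcen)

# Outer integral over the plane perpendicular to the sightlines,
# i.e. the full volume integral of U, which should come out to 1:
print(np.sum(2.0 * np.pi * rcen * dr * inner_sq))   # expect ~1
print(np.sum(2.0 * np.pi * rcen * dr * inner_lin))  # ~0.3 for the cubic kernel, per the numbers above
```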
Outcome
These volume integrals seem to favor the squared input, as did my examination of the code. I have also checked that the outcome is the same if I use `redges = np.linspace(0., 2., 2000)`, so the issue is not that the normalized kernel has a larger support than I thought.

Proposed fix
The fix, if I am right, would be very simple: just replace `dl = itab.interpolate_array(b / hsml) * mass / dens / hsml**2` by `dl = itab.interpolate_array((b / hsml)**2) * mass / dens / hsml**2`. However, I wanted to make sure I got this right before messing with the code.

Version Information
I installed both python and yt from miniconda, using the default channel.