yt-project / yt

Main yt repository
http://yt-project.org
Other
469 stars 280 forks source link

Tipsy Frontend BUG?: oddities with smoothing lengths/positions and bounding box #4963

Open nastasha-w opened 3 months ago

nastasha-w commented 3 months ago

Bug report

Bug summary

For some reason, the Tipsy dataset (YT Hub) doesn't have the ("Gas", "smoothing_length") or ( "gas", "smoothing_length") field available if I load it without a bounding_box argument. This is even though even though it's in the dataset's ds.field_list. In addition, it seems like the position of at least one SPH particle depends on what bounding_box argument the dataset is loaded with.

These issues first came up in the discussion of #4939.

Code for reproduction: missing smoothing_length

(ytdev_main) nastasha@dhcp-165-124-148-182 code % ipython
Python 3.12.4 (main, Jun  6 2024, 18:26:44) [Clang 15.0.0 (clang-1500.3.9.4)]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.26.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: sampledatadir = "/Users/nastasha/code/ytdev_misc/sample_data/"

In [2]: datasetn = sampledatadir + "TipsyGalaxy/galaxy.00300"

In [3]: import yt

In [4]: import numpy as np

In [5]: ds = yt.load(datasetn, bounding_box=None)
yt : [INFO     ] 2024-08-12 13:24:31,222 Parameters: current_time              = 20.100000000000044
yt : [INFO     ] 2024-08-12 13:24:31,222 Parameters: domain_dimensions         = [1 1 1]
yt : [INFO     ] 2024-08-12 13:24:31,222 Parameters: domain_left_edge          = None
yt : [INFO     ] 2024-08-12 13:24:31,222 Parameters: domain_right_edge         = None
yt : [INFO     ] 2024-08-12 13:24:31,222 Parameters: cosmological_simulation   = 0.0
yt : [INFO     ] 2024-08-12 13:24:31,232 Allocating KDTree for 76962 particles
Generate smoothing length: 100%|███████████████████████████▉| 76961/76962 [00:00<00:00, 102496.20it/s]
yt : [INFO     ] 2024-08-12 13:24:32,044 Allocating for 3.154e+05 particles

In [7]: ds.field_list
Out[7]: 
[('DarkMatter', 'Coordinates'),
 ('DarkMatter', 'Epsilon'),
 ('DarkMatter', 'FeMassFrac'),
 ('DarkMatter', 'Mass'),
 ('DarkMatter', 'Phi'),
 ('DarkMatter', 'Velocities'),
 ('Gas', 'Coordinates'),
 ('Gas', 'Density'),
 ('Gas', 'Epsilon'),
 ('Gas', 'FeMassFrac'),
 ('Gas', 'Mass'),
 ('Gas', 'Metals'),
 ('Gas', 'Phi'),
 ('Gas', 'Temperature'),
 ('Gas', 'Velocities'),
 ('Gas', 'smoothing_length'),
 ('Stars', 'Coordinates'),
 ('Stars', 'Epsilon'),
 ('Stars', 'FeMassFrac'),
 ('Stars', 'FormationTime'),
 ('Stars', 'Mass'),
 ('Stars', 'Metals'),
 ('Stars', 'Phi'),
 ('Stars', 'Velocities'),
 ('all', 'Coordinates'),
 ('all', 'Epsilon'),
 ('all', 'FeMassFrac'),
 ('all', 'Mass'),
 ('all', 'Phi'),
 ('all', 'Velocities'),
 ('nbody', 'Coordinates'),
 ('nbody', 'Epsilon'),
 ('nbody', 'FeMassFrac'),
 ('nbody', 'Mass'),
 ('nbody', 'Phi'),
 ('nbody', 'Velocities')]

In [8]: dd = ds.all_data()

In [9]: dd[('Gas', 'x')]
Out[9]: 
unyt_array([-9.61775350e+21,  1.59519763e+22,  1.59670490e+22, ...,
       -3.19570934e+20, -5.95821781e+20, -8.65681043e+21], 'cm')

In [10]: dd[('Gas', 'smoothing_length')]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[10], line 1
----> 1 dd[('Gas', 'smoothing_length')]

File ~/code/venvs/ytdev_main/lib/python3.12/site-packages/yt/data_objects/data_containers.py:239, in YTDataContainer.__getitem__(self, key)
    237         return self.field_data[f]
    238     else:
--> 239         self.get_data(f)
    240 # fi.units is the unit expression string. We depend on the registry
    241 # hanging off the dataset to define this unit object.
    242 # Note that this is less succinct so that we can account for the case
    243 # when there are, for example, no elements in the object.
    244 try:

File ~/code/venvs/ytdev_main/lib/python3.12/site-packages/yt/data_objects/selection_objects/data_selection_objects.py:211, in YTSelectionContainer.get_data(self, fields)
    208     self.field_data[f] = self.ds.arr(v, units=finfos[f].units)
    209     self.field_data[f].convert_to_units(finfos[f].output_units)
--> 211 read_particles, gen_particles = self.index._read_particle_fields(
    212     particles, self, self._current_chunk
    213 )
    215 for f, v in read_particles.items():
    216     self.field_data[f] = self.ds.arr(v, units=finfos[f].units)

File ~/code/venvs/ytdev_main/lib/python3.12/site-packages/yt/geometry/geometry_handler.py:224, in Index._read_particle_fields(self, fields, dobj, chunk)
    222     self._identify_base_chunk(dobj)
    223 chunks = self._chunk_io(dobj, cache=False)
--> 224 fields_to_return = self.io._read_particle_selection(
    225     chunks, selector, fields_to_read
    226 )
    227 return fields_to_return, fields_to_generate

File ~/code/venvs/ytdev_main/lib/python3.12/site-packages/yt/utilities/io_handler.py:194, in BaseIOHandler._read_particle_selection(self, chunks, selector, fields)
    191     data[field] = []
    193 # Now we read.
--> 194 for field_r, vals in self._read_particle_fields(chunks, ptf, selector):
    195     # Note that we now need to check the mappings
    196     for field_f in field_maps[field_r]:
    197         data[field_f].append(vals)

File ~/code/venvs/ytdev_main/lib/python3.12/site-packages/yt/utilities/io_handler.py:222, in BaseIOHandler._read_particle_fields(self, chunks, ptf, selector)
    219 def _read_particle_fields(self, chunks, ptf, selector):
    220     # Now we have all the sizes, and we can allocate
    221     for data_file in self._sorted_chunk_iterator(chunks):
--> 222         data_file_data = self._read_particle_data_file(data_file, ptf, selector)
    223         # temporary trickery so it's still an iterator, need to adjust
    224         # the io_handler.BaseIOHandler.read_particle_selection() method
    225         # to not use an iterator.
    226         yield from data_file_data.items()

File ~/code/venvs/ytdev_main/lib/python3.12/site-packages/yt/frontends/tipsy/io.py:213, in IOHandlerTipsyBinary._read_particle_data_file(self, data_file, ptf, selector)
    211 if mask is None:
    212     continue
--> 213 tf = self._fill_fields(field_list, p, hsml, mask, data_file)
    214 for field in field_list:
    215     return_data[ptype, field] = tf.pop(field)

File ~/code/venvs/ytdev_main/lib/python3.12/site-packages/yt/frontends/tipsy/io.py:61, in IOHandlerTipsyBinary._fill_fields(self, fields, vals, hsml, mask, data_file)
     59     size = 0
     60 elif isinstance(mask, slice):
---> 61     size = vals[fields[0]].size
     62 else:
     63     size = mask.sum()

ValueError: no field of name smoothing_length

What does work

Especially given that ('Gas', 'smoothing_length') is listed in ds.field_list, I expected dd[('Gas', 'smoothing_length')] to return a unyt array of smoothing lengths, just like dd[('Gas', 'x')] returns a unyt array of x-coordinates. However, if I load the same dataset with a bounding_box argument, I do seem to get results:

(ytdev_main) nastasha@dhcp-165-124-148-182 code % ipython
Python 3.12.4 (main, Jun  6 2024, 18:26:44) [Clang 15.0.0 (clang-1500.3.9.4)]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.26.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: sampledatadir = "/Users/nastasha/code/ytdev_misc/sample_data/"

In [2]: datasetn = sampledatadir + "TipsyGalaxy/galaxy.00300"

In [3]: import yt

In [4]: ds = yt.load(datasetn)
yt : [INFO     ] 2024-08-12 13:39:51,968 Parameters: current_time              = 20.100000000000044
yt : [INFO     ] 2024-08-12 13:39:51,968 Parameters: domain_dimensions         = [1 1 1]
yt : [INFO     ] 2024-08-12 13:39:51,969 Parameters: domain_left_edge          = None
yt : [INFO     ] 2024-08-12 13:39:51,969 Parameters: domain_right_edge         = None
yt : [INFO     ] 2024-08-12 13:39:51,969 Parameters: cosmological_simulation   = 0.0
yt : [INFO     ] 2024-08-12 13:39:51,978 Allocating KDTree for 76962 particles
Generate smoothing length: 100%|███████████████████████████▉| 76961/76962 [00:00<00:00, 101941.88it/s]
yt : [INFO     ] 2024-08-12 13:39:52,793 Allocating for 3.154e+05 particles

In [5]: ds = yt.load(datasetn, bounding_box=[[-2e3, 2e3],] * 3)
yt : [INFO     ] 2024-08-12 13:40:03,736 Parameters: current_time              = 20.100000000000044
yt : [INFO     ] 2024-08-12 13:40:03,736 Parameters: domain_dimensions         = [1 1 1]
yt : [INFO     ] 2024-08-12 13:40:03,736 Parameters: domain_left_edge          = [-2000. -2000. -2000.]
yt : [INFO     ] 2024-08-12 13:40:03,736 Parameters: domain_right_edge         = [2000. 2000. 2000.]
yt : [INFO     ] 2024-08-12 13:40:03,736 Parameters: cosmological_simulation   = 0.0

In [6]: dd = ds.all_data()
yt : [INFO     ] 2024-08-12 13:40:17,464 Allocating for 3.154e+05 particles

In [7]: dd[('Gas', 'smoothing_length')]
Out[7]: 
unyt_array([0.7857734 , 0.30356586, 0.21134987, ..., 0.10337678, 0.16326839,
       0.19400276], 'code_length')

I had to load the dataset without a bounding box first, otherwise I got a segfault like in #4961.

Code for reproduction: particle position depends on bounding_box

Using the fact that the 'weird' particle seemed to be the one with the smallest (most negative) z-coordinate (these images):

(ytdev_main) nastasha@dhcp-165-124-148-182 ytdev_misc % ipython
Python 3.12.4 (main, Jun  6 2024, 18:26:44) [Clang 15.0.0 (clang-1500.3.9.4)]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.26.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import yt

In [2]: import numpy as np

In [3]: sampledatadir = "/Users/nastasha/code/ytdev_misc/sample_data/"

In [4]: datasetn = sampledatadir + "TipsyGalaxy/galaxy.00300"

In [5]: ds = yt.load(datasetn, bounding_box=[[-1e5, 1e5],] * 3)
yt : [INFO     ] 2024-08-12 14:03:51,565 Parameters: current_time              = 20.100000000000044
yt : [INFO     ] 2024-08-12 14:03:51,565 Parameters: domain_dimensions         = [1 1 1]
yt : [INFO     ] 2024-08-12 14:03:51,565 Parameters: domain_left_edge          = [-100000. -100000. -100000.]
yt : [INFO     ] 2024-08-12 14:03:51,565 Parameters: domain_right_edge         = [100000. 100000. 100000.]
yt : [INFO     ] 2024-08-12 14:03:51,565 Parameters: cosmological_simulation   = 0.0

In [6]: dd = ds.all_data()
yt : [INFO     ] 2024-08-12 14:03:55,448 Allocating for 3.154e+05 particles

In [7]: ind = np.argmin(dd[("Gas", "z")])

In [8]: (dd[("Gas", "position")][ind]).to("kpc")
Out[8]: unyt_array([ -510.06057739,   232.48110962, -3623.55981445], 'kpc')

In [9]: ds = yt.load(datasetn, bounding_box=[[-2e3, 2e3],] * 3)
yt : [INFO     ] 2024-08-12 14:04:43,194 Parameters: current_time              = 20.100000000000044
yt : [INFO     ] 2024-08-12 14:04:43,194 Parameters: domain_dimensions         = [1 1 1]
yt : [INFO     ] 2024-08-12 14:04:43,195 Parameters: domain_left_edge          = [-2000. -2000. -2000.]
yt : [INFO     ] 2024-08-12 14:04:43,195 Parameters: domain_right_edge         = [2000. 2000. 2000.]
yt : [INFO     ] 2024-08-12 14:04:43,195 Parameters: cosmological_simulation   = 0.0

In [10]: dd = ds.all_data()
yt : [INFO     ] 2024-08-12 14:04:46,291 Allocating for 3.154e+05 particles

In [11]: ind = np.argmin(dd[("Gas", "z")])

In [12]: (dd[("Gas", "position")][ind]).to("kpc")
Out[12]: unyt_array([ -510.06057739,   232.48110962, -2000.        ], 'kpc')

In [13]: ds = yt.load(datasetn, bounding_box=[[-3e3, 3e3],] * 3)
yt : [INFO     ] 2024-08-12 14:05:15,860 Parameters: current_time              = 20.100000000000044
yt : [INFO     ] 2024-08-12 14:05:15,860 Parameters: domain_dimensions         = [1 1 1]
yt : [INFO     ] 2024-08-12 14:05:15,860 Parameters: domain_left_edge          = [-3000. -3000. -3000.]
yt : [INFO     ] 2024-08-12 14:05:15,860 Parameters: domain_right_edge         = [3000. 3000. 3000.]
yt : [INFO     ] 2024-08-12 14:05:15,860 Parameters: cosmological_simulation   = 0.0

In [14]: dd = ds.all_data()
yt : [INFO     ] 2024-08-12 14:05:18,249 Allocating for 3.154e+05 particles

In [15]: ind = np.argmin(dd[("Gas", "z")])

In [16]: (dd[("Gas", "position")][ind]).to("kpc")
Out[16]: unyt_array([ -510.06057739,   232.48110962, -3000.        ], 'kpc')

It seems like the particle position is more specifically being 'clipped' to fall within the bounding_box. This particle has a large smoothing length (3645 kpc), so it should probably be included in the region selection based on these bounding boxes. However, I suspect that changing the particle's position is not intended behaviour.

Version Information

chrishavlin commented 3 months ago

4964 should fix the error when the bbox is None.

For the changing coordinate values, they are indeed being clipped here:

https://github.com/yt-project/yt/blob/39a7563e64e8ec1e8d12ea3af2cc6da5354419d1/yt/frontends/tipsy/io.py#L81-L88

but I'm not sure why...

nastasha-w commented 3 months ago

4964 should fix the error when the bbox is None.

For the changing coordinate values, they are indeed being clipped here:

https://github.com/yt-project/yt/blob/39a7563e64e8ec1e8d12ea3af2cc6da5354419d1/yt/frontends/tipsy/io.py#L81-L88

but I'm not sure why...

Thanks for taking a look! That coordinate clipping seems weird, like something kind of hacky you put in to avoid other problems. Do you have any idea who might have written that?

chrishavlin commented 3 months ago

I'd guess it was likely added to avoid a particle falling identically on the bounding box of the dataset (maybe to address some particle selection issue?). I think you're only seeing it manifest in this extreme way because you're overriding the bounding box with a value that's very different from the actual bounding box of the dataset - the bounding_box parameter is not meant to be used as a subselection method and should always match the domain extent of the full dataset.

nastasha-w commented 3 months ago

Ah, ok! As for the bounding box argument: I just copied it over from the arguments used in the image test. Maybe Tipsy wasn't formally run with a bounding box? I don't think you formally need to impose those in SPH, and for an isolated galaxy simulation, you wouldn't have the large-scale/low-resolution part to get the large-scale gravity right.

chrishavlin commented 3 months ago

As for the bounding box argument: I just copied it over from the arguments used in the image test.

oh interesting! I'll go take a look at that to see if I can work out why.

chrishavlin commented 3 months ago

Maybe Tipsy wasn't formally run with a bounding box?

Ya! I think that's right -- there's actually a comment in the TipsyDataset.__init__ that reads

        # Because Tipsy outputs don't have a fixed domain boundary, one can
        # specify a bounding box which effectively gives a domain_left_edge
        # and domain_right_edge

but i'm still a bit confused by it. If you don't set a bounding box, it will go through and find the min/max of all the particle coordinate positions and calculate a corresponding bounding box (here). But it's really not clear to me what should happen if you specify a bounding box that is smaller than the extent of the particle spread... the current behavior is what you're seeing: the particles falling outside the specified bounding box are being clipped to the bounding box, which doesn't seem right to me but I might be missing some context.

nastasha-w commented 3 months ago

@chrishavlin my instinct is that if the bounding box is too small, the code should just raise an error, rather than silently mangling the data. I've never worked with Tipsy data outside those tests for the SPH backend projections though, so I'd defer to someone more familiar with the dataset/simulation and typical underlying assumptions.

chrishavlin commented 3 months ago

Ya, I agree with you -- raising an error or at least a warning would be ideal. I'll hit slack to see if anyone more familiar with Tipsy can weigh in.

nastasha-w commented 2 weeks ago

@chrishavlin I checked and I don't think anyone actually responded to this on the slack channel. Do we want to go ahead with making the changes we favor, or just leave the issue open for documentation purposes in case it comes up in the future?