yt-project / yt

Main yt repository
http://yt-project.org
Other
468 stars 280 forks source link

Unable to load ARTData quickly #5057

Open yohad opened 2 days ago

yohad commented 2 days ago

Bug report

I'm trying to use skip_particles when running yt.load as I only need the gas data and the loading gets very long. The loading fails, but when I run it without skip_particles I can load just fine

Code for reproduction

In [21]: df = yt.load("/sci/labs/dekel/lab_share/vela_v2/thin/VELA_v2_07/10MpcBox_csf512_00540.d", skip_particles=True)
yt : [INFO     ] 2024-11-13 18:24:16,081 discovered particle_header:/sci/labs/dekel/lab_share/vela_v2/thin/VELA_v2_07/PMcrd_00540.DAT
yt : [INFO     ] 2024-11-13 18:24:16,082 discovered particle_data:/sci/labs/dekel/lab_share/vela_v2/thin/VELA_v2_07/PMcrs0_00540.DAT
yt : [INFO     ] 2024-11-13 18:24:16,082 discovered particle_stars:/sci/labs/dekel/lab_share/vela_v2/thin/VELA_v2_07/stars_00540.dat
yt : [INFO     ] 2024-11-13 18:24:16,087 Using root level of 14
yt : [INFO     ] 2024-11-13 18:24:16,135 Max level is 12
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[21], line 1
----> 1 df = yt.load("/sci/labs/dekel/lab_share/vela_v2/thin/VELA_v2_07/10MpcBox_csf512_00540.d", skip_particles=True)

File ~/miniforge3/lib/python3.12/site-packages/yt/_maintenance/deprecation.py:69, in future_positional_only.<locals>.outer.<locals>.inner(*args, **kwargs)
     60     value = kwargs[name]
     61     issue_deprecation_warning(
     62         f"Using the {name!r} argument as keyword (on position {no}) "
     63         "is deprecated. "
   (...)
     67         **depr_kwargs,
     68     )
---> 69 return func(*args, **kwargs)

File ~/miniforge3/lib/python3.12/site-packages/yt/loaders.py:144, in load(fn, hint, *args, **kwargs)
    136     if missing := cls._missing_load_requirements():
    137         warnings.warn(
    138             f"This dataset appears to be of type {cls.__name__}, "
    139             "but the following requirements are currently missing: "
   (...)
    142             stacklevel=3,
    143         )
--> 144     return cls(fn, *args, **kwargs)
    146 if len(candidates) > 1:
    147     raise YTAmbiguousDataType(_input_fn, candidates)

File ~/miniforge3/lib/python3.12/site-packages/yt/frontends/art/data_structures.py:172, in ARTDataset.__init__(self, filename, dataset_type, fields, storage_filename, skip_particles, skip_stars, limit_level, spread_age, force_max_level, file_particle_header, file_particle_data, file_particle_stars, units_override, unit_system, default_species_fields)
    170 self.force_max_level = force_max_level
    171 self.spread_age = spread_age
--> 172 Dataset.__init__(
    173     self,
    174     filename,
    175     dataset_type,
    176     units_override=units_override,
    177     unit_system=unit_system,
    178     default_species_fields=default_species_fields,
    179 )
    180 self.storage_filename = storage_filename

File ~/miniforge3/lib/python3.12/site-packages/yt/data_objects/static_output.py:294, in Dataset.__init__(self, filename, dataset_type, units_override, unit_system, default_species_fields, axis_order)
    291 self._create_unit_registry(used_unit_system)
    293 self._parse_parameter_file()
--> 294 self.set_units()
    295 self.setup_cosmology()
    296 self._assign_unit_system(unit_system)

File ~/miniforge3/lib/python3.12/site-packages/yt/data_objects/static_output.py:1405, in Dataset.set_units(self)
   1396             self.unit_registry.add(
   1397                 new_unit,
   1398                 my_u.base_value / (1 + self.current_redshift),
   (...)
   1401                 prefixable=True,
   1402             )
   1403         self.unit_registry.modify("a", 1 / (1 + self.current_redshift))
-> 1405 self.set_code_units()

File ~/miniforge3/lib/python3.12/site-packages/yt/data_objects/static_output.py:1460, in Dataset.set_code_units(self)
   1457 self._override_code_units()
   1459 # set attributes like ds.length_unit
-> 1460 self._set_code_unit_attributes()
   1462 self.unit_registry.modify("code_length", self.length_unit)
   1463 self.unit_registry.modify("code_mass", self.mass_unit)

File ~/miniforge3/lib/python3.12/site-packages/yt/frontends/art/data_structures.py:226, in ARTDataset._set_code_unit_attributes(self)
    224 # all other units
    225 Om0 = self.parameters["Om0"]
--> 226 ng = self.parameters["ng"]
    227 boxh = self.parameters["boxh"]
    228 aexpn = self.parameters["aexpn"]

KeyError: 'ng'
import yt
yt.load(PATH, skip_particles=True)

Version Information

neutrinoceros commented 1 day ago

thanks for reporting Here's a reusable version of the reproducer

import yt
yt.load_sample('D9p_500', skip_particles=True)
neutrinoceros commented 1 day ago

So the bug is pretty simple to describe self.parameters["ng"] is only defined on the following line if particles aren't skipped https://github.com/yt-project/yt/blob/9e483ce13c2737e4d1d2987c436c73917f7e9b47/yt/frontends/art/data_structures.py#L347

However, how to fix it is not (to me): As we can see, the value of this parameter is heavily relied on to define the units https://github.com/yt-project/yt/blob/9e483ce13c2737e4d1d2987c436c73917f7e9b47/yt/frontends/art/data_structures.py#L226-L236 (the good news is that ng is the only such parameter that gets skipped with particles)

How should we address this ? I can see two options: 1) Would it make sense to have a default value for ng when particles aren't read (and if so, what should this default value be) ? 2) should we still read it from file even with skip_particles=True instead ?

Option 1 seems dangerous to me (given, I don't know what ng stands for). Option 2 is probably better but may not be practical: I haven't studied the logic in detail; maybe we can't read this value and really skip reading particles, in which case it would make more sense to go with option 3: deprecate skip_particles as broken and unfixable (not my favorite option).

chrishavlin commented 1 day ago

For what it's worth, at least with 'D9p_500', up in the section where it reads the amr header info these two lines:

https://github.com/yt-project/yt/blob/9e483ce13c2737e4d1d2987c436c73917f7e9b47/yt/frontends/art/data_structures.py#L262-L264

match the values that are read from the particle header:

https://github.com/yt-project/yt/blob/9e483ce13c2737e4d1d2987c436c73917f7e9b47/yt/frontends/art/data_structures.py#L347-L348

that intermediate est in the amr header section gets printed in a debug log, so comparing the calculated values in the amr header section to those in the particle header section:

import yt
yt.set_log_level('debug')
ds = yt.load_sample('D9p_500')
print(f"n cell from amr header = {ds.ncell}")
print(f"particle header parameters ng={ds.parameters['ng']}, ncell0={ds.parameters['ncell0']}")

skipping unrelated logs...

yt : [DEBUG    ] 2024-11-14 10:17:37,775 Estimating 128 cells on a root grid side, 262144 root octs
n cell from amr header = 2097152
particle header parameters ng=128, ncell0=2097152

so based on that and the name of ng in the particle header (Ngridc), i'd hazard a guess that ng is an amr grid-related parameter that happens to only exist in the particle header? and it could be safely set by est from the amr header section if it's not reading particles? But it'd be nice to have confirmation of that from someone more familiar with art outputs....