yt-project / yt

Main yt repository
http://yt-project.org
Other
469 stars 280 forks source link

Filtering data with cut_region has wrong assumption of sampling type #1635

Closed kuochuanpan closed 6 years ago

kuochuanpan commented 7 years ago

Bug report

yt version: 3.5.dev0 (python3) data: FLASH data : sedov_hdf5_chk_0003

I want to cut out some region to speed up the calculation but found that yt will assume my data has particles but I am not using any particles in my data set.

Bug summary

See the below code example to reproduce the bug. my_field works but if I add some filter in my_filed2 then it crashed. However, my_field2 works with all_data() but not with cut_region.

Code for reproduction

I could reproduce this bug with the public sedov3d data set.

# Paste your code here
import yt
import numpy as np

def _test_field1(field,data):
    return data["density"]

def _test_field2(field,data):
    region = (data["density"] < 3.0)
    return data["density"][region]

fn = "Sedov_3d/sedov_hdf5_chk_0003"

ds = yt.load(fn)
ds.add_field("my_field",function=_test_field1,units="g/cm**3",display_name="My Field",sampling_type='cell')
ds.periodicity = (True, True, True)
dd  = ds.all_data()
cut = dd.cut_region(['(obj["density"] >1.0 )'])
total_mass = cut['my_field'].sum()
print(total_mass)

ds.add_field("my_field2",function=_test_field2,units="g/cm**3",display_name="My Field",sampling_type='cell')
total_mass = dd['my_field2'].sum()
print(total_mass)

ds.add_field("my_field2",function=_test_field2,units="g/cm**3",display_name="My Field",sampling_type='cell')
ds.periodicity = (True, True, True)
total_mass = cut['my_field2'].sum()
print(total_mass)

Actual outcome

yt : [INFO     ] 2017-11-28 11:44:55,787 Particle file found: sedov_hdf5_chk_0003
yt : [INFO     ] 2017-11-28 11:44:55,814 Parameters: current_time              = 0.0300280784883
yt : [INFO     ] 2017-11-28 11:44:55,814 Parameters: domain_dimensions         = [8 8 8]
yt : [INFO     ] 2017-11-28 11:44:55,814 Parameters: domain_left_edge          = [ 0.  0.  0.]
yt : [INFO     ] 2017-11-28 11:44:55,815 Parameters: domain_right_edge         = [ 1.  1.  1.]
yt : [INFO     ] 2017-11-28 11:44:55,815 Parameters: cosmological_simulation   = 0.0
187383.6681927098 g/cm**3
803464.4298256787 g/cm**3
Traceback (most recent call last):
  File "yt_debug_cutregion.py", line 24, in <module>
    total_mass = cut['my_field2'].sum()
  File "/Users/pan/anaconda/envs/python3/lib/python3.6/site-packages/yt/data_objects/data_containers.py", line 282, in __getitem__
    self.get_data(f)
  File "/Users/pan/anaconda/envs/python3/lib/python3.6/site-packages/yt/data_objects/selection_data_containers.py", line 826, in get_data
    parent[field][self._part_ind(field[0])]
  File "/Users/pan/anaconda/envs/python3/lib/python3.6/site-packages/yt/data_objects/selection_data_containers.py", line 867, in _part_ind
    parent[(ptype, "particle_position_x")].to(units),
  File "/Users/pan/anaconda/envs/python3/lib/python3.6/site-packages/yt/data_objects/data_containers.py", line 275, in __getitem__
    f = self._determine_fields([key])[0]
  File "/Users/pan/anaconda/envs/python3/lib/python3.6/site-packages/yt/data_objects/data_containers.py", line 1122, in _determine_fields
    finfo = self.ds._get_field_info(ftype, fname)
  File "/Users/pan/anaconda/envs/python3/lib/python3.6/site-packages/yt/data_objects/static_output.py", line 755, in _get_field_info
    raise YTFieldNotFound((ftype, fname), self)
yt.utilities.exceptions.YTFieldNotFound: Could not find field '('gas', 'particle_position_x')' in sedov_hdf5_chk_0003.

Expected outcome

Version Information

ngoldbaum commented 7 years ago

I'm able to reproduce this issue. Thank you for the report.

ngoldbaum commented 7 years ago

This is happening because my_field2 is not returning data of the correct shape. yt fields should always return data of the same shape as the fields that are passed into it, but my_field2 includes a boolean condition that drops some data.

Can you explain a little more what you are ultimately trying to do with my_field2? Perhaps what you really want to do is add more conditions to your cut_region?

We should probably not allow people to write fields like my_field2 since it breaks a basic assumption we are making about the shapes of data returned by yt field functions.

kuochuanpan commented 7 years ago

Thank you very much for your help. Actually what I want to do can be all done in the cut_region but my original region has a physical meaning (in my case, region not including the neutron star) and might be used to other quantities. The filter in my_field also has physical meaning. In practice, I could define more conditions in the cut_region and that should work but why my_field2 works in ds.all_data()?

ngoldbaum commented 7 years ago

The fact that ad['my_field2'] worked without raising an error is probably a bug in yt. Right now we don't do much in terms of validating that yt field functions return data with shapes that make sense, given the assumptions that exist elsewhere in yt.

I'm going to try to synthesize this discussion into another issue about validating the shapes of returned data from user-defined fields, at which point I will close this issue.

kuochuanpan commented 7 years ago

Thank you very much! Then I will just use cut_region in my problem.

ngoldbaum commented 6 years ago

Closed in favor of #2057