pynbody / tangos

The Agile Numerical Galaxy Organisation System
BSD 3-Clause "New" or "Revised" License
18 stars 13 forks source link

finder_id problem for MPI AHF runs #87

Open mtremmel opened 5 years ago

mtremmel commented 5 years ago

the finder_id is meant to map to the original order of the halo catalog, not the unique ID number assigned by the halo finder necessarily. For MPI runs (e.g. with AHF but also similar for others) the halo ID is a very large number not associated with its final position within the halo catalog (i.e. there is no 1, but rather very large identifying numbers). In pynbody, the halo catalog is read in as a list of halos and halos are extracted based on their position within that list (h.load_copy(5) will load in the 5th halo, for example). A while ago, the definition of finder_id was changed such that it was associated directly with the ID number given in the catalog. This works fine for normal AHF where the order = ID number. However, this will create problems for catalogs created with MPI, where this is not the case.

mtremmel commented 5 years ago

the problem I think comes from this commit https://github.com/pynbody/tangos/commit/7a4d78d58177fb33276649c97b5823920c248c7b

mtremmel commented 5 years ago

I think the best way forward with this is to change the output from the halo finder specific input handler such that finder_id maps to the catalog position rather than the ID number. This allows for, I think, easier fixes/changes to be made to other catalogs

apontzen commented 5 years ago

The trouble is then we have three layers of IDs - the halo_number, the finder_id, and this new “sometimes the finder_id and sometimes something else” thing... is there a cleaner solution?

mtremmel commented 5 years ago

sorry I wasn't clear... my proposal is to keep just those two, but have the AHF input_handler output the "correct" finder_id rather than the one directly taken from the stat file. Because this is how the AHF catalog works in pynbody. finder_id must map to the halo such that h.load_copy(finder_id) returns the desired halo. right now this would not be the case for MPI AHF.

mtremmel commented 5 years ago

the problem that I fore see is that the "correct" finder_id might depend on the handler you use (pynbody vs yt, etc)

apontzen commented 5 years ago

I guess I am not following what is the ‘correct’ finder_id if not the one actually listed in the AHF stat file? It seems more like a bug in AHF than in tangos...

mtremmel commented 5 years ago

Note that this dependence I think already exists given the "id_offset" defined for AHF catalogs... that is clearly (I think?) specific to pynbody (i.e. ID 0 -> 1 and so on). presumably there could exist an input handler for particle data that used the original IDs rather than ID+1.

mtremmel commented 5 years ago

For halo finders run with MPI the ID of the halo is disconnected from the position within the file, because the latter is not known a priori. Each separate process writes out their halo information to the file separately so the final ordering really is just which process writes out the information first. In pynbody catalogs at least, for AHF, the order within the catalog is what matters. h.load_copy(N) grabs the Nth entry in the list of halos, not halo with ID number N+1, though this is what happens in the "normal" (non-MPI) AHF output.

mtremmel commented 5 years ago

I suppose the question is whether we should change the AHF reader to handle the raw ID numbers or change tangos to just always use the raw order within the catalog.

apontzen commented 5 years ago

Argh! It’s a horrible thing.

Basically my preferred solution would be to fix AHF so it post-processes and renumbers its halos to something a bit more sensible. But let’s assume that’s not going to happen in which case yes I agree the finder_id should just be defined to be the thing you pass to your handler... in which case for pynbody handlers it would be whatever pynbody expects.

What a mess!

Are you able to make a PR that implements this change?

mtremmel commented 5 years ago

give me some time but yes, I can come up with a fix. My goal will be to constrain the fix to the AHF handler.... I suppose this is the challenge of trying to be useful to everyone!

mtremmel commented 5 years ago

The way to think about it I suppose is that we are defining the AHF catalog ID as the order within the AHF_halos file. Other people will need to implement their input handlers to compensate for that I suppose. I don't think that's a very big ask IMO (especially given that this is how it normally is!)