Add two integers to Elem

friedmud commented 5 years ago

We have a need for two unsigned ints to get added to Elem. I propose: user_id1 and user_id2. This will be controlled with --enable-extra-elem-ids at configure time (and configured to false by default). These IDs will be user settable - but otherwise will act just like subdomain_id and unique_id.

This is something that we've had to engineer around for a long time in one of our applications (Rattlesnake) and there's just no reason for it.

Eventually maybe https://github.com/libMesh/libmesh/issues/1439 will obviate the need for this (maybe)... but there's no reason to wait for that.

YaqiWang commented 5 years ago

I can testify how useful this will be to Rattlesnake. We need extra IDs to manage the nuclear data for large simulations for nuclear reactors. Currently we have to contaminate the code with the engineering solutions all over the place that are not only difficult to understand but also difficult to maintain. If you think 2 is too much like a hack, we can have --extra-elem-ids=n, where n can be a config time unsigned integer default to zero.

jwpeterson commented 5 years ago

Yeah I guess I'd vote for

int _extra_ids[N_EXTRA_IDS];

or

std::array<int, N_EXTRA_IDS> _extra_ids;

where N_EXTRA_IDS is a compile-time constant. I'd probably also vote for dof_id_type or a new configurable type instead of int but don't feel too strongly about it.

roystgnr commented 5 years ago

If you think 2 is too much like a hack, we can have --extra-elem-ids=n, where n can be a config time unsigned integer default to zero.

I like this.

My big concern with "hacks" here is that I'd like it to be extensible. Sort of how when we add_variable() we get a variable index out, and doing that allows composeable multiphysics. Would it be reasonable for you guys to call some rattlesnake_index1 = system.add_extra_integer() (and then again with rattlesnake_index2) when building your Systems, then call elem.extra_integer(rattlesnake_index1) when retrieving it?

Currently we have to contaminate the code with the engineering solutions all over the place that are not only difficult to understand but also difficult to maintain.

Could you elaborate? The solutions I'd have tried would have been:

1) Assign two new MONOMIAL variables to your auxilliary system

or

2) Use unique_id to index into an unordered_map

Which would be less efficient but not obviously difficult.

I'd probably also vote for dof_id_type or a new configurable type instead of int but don't feel too strongly about it.

Does anyone else feel strongly about it? If we were to use dof_id_type then it might be easy to slip this into DofObject::_idx_buf, in which case n could be a runtime assignable count but the n=0 case would still be zero overhead! I'd strongly prefer runtime over configure time n, for ease of test coverage and for integration with future work.

permcody commented 5 years ago

Derek might be MIA for the next week or two as he really concentrates on finishing his dissertation so I'll make an attempt at answering these questions.

Would it be reasonable for you guys to call some rattlesnake_index1 = system.add_extra_integer() (and then again with rattlesnake_index2) when building your Systems, then call elem.extra_integer(rattlesnake_index1) when retrieving it?

Yes! In fact this is precisely what Derek proposed verbally right after he posted this ticket. We can't have developers using hard-coded integers to access these data fields if we want them to be extensible among multiple developers so a registration system that returns the index value would be ideal.

Could you elaborate? The solutions I'd have tried would have been: Assign two new MONOMIAL variables to your auxilliary system, or Use unique_id to index into an unordered_map Which would be less efficient but not obviously difficult.

We are in fact doing a combination of these things now in regards to Rattlesnake. The issue is that the usage is somewhat clunky. Part of the issue comes from bootstrapping these IDs into the mesh as it's read in or constructed, but new dedicated field may or may not make that part any easier. Also, keeping it up to date is also clunky. Our Rattlesnake developers have worked around many of these issues, but have long had the desire to make everything more automatic by just having local fields available right on the element. @YaqiWang can elaborate more on what they are doing currently and why it falls short.

Does anyone else feel strongly about it? If we were to use dof_id_type then it might be easy to slip this into DofObject::_idx_buf, in which case n could be a runtime assignable count but the n=0 case would still be zero overhead! I'd strongly prefer runtime over configure time n, for ease of test coverage and for integration with future work.

To the best of my knowledge, no. For all of our immediate needs, a short int would suffice. We plan to use these things to lookup material properties and represent other ids that won't exceed the 2^16 threshold in the foreseeable future. We'd be happy with any type that makes the implementation more reasonable.

YaqiWang commented 5 years ago

My big concern with "hacks" here is that I'd like it to be extensible. Sort of how when we add_variable() we get a variable index out, and doing that allows composeable multiphysics. Would it be reasonable for you guys to call some rattlesnake_index1 = system.add_extra_integer() (and then again with rattlesnake_index2) when building your Systems, then call elem.extra_integer(rattlesnake_index1) when retrieving it?

This will require we manage the data dynamically. We will need to store an extra pointer and the IDs will be stored in heap possibly separated from where Elem is at. I admit this gives us the flexibility and avoids potential conflict among developers who may use those IDs differently. Because we are using distributed mesh for large problems, the extra memory cost is tolerable. I think this is a good idea. Looks like @friedmud and @permcody think as well.

Assign two new MONOMIAL variables to your auxilliary system

or

Use unique_id to index into an unordered_map

Which would be less efficient but not obviously difficult.

We actually tried both. Because the IDs are really belonging to the mesh (meaning that they need to be constructed during mesh generation), the first solution is not ideal. Finding a place to store the IDs for the second is not strait-forward, we also have to consider restart/recover carefully.

I'd probably also vote for dof_id_type or a new configurable type instead of int but don't feel too strongly about it.

I am not too concerned with the type. extra_id_type in whatever type works for me.

Does anyone else feel strongly about it? If we were to use dof_id_type then it might be easy to slip this into DofObject::_idx_buf, in which case n could be a runtime assignable count but the n=0 case would still be zero overhead! I'd strongly prefer runtime over configure time n, for ease of test coverage and for integration with future work. I prefer the add_extra_integer approach. It is desired that these IDs are in Elem.

YaqiWang commented 5 years ago

One small thing, @roystgnr when you type system.add_extra_integer(), are you having mesh.add_extra_integer() in mind?

roystgnr commented 5 years ago

One small thing, @roystgnr when you type system.add_extra_integer(), are you having mesh.add_extra_integer() in mind?

Based on your previous comment, this is a big thing, not a small thing, isn't it? I had been thinking system.add_extra_integer(), but that leave you stuck with the same problem as before if you have ids that need to be constructed during mesh generation but you don't yet have any System attached to that mesh, right? So we can do a mesh.add_extra_integer() API instead.

This will require we manage the data dynamically. We will need to store an extra pointer

Not if we append that data to the DofObject::_idx_buf!

and the IDs will be stored in heap possibly separated from where Elem is at.

This is true, but IIRC not too bad performance-wise, compared to almost any other form of lookup. That buffer will end up in cache anyway if you're anywhere near any code that plays with dof indices on the same element.

The only thing I don't like about the _idx_buf trick is that the internals would be ugly if in the future anyone needed to store data larger than sizeof(dof_id_type), in particular pointers as per #1439... but if we consider the API carefully now then later we can handle data munging and alignment under the hood, or in the medium term we can require such use cases configure with 8 byte dof_id_type.

Because we are using distributed mesh for large problems

Oh, that reminds me to ask - you're planning on storing integers that have a consistent definition between processors, so we can handle redistribution by just sending the raw integers, right? If we need to worry about reinterpretation (e.g. for indices into local-to-one-rank arrays) then that's another level of problem. It's a level of problem we'll have to worry about eventually anyway when we consider pointers, but if you want a solution ASAP then I'd like to postpone that complication if possible.

we also have to consider restart/recover carefully.

Yikes, yes we do. Will CheckpointIO support be sufficient in the short run, or do we need I/O for this data via Nemesis, Exodus, XDR, and/or other formats too?

YaqiWang commented 5 years ago

Oh, that reminds me to ask - you're planning on storing integers that have a consistent definition between processors, so we can handle redistribution by just sending the raw integers, right?

Yes. They are static and are not going to be reordered.

Yikes, yes we do. Will CheckpointIO support be sufficient in the short run, or do we need I/O for this data via Nemesis, Exodus, XDR, and/or other formats too?

Yes, CheckpointIO is enough. If desired, users can use aux kernels to pull the IDs out and store them into the other output formats.

YaqiWang commented 5 years ago

Looks like we reach a consensus on this. Will somebody volunteer to put this in?

roystgnr commented 5 years ago

I'll take a crack at it shortly.

YaqiWang commented 5 years ago

Thanks @roystgnr. Really appreciated. This IDs can be assigned by users with our subdomain mesh modifiers. We can simply in the input ask for a new parameter like id_name and all existing mesh modifiers can be reused.

YaqiWang commented 5 years ago

Thanks @roystgnr. Now I can work on cleaning up something in Rattlesnake.

roystgnr commented 5 years ago

Now I can work on cleaning up something in Rattlesnake.

Well, you can get started. I shouldn't have closed this issue, though; we have runtime support now but not yet I/O support. I'll get it into CheckpointIO as soon as I can.

friedmud commented 5 years ago

@roystgnr sorry I was AFK. This looks like a great first cut at the capability. Thanks so much for adding it!

roystgnr commented 5 years ago

Don't worry; either baby or PhD dissertation would naturally trump bit fiddling, much less both together!

If you're catching up on stuff now, then I think the only issues where I was waiting on your input are #1872 and #1888, and both of those are just optimization attempts, not bugfixes or anything, so not urgent.

libMesh / libmesh

Add two integers to Elem #2095