trilinos / Trilinos

Primary repository for the Trilinos Project
https://trilinos.org/
Other
1.19k stars 559 forks source link

Epetra_Map -> create_VectorSpace() -> get_Epetra_Map() doesn't return the original map #3

Closed nschloe closed 3 years ago

nschloe commented 8 years ago

Given an Epetra_Map, one can convert it to a Thyra::VectorSpaceBase,

Teuchos::RCP<const Thyra::VectorSpaceBase<double>> space =
    Thyra::create_VectorSpace(mymap);

When translating the result of this back to an Epetra_Map,

Teuchos::RCP<const Epetra_Map> spacemap =
    Thyra::get_Epetra_Map(*space, comm);

it's not the same as the original map.

spacemap->SameAs(*mymap) // false

(From Sandia bug 6341.)

mhoemmen commented 8 years ago

@trilinos/thyra Could you explain the use case or test that's failing? Thanks!

tawiesn commented 8 years ago

@nschloe The Thyra::get_Epetra_Map routine is not exactly doing what you expect: It creates a new EpetraMap which contains newly generated GIDs based on the Thyra blocking information. The only thing that is guaranteed is, that the number of local elements is the same in both Epetra maps. The original Epetra map is usually stored and can be accessed using

Teuchos::RCP epetra_map = Teuchos::get_extra_data<RCP >(space,"epetra_map");

So, the name "get_Epetra_Map" is probably misleading. In case you are using Thyra for your projects, i recommend using Xpetra and the routines Xpetra::ThyraUtils<>::toXpetra and Xpetra::ThyraUtils<>::toThyra which allow you to transform different Epetra/Tpetra objects to Thyra objects and vice versa.

bartlettroscoe commented 8 years ago

That is a bad name. It should have been called create_Epetra_Map() or we should put in the infrastructure to get back the exact same Epetra_Map that was created. That is likely not a lot of work. I would be happy to do it if I could get some funding from SNL to do it.

bartlettroscoe commented 8 years ago

I guess that I should comment that in all the cases where I used the Thyra <==> Epetra adapters, I always kept the original Epetra_Map around so I never wanted to grab it out of the Thyra::VectorSpaceBase object. That was easy to do in my use cases so I never thought about this issue.

mhoemmen commented 8 years ago

This depends on Issue #142.

bartgol commented 6 years ago

I stumbled on this problem while trying to abstract the linear algebra interface in Albany. Is there a reason why one should not implement the Thyra-Epetra adapters in the same fashion as the Thyra-Tpetra ones? I like the behavior that ensures the map epetra->thyra->epetra is an identity. Besides the fact that creates no confusion, it also 1) avoid extra copies, 2) does not require additional inputs (e.g., Epetra_Map) when converting a (Mutli)Vector from Thyra to Epetra.

I am wondering if there is a good reason not to implement it (other than time to do it, but I may help with that).

mhoemmen commented 6 years ago

@bartlettroscoe and @rppawlo might know if there's a good reason why.

bartgol commented 6 years ago

Just for fun I started to implement the epetra interface to do an equivalent job of what the tpetra one does. I got the vector space and multi vector interfaces down (modulo testing), so I'm starting to believe it's doable. The only caveat is for subviews, which tpetra handles via RCP, while epetra uses raw pointers. But I think one can make it work via the extra data feature of rcps.

mhoemmen commented 5 years ago

@bartgol FYI I'm not sure how the question of Tpetra GPU vs. CPU data views with Tpetra gets handled in Thyra. I know @bartlettroscoe and @rppawlo worked on that a few years ago. I'm not a fan of how Tpetra does it, but interface changes are scary and not everybody agrees how we should do it.

bartgol commented 5 years ago

I'm not sure I fully understand your comment. Thyra always gets a host view, which is probably not optimal in some cases, but leaves the interfaces relatively simple (in both thyra and the application using thyra). As long as tpetra syncs the device and host views, correctness should be ensured.

mhoemmen commented 5 years ago

Thyra also has to use sync and modify correctly in order for this to work.

bartgol commented 5 years ago

Well, thyra uses the get1dViewNonconst method, which does that for you (now that it has been fixed)

mhoemmen commented 5 years ago

@bartgol Cool, that should be fine then :-D Thanks!

rppawlo commented 5 years ago

@bartlettroscoe and @rppawlo might know if there's a good reason why.

Not sure about the epetra design - was only involved in the Thyra ones. @bartgol - we would like to get the Epetra ones updated if you could help with this. For the Tpetra implementation, @atoth1 converted the Thyra Tpetra wrappers to use direct calls on Tpetra objects for GPUs (bypassing rtops). This works for all the calls NOX uses but the complete interface has not been changed over. Hope to finish this off in FY19.

bartgol commented 5 years ago

During the weekend I worked on a branch to update the epetra adapters, making them more like wrappers, like the tpetra ones. I have done all the implementation, but I still have to do the tests.

bartlettroscoe commented 5 years ago

I think having more straightforward Thyra/Epetra adapters like the Thyra/Tpetra adapters would be good to have at this point. When the Thyra/Epetra adapters were created 13+ years ago, there was no GPU programming and the approach employed by the Thyra/Epetra adopters would allow mixing and matching different concrete implementations other than Epetra. But did not really get exploited and with GPUs and other changes in the environments, a more inclusive solution would require going back to the drawing board.

My only concern with this effort would be breaking backward compatibility if we replace the existing Thyra/Epetra adapter implementation with another implementation. But we might provide these new adapters as a new implementation and deprecate the old one perhaps. But that might not be necessary if we get lucky.

bartgol commented 5 years ago

In the branch I'm preparing I am not removing nor deprecating the old interfaces. Luckily, the old epetra interface look like get_Epetra_x or create_y (x is an Epetra class type, y a Thyra class type). The new interfaces have the same naming convention as the Tpetra ones: getEpetraX and createThyraY. So there is no clash, and one can simply use the one he wants. We could indeed consider deprecation, but I would like to use the new interfaces for a bit in Albany first, before claiming they are good to go and we can deprecate the old ones.

Out of curiosity: do we know how many/which applications use these adapters? I know Albany is about to use them (so far it always used Tpetra as back-end, manually converting to Epetra from Tpetra, but we're changing this).

tawiesn commented 5 years ago

@bartgol Just a comment: Xpetra has wrappers to/from Thyra. It may (or may not) have been easier for you to wrap Eptra in Xpetra objects (all those wrappers exist) and use the Xpetra/Thyra wrappers. We used them for MueLu and drekar. There are also tests for those wrappers... I don't know your exact use case, but the general question is: why would you like to call "get_Epetra_Map" or "get_Tpetra_Map" to get access to the underlying Epetra or Tpetra map from a Thyra object. This way you introduce again Epetra/Tpetra specific code with code duplication etc...

bartgol commented 5 years ago

Well, I wasn't aware that xpetra had thyra interfaces. So there's that.

Besides that, what brought me to this issue is a refactoring going on in Albany. In particular, we're trying to abstract the linear algebra structures, using Thyra. There are multiple goals, but the main two are: using block thyra stuff (and Teko), and possibly allow non-petra linear algebra backends. Now, Albany has plenty of hard coded Tpetra/Epetra stuff (as of now), so introducing Xpetra was going to create major complications, at least that was my thought. Besides, users would have had to enable yet another trilinos package (though this is, admittedly, a minor drawback). Anyhow, the get*petraXYZ calls should be very limited in Albany (once this thyra refactor is done). In fact, I don't want to call get_Epetra_map nor getTpetraMap at all, if possible (though I think at some point one has to).

I feel like this conversation is getting bigger than what the title says. I wonder if this issue should be renamed or closed and reopened under a new name.

@bartlettroscoe I am not in the place of taking trilinos interface decisions (I'm not even part of the team). All I want is to have ThyraEpetra interfaces that wrap the Epetra stuff, rather than copying it. If you want to deprecate the old interface, I can do it. However, I was thinking of creating something like a 'refactor' folder, and put the new classes in there, with _refactor in the file name in case of file names clashes (e.g., Thyra_EpetraThyraWrappers.*pp). If the refactored files become trustworthy, one can then deprecate the old ones, and finally replace the old ones with the refactored ones down the road. I think this was the path followed by the Tpetra kokkos refactor stuff?

mhoemmen commented 5 years ago

@bartgol wrote:

If the refactored files become trustworthy, one can then deprecate the old ones, and finally replace the old ones with the refactored ones down the road. I think this was the path followed by the Tpetra kokkos refactor stuff?

Tpetra's refactor was more complicated than that. Tpetra maintained old and new versions of classes for a while. Users would select one or the other at compile time, based on Node type.

bartgol commented 5 years ago

Ok, I was wrong. =)

What if I introduce a cmake config option, which lets the user decide if they want the new adapters? The default would be, of course, OFF, so that it would be 100% backward compatible. If the new interfaces work, once can change the default. This would also allow to have two classes/files with the same name in the old/new interface, since only one would be active in a single build. Would that be acceptable?

mhoemmen commented 5 years ago

@bartgol Is the plan eventually to deprecate and remove the old adapters?

bartgol commented 5 years ago

Well, I don't know. That depends on whether thyra customers are currently using the old epetra interfaces, and how much do they rely on the fact that they can always create an epetra object, even if the input thyra object is not wrapping epetra data.

Ideally, I would say yes, remove them at some point. But I am in no position to make that call. Perhaps Thyra maintainers (or trilinos managers) should weigh in on that.

One thing that the new interfaces do not support is to create an EpetraThyra object from a given Thyra object. That's what the old interface does. However, I can add this feature, if desired. The interfaces I have now are along the line of getEpetraXYZ(RCP<Thyra::XYZ>& xyz), and the old behavior (as @bartlettroscoe pointed out) is more like a createEpetraXYZ(RCP<Thyra::XYZ>& xyz), since it copies into a brand new Epetra structure. The advantage of the old interface is that it allowed mixed backend linear algebra packages. I am not sure how much that feature is desired, but I can always add the 'create' interface alongside with the 'get' interface.

For what is worth, Albany is not interested (as of today) in a mixed backend linear algebra support. We are picturing a scenario where we commit to one backend at the beginning of the simulation, and stick with that.

mhoemmen commented 5 years ago

The advantage of the old interface is that it allowed mixed backend linear algebra packages.

Did anybody ever test that? It could be a nice feature to be able to call, say, some Aztec stuff with a (compatible) Tpetra matrix.

bartlettroscoe commented 5 years ago

Well, I don't know. That depends on whether thyra customers are currently using the old epetra interfaces, and how much do they rely on the fact that they can always create an epetra object, even if the input thyra object is not wrapping epetra data.

My guess is that quite a few customers are either directly or indirectly using the existing Thyra/Epetra adapters. But if we update all of Trilinos (including the guts of the Stratimikos adapters) to use these new Thyra/Epetra adapters, then I think upgrading would not be that hard for users (but this might break backward compatibility).

My suggestion would be to just add the new getEpetraXYZ() and createXYZ() functions in the main Thyra namespace, refactor all of Trilinos to use the new getEpetraXYZ() functions and then deprecate the old get_Epetra_XYZ() and create_XYZ() functions documented here.

Did anybody ever test that? It could be a nice feature to be able to call, say, some Aztec stuff with a (compatible) Tpetra matrix.

It is possible and I would not be surprised there are cases of code in Trilinos and user code of using a Thyra::DefaultSpmdMultiVector object not created from a wrapped Epetra_MultiVector object created with Thyra::create_MultiVector() to get an Epetra_MultiVector object using Thyra::get_Epetra_MultiMultiVector(). That is what the old Thyra/Epetra adapters allowed you to do, for example. It is that issue that I suspect might make this refactoring more difficult to get rid of the old Thyra/Epetra adapters.

Remembering that and thinking about this more, it might be easier to just get the Thyra::create_VectorSpace() function to remember the input Epetra_Map object and then have the Thyra::get_Epetra_Map() function return that exact Epetra_Map object. Actually, from looking at:

it looks like the input Epetra_Map is already being remembered. Therefore, we should just have to update the function Thyra::get_Epetra_Map() to lookup that Epetra_Map object and return it if it finds it. That might just be a couple of lines to add and a few unit tests.

bartgol commented 5 years ago

The big problem with the current interfaces, in my opinion, is that Epetra_Map -> VectorSpace -> Epetra_Map is not an identity (as the title of this issue says). If the input map is linearly distributed and not overlapped, then yes, it is, but that's not going to be always the case.

I am almost done with the refactored interfaces tests in my fork. I will put up a PR, so at least you guys can see what it looks like. My implementation allows to choose either or: new interfaces or old ones, with old ones being the default. I guess I could slowly propagate this check (old vs new) throughout the library, if that's desirable.

bartlettroscoe commented 5 years ago

@bartgol said:

If the input map is linearly distributed and not overlapped, then yes, it is, but that's not going to be always the case.

But I think to pass Epetra objects to solvers, the Epetra_Map has to be non-overlapped, right? If that is the case, then I think we can provide petra_Map -> VectorSpace -> Epetra_Map as the identity.

Is it the case that these wrapped Epetra objects will not be directly passed to solvers like AztecOO that require non-overlapped maps for the Epetra objects? Are these Epetra objects only used during residual and Jacobian fills?

My implementation allows to choose either or: new interfaces or old ones, with old ones being the default. I guess I could slowly propagate this check (old vs new) throughout the library, if that's desirable.

That sounds like a good strategy. Then we could just deprecate the old approach (hopefully with compile-time warnings) so that we can get rid of them at some point. I look forward to seeing the PR.

NOTE: I think we can add unit tests by copying and pasting the unit tests for the Thyra/Tpetra adapters and make a few changes. Those are pretty comprehensive. (The Thyra/Epetra adapters were written prior to 2007 before I learned about the importance of unit testing and how to do it effectively.)

bartgol commented 5 years ago

NOTE: I think we can add unit tests by copying and pasting the unit tests for the Thyra/Tpetra adapters and make a few changes. Those are pretty comprehensive. (The Thyra/Epetra adapters were written prior to 2007 before I learned about the importance of unit testing and how to do it effectively.)

That's exactly what I did. Some Tpetra tests do not make sense for Epetra, but most of them do, so I just had to adjust them. I'm still fighting with some 32-vs-64 bits issues in Epetra, but when I'm done with that I will put the PR up (most likely today).

About the map->vs->map case: in order to construct an Epetra_Map from a vs, the current interfaces build the global elements list as [offset, offset+1,...,offset+localDim-1], where localDim is the numbe of elements on node, and offset is the scan sum of localDim across the processes. This is not correct, as it loses the original GID information. This is particularly a no-no for Albany, since we create maps of dofs on sides of the mesh, hence with GIDs that are non-contiguous, don't necessarily start from 0, and may be distributed in a funky way. Even if the GIDs, globally, span [0,N-1], the final map (after map->vs->map conversions) will always be contiguous and linear, which could be different from the input one.

bartlettroscoe commented 5 years ago

@bartgol wrote:

About the map->vs->map case: in order to construct an Epetra_Map from a vs, the current interfaces build the global elements list as [offset, offset+1,...,offset+localDim-1], where localDim is the numbe of elements on node, and offset is the scan sum of localDim across the processes. This is not correct, as it loses the original GID information.

From that ANA (Abstract Numerical Algorithm) perspective, the Thyra interface does not care about GID vs. LID issues. All that matters is how many globally unique vector elements (or MultiVector rows) are on each process. For Thyra RTOps to work, they have to only be feed unique elements, there can't be any duplicates. I think the Epetra implemenations of MutiVector operations themselves have the same requirement. That is, I think that in order to compute the correct mathematical norm of the vector, you map and Vector or MultiVector has to have all globally unique elements (i.e. no ghosted elements). It does not matter how the GIDs are allocated to LIDs. If the current implementation of Thyra::create_VectorSpace() messes that up, then we need to fix that. We may need to put in some debug-mode checks and unit tests to ensure that is the case. Or, we need to warn users that the mathematical RTOp functions will be meaningless if they pass in a map that has duplicated (i.e. ghosted) elements.

But the Epetra_Operator Range Map and Domain Maps are always non-ghosted, are they not?

bartgol commented 5 years ago

Yes, they need to be unique maps for those operations to make sense (or else you compute something different from the mathematical norm). And operators should store non-ghosted maps for domain and range (I am not sure if there are checks for that, or what would happen if they are not). And I like that Thyra works at the math level, hiding (at least in the interface folder) the CS details about concrete implementations.

My concerns are:

I am positive that in those scenarios the conversion of maps would be buggy. I am not 100% sure if that would be true for (multi)vectors, since one passes to the converter also the desired output map (which must obviously have the correct local dimension). It may be that those conversion routines are fine. Though they force one to keep the underlying Epetra_Map stored somewhere, which may be inconvenient.

bartlettroscoe commented 5 years ago

I think it okay to allow Thyra::VectorSpaceBase, Thyra::VectorSpace and Thyra::MultiVectorBase objects to be constructed from maps with duplicated/ghosted elements, but I think we would need to put in debug-mode checking to detect when these types of maps are vector/multi-vector objects are being represented and then throw debug-mode exceptions if someone tries to run some types of mathematical operations on these vector/multi-vector objects. Such Thyra objects would be really only be useful to pass around concrete implementations of objects in a run-time polymorphic way to be used in APP-LAL operations (which is the goal for Albany I would guess). (BTW: All of these"ANA", "APP" and "LAL" references are described in the documented Thyra Linear Operators and Vectors linked to from the Thyra Doxygen Documentation). Allowing for the creation of these Thyra objects that don't allow mathematical operations expands the scope of Thyra interfaces a bit but I don't see the harm in doing so if it makes life for APPs like Albany easier.

But note that some transformation RTOp operations that loop over all elements such as AXPY (or any linear combination) are still valid for duplicated/ghosted elements. It is just reduction operations like norms and inner products that would be wrong with a simple implementation that loops over all elements (and therefore would get corrupted by duplicated elements). That is, all of the transformation operations for Thyra::MultiVectorBase shown here like assign(), scale(), update() and linear_combination() work correctly for duplicated/hosted elements. It is just reduction operations like dots(), norms_1(), norms_2(), and norms_inf() that would be incorrect. If we really wanted to, we could make these reduction operations work correctly when there are ghosted element but that would be complex and slow in the most general case where ghosted elements are intermixed with unique elements.

bartgol commented 5 years ago

I like what you said, but I wonder if it should be Thyra responsibility to check for ghosted maps during reduction. On one hand, yes, it should, because Thyra is in charge of keeping the algorithm math-sound. On the other hand, the *petra objects never check if the map is ghosted before a collective calculation. So I'm debated. But on the bright side, it is something that should be easy to add in a second stage (I don't want to blow up this PR too much; it's already taken a lot of my time).

Ps: thanks for the thyra documentation links! Somehow I never scrolled all the way down the doxygen page, to find the links to the guides... :/

bartlettroscoe commented 5 years ago

Yes, we can add extra checking other behavior in future PRs. The bottom line is that as long a unique non-ghosted maps are passed in, everything should be fine (which should cover all of the existing use cases). And for the Albany use case, as long as you don't pass these ghosted maps and associated Epetra_MultiVector objects to math algorithms, you should be fine.

mhoemmen commented 5 years ago

If you give AztecOO or ML a matrix with an overlapping row Map, it will certainly give you the wrong answer, crash, or otherwise misbehave. They actually have even more restrictive requirements on the column Map.

bartlettroscoe commented 5 years ago

@bartgol, while some issues with the PR #3222 are being worked out, would it be helpful to fix up the existing Thyra::create_VectorSpace() and Thyra::get_Epetra_Map() functions to return the same Epetra_Map when possible?

bartgol commented 5 years ago

I guess that's possible. Since the created vector space has the RCP to the Epetra_Map attached as extra data, if we assume the user is not going to mess with that extra data, we can check if that data exist, and if it does, return it directly during get_Epetra_Map. That would be a single mod to a single function.

Perhaps we can do this intermediate work. E.g., for the vector routines, right now we have

RCP<const Epetra_Vector>
get_Epetra_Vector(
  const Epetra_Map &map,
  const RCP<const VectorBase<double> > &v
  ); 

I want to leave this interface ass it is, but also add

RCP<const Epetra_Vector>
getEpetraVector(
  const RCP<const VectorBase<double> > &v
  const RCP<const Epetra_Map> map = Teuchos::null
  );  
RCP<const Epetra_Vector>
getConstEpetraVector(
  const RCP<const VectorBase<double> > &v
  const RCP<const Epetra_Map> map = Teuchos::null
  );  

If the input VectorBase was created from an Epetra_Vector, these routines do not need to receive a map input. If they don't find the extra data, they can fall back on the existing implementation. As for the create_Vector methods, I'd like to leave the second argument (the vector space) optional. If null, create a VS on the fly.

I can set up a quick PR for this _soon_ish.

Edit: Ah, I see that get_Epetra_Vector already does the check on the extra data. So only the map doesn't. Still, I want to have a signature where one does not pass the Epetra_Map to get the (Multi)Vector, in case the input RCP does indeed store the extra data.

bartgol commented 5 years ago

I think this issue can be closed now, via #3225

bartlettroscoe commented 5 years ago

Commit b738ce6f364f8b883d89b3e3b100506965c43000 does not look to have anything to do with this issue so I have reopened it.

github-actions[bot] commented 3 years ago

This issue has had no activity for 365 days and is marked for closure. It will be closed after an additional 30 days of inactivity. If you would like to keep this issue open please add a comment and remove the MARKED_FOR_CLOSURE label. If this issue should be kept open even with no activity beyond the time limits you can add the label DO_NOT_AUTOCLOSE.

github-actions[bot] commented 3 years ago

This issue was closed due to inactivity for 395 days.