underworldcode / underworld3

https://underworldcode.github.io/underworld3/
Other
21 stars 10 forks source link

Unable to update passive swarm variables #166

Open bknight1 opened 8 months ago

bknight1 commented 8 months ago

UW is producing an index error when running on multiple processors:

Traceback (most recent call last):
  File "/Users/benknight/Documents/Research/Modelling/UW-models/UW3-dev/passive_swarm_issue.py", line 64, in <module>
    test_ps.data[:,0] = 1.
  File "/Users/benknight/Documents/Research/GitHub/underworld3/src/underworld3/swarm.py", line 1513, in __exit__
    var._update()
  File "/Users/benknight/Documents/Research/GitHub/underworld3/src/underworld3/swarm.py", line 278, in _update
    self._rbf_to_meshVar(self._meshVar)
  File "/Users/benknight/Documents/Research/GitHub/underworld3/src/underworld3/swarm.py", line 302, in _rbf_to_meshVar
    Values = self.rbf_interpolate(new_coords, verbose=verbose, nnn=nnn)
  File "/Users/benknight/Documents/Research/GitHub/underworld3/src/underworld3/swarm.py", line 386, in rbf_interpolate
    values = kdt.rbf_interpolator_local(new_coords, D, nnn, verbose)
  File "src/underworld3/kdtree.pyx", line 231, in underworld3.kdtree.KDTree.rbf_interpolator_local
IndexError: index 4607182418800017421 is out of bounds for axis 0 with size 2
[2]PETSC ERROR: ------------------------------------------------------------------------
[2]PETSC ERROR: Caught signal number 10 BUS: Bus Error, possibly illegal memory access
[2]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[2]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/
[2]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run 
[2]PETSC ERROR: to get more information on the crash.
[2]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash.

I've attached a basic script to replicate the issue.

Cheers passive_swarm_issue.txt

lmoresi commented 8 months ago

I checked in a sort-of fix (setting nodal-point value to zero when no particles are found). I'm assuming no-particles-found is the error condition, but I haven't checked if it is actually a small number that is required to build the kd-tree consistently.

We did not have a check for this case: Sparse swarm which does not have particles in every process but still trying to build a proxy mesh variable. In general, this is not really a well-defined situation. We should probably have some value that can be set as the default value where no particles are nearby. Alternatively, refuse to build a proxy.

I'll leave the issue open for discussion of a proper fix.

gthyagi commented 8 months ago

I was doing something similar to this in #142. I am closing #142. We can track this one instead.

bknight1 commented 8 months ago

@gthyagi slightly different issue as the fixes Louis added won't stop the error you posted from occurring

gthyagi commented 8 months ago

@bknight1, I agree. I referred another issue here because these issues fall under a broader category known as "the scenario where a processor contains zero particles." The existing built-in particle functions consistently generate errors when executed in parallel under this circumstance.

For users transitioning from uw2, tasks involving particles may appear straightforward. However, in uw3, all such operations become nontrivial.

lmoresi commented 8 months ago

I'm not sure what we can do about this - underworld2 would return indices of particles added to the swarm so you could tell that there had been a migration but you could not always be sure you would still have particles on any given process.

We should think about how this is to be done to minimise the confusion in parallel.

I agree that this becomes complicated quite quickly and it always seems to be difficult but I thought this was the case in uw2 as well.

L

bknight1 commented 8 months ago

@bknight1, I agree. I referred another issue here because these issues fall under a broader category known as "the scenario where a processor contains zero particles." The existing built-in particle functions consistently generate errors when executed in parallel under this circumstance.

For users transitioning from uw2, tasks involving particles may appear straightforward. However, in uw3, all such operations become nontrivial.

I'm not sure I agree, what you were trying to do would still produce an error in UW2 too

knepley commented 8 months ago

@lmoresi We can number the particles if that is something Underworld needs. I have had at least one other request. It is the same as adding the "cellid" field, but it would be "particleid".

bknight1 commented 8 months ago

@knepley I think particle numbering would be useful for us

knepley commented 8 months ago

It is already doable. Add an int field and set it to an id when you create the particles.

lmoresi commented 8 months ago

We do need to have the ID be unique though and so we at least need to know the high water mark for the number of particles through time and always start from there. It causes problems if we re-use particle id values.

Sent from my iPhone

On 18 Mar 2024, at 22:41, Matthew Knepley @.***> wrote:



It is already doable. Add an int field and set it to an id when you create the particles.

— Reply to this email directly, view it on GitHubhttps://github.com/underworldcode/underworld3/issues/166#issuecomment-2003686726, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ADABPIZYHEO6QX7ZLGAZM23YY3HEVAVCNFSM6AAAAABEGWHAM6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMBTGY4DMNZSGY. You are receiving this because you were mentioned.Message ID: @.***>

knepley commented 8 months ago

On Mon, Mar 18, 2024 at 5:56 PM Louis Moresi @.***> wrote:

We do need to have the ID be unique though and so we at least need to know the high water mark for the number of particles through time and always start from there. It causes problems if we re-use particle id values.

Okay, in this case I would make an API for a global counter, something like

CounterGetRange(counter, numIds, &idStart);

which is collective, and reserves [idStart, idStart + numIds) on that process. This turns adding particles into a collective operation, but it seems like it logically already was.

Matt

Sent from my iPhone

On 18 Mar 2024, at 22:41, Matthew Knepley @.***> wrote:



It is already doable. Add an int field and set it to an id when you create the particles.

— Reply to this email directly, view it on GitHub< https://github.com/underworldcode/underworld3/issues/166#issuecomment-2003686726>, or unsubscribe< https://github.com/notifications/unsubscribe-auth/ADABPIZYHEO6QX7ZLGAZM23YY3HEVAVCNFSM6AAAAABEGWHAM6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMBTGY4DMNZSGY>.

You are receiving this because you were mentioned.Message ID: @.***>

— Reply to this email directly, view it on GitHub https://github.com/underworldcode/underworld3/issues/166#issuecomment-2005114913, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEORCLZETKM7W2DW5W2263YY5PIPAVCNFSM6AAAAABEGWHAM6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMBVGEYTIOJRGM . You are receiving this because you were mentioned.Message ID: @.***>

-- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ http://www.cse.buffalo.edu/~knepley/

lmoresi commented 7 months ago

Agreed - we should do this ourselves while we wait for this to propagate into PETSc itself. @julesghub - feel free to assign this to somebody other than me.