pmodels / mpich

Official MPICH Repository
http://www.mpich.org
Other
535 stars 280 forks source link

PAMID: Memory leak in MPI RMA #1919

Closed mpichbot closed 7 years ago

mpichbot commented 7 years ago

Originally by suhuang on 2013-08-02 12:11:52 -0500


The problem was found and fixed by Sameer Kumar:

commit 91102a978077a01a81ad6285712e6cf26c0cdc06

The put request is only freed when all previous puts have completed. It was probably added to free the request after the last put in a non-contiguous set of puts completes. However it will lead to a memory leak when several MPI_Put calls are issued without a fence, unlock or win_complete between them. Then only the last put request is freed.

The fix avoids use of sync state to check when state is freed. Instead it uses counters in the request.

void

26 MPIDI_Win_DoneCB(pami_context_t context,

27 void * cookie,

28 pami_result_t result)

29 {

30 MPIDI_Win_request _req = (MPIDI_Winrequest)cookie;

31 ++req->win->mpid.sync.complete;

32 33 if ((req->buffer_free) && (req->type == MPIDI_WIN_REQUEST_GET))

34 {

35 ++req->origin.completed;

36 if (req->origin.completed == req->target.dt.num_contig)

37 {

38 int mpi_errno;

39 mpi_errno = MPIR_Localcopy(req->buffer,

40 req->origin.dt.size,

41 MPI_CHAR,

42 req->origin.addr,

43 req->origin.count,

44 req->origin.datatype);

45 MPID_assert(mpi_errno == MPI_SUCCESS);

46 MPID_Datatype_release(req->origin.dt.pointer);

47 MPIU_Free(req->buffer);

48 req->buffer_free = 0;

49 }

50 }

51

52 if (req->win->mpid.sync.total == req->win->mpid.sync.complete)

53 {

54 if (req->buffer_free)

55 MPIU_Free(req->buffer);

56 if (req->accum_headers)

57 MPIU_Free(req->accum_headers);

58 MPIU_Free(req);

59 }

60 MPIDI_Progress_signal();

61 }

mpichbot commented 7 years ago

Originally by suhuang on 2013-08-02 12:12:35 -0500


Attachment added: 0001-Trac-665.-Memory-leak-fix-for-MPI-RMA.patch (6.8 KiB)

mpichbot commented 7 years ago

Originally by balaji on 2013-08-06 12:26:48 -0500


Is there a branch on mpich-ibm for this, or should I commit the attachment on this ticket?

mpichbot commented 7 years ago

Originally by blocksom on 2013-08-06 12:34:31 -0500


No branch for this one, just the attached patch which has been reviewed internally at ibm and signed-off.

mpichbot commented 7 years ago

Originally by balaji on 2013-08-06 12:38:52 -0500


Pushed to mpich/master in [7ddfa17b].

FWIW, the attachment didn't have a signoff. Also, it had a bunch of random white-space changes, trailing white-space on a bunch of lines, etc. I've pushed it in, since it only affected the pamid device.