mercury-hpc / mercury

Mercury is a C library for implementing RPC, optimized for HPC.
http://www.mcs.anl.gov/projects/mercury/
BSD 3-Clause "New" or "Revised" License

Need an API to tell how much data could be transferred in a single bulk_transfer() #648

Open bozhang-hpc opened 1 year ago

bozhang-hpc commented 1 year ago

Is your feature request related to a problem? Please describe.

I get an HG_PROTOCOL_ERROR when the data chunk passed to a single bulk transfer is too large:

# [1017487.534554] mercury->op: [warning] /work2/07555/bz186/frontera/apps/mercury/src/na/na_ofi.c:5103
 # na_ofi_cq_readerr(): fi_cq_readerr() got err: 5 (Input/output error), prov_errno: 1 (local length error)
# [1019973.528566] mercury->rma: [debug] /work2/07555/bz186/frontera/apps/mercury/src/na/na_ofi.c:4965
 # na_ofi_rma_post(): Posting RMA op (fi_writemsg, context=0x3f3c810), iov_count=1, desc[0]=0x4887770, msg_iov[0].iov_base=0x2b93d0e48010, msg_iov[0].iov_len=1679616000, addr=9, rma_iov_count=1, rma_iov[0].addr=47744078381072, rma_iov[0].len=1679616000, rma_iov[0].key=20992, d

Describe the solution you'd like

I would like an API that reports the maximum size a single bulk_transfer() can handle, so that I can split the data and call bulk_transfer() multiple times.
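To illustrate the splitting I have in mind, here is a minimal sketch in plain C. It does not call Mercury directly: `chunk_xfer_fn`, `transfer_in_chunks`, `xfer_stats` and `record_chunk` are hypothetical names made up for this example, and the limit passed as `max_chunk` is an assumption, since no API currently reports the real value. In an actual program the callback would wrap HG_Bulk_transfer(), using the chunk offset for `origin_offset` and `local_offset` and the chunk length for `size`, and would also have to wait for each operation to complete because the call is asynchronous.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type: move `len` bytes starting at byte `offset`.
 * In a Mercury program this would wrap HG_Bulk_transfer(). Returns 0 on
 * success, non-zero on failure. */
typedef int (*chunk_xfer_fn)(uint64_t offset, uint64_t len, void *arg);

/* Split the range [0, total) into pieces of at most `max_chunk` bytes and
 * hand each piece to `xfer`. Stops and returns the error code at the first
 * failed piece. `max_chunk` is an assumed limit chosen by the caller. */
int transfer_in_chunks(uint64_t total, uint64_t max_chunk,
                       chunk_xfer_fn xfer, void *arg)
{
    uint64_t offset = 0;

    assert(xfer != NULL);
    if (max_chunk == 0)
        return -1; /* a zero-sized chunk would never make progress */

    while (offset < total) {
        uint64_t remaining = total - offset;
        uint64_t len = remaining < max_chunk ? remaining : max_chunk;
        int rc = xfer(offset, len, arg);

        if (rc != 0)
            return rc;
        offset += len;
    }
    return 0;
}

/* Stand-in for a real transfer: only records what it was asked to move. */
struct xfer_stats {
    uint64_t calls;   /* number of chunks issued */
    uint64_t bytes;   /* total bytes across all chunks */
    uint64_t largest; /* largest single chunk seen */
};

int record_chunk(uint64_t offset, uint64_t len, void *arg)
{
    struct xfer_stats *stats = arg;

    (void) offset; /* a real transfer would use this for both offsets */
    stats->calls++;
    stats->bytes += len;
    if (len > stats->largest)
        stats->largest = len;
    return 0;
}
```

With an assumed 1 GiB limit, the 1679616000-byte buffer from the log above would be issued as two pieces: `transfer_in_chunks(1679616000ULL, 1073741824ULL, record_chunk, &stats)`.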

Describe alternatives you've considered

Alternatively, could Mercury split large transfers internally?


soumagne commented 6 months ago

We might be able to do that once libfabric gives us the ability to query the maximum size of RMA messages.