ut-osa / assise

GNU General Public License v2.0

Hitting assertion failure during replication #23

Closed rohankadekodi closed 2 years ago

rohankadekodi commented 2 years ago

Hello,

I have configured 32GB of emulated NVM on my machine using the steps mentioned in the repository for a 2-node cluster using RDMA. I have set the dev sizes to 8GB NVM using utils/change_dev_size.py 8 0 0

Then I start the cluster, and try to run the example mentioned in the repository: ./tests/run.sh iotest sw 2G 4K 1

However, libfs fails with the following assertion most of the time:

Assertion failed: src/distributed/replication.c, start_rsync_session(), 995 at 'peer->remote_start <= peer->start_digest'

Could I get help with this issue? I am happy to provide access to my cluster if that helps solve the issue.

simpeter commented 2 years ago

Thanks, Rohan!

Waleed and Jongyul, do either of you know what this might be? It looks like an out-of-sync peer or a wraparound error to me.

-- Simon

wreda commented 2 years ago

This does indeed look like a wraparound error, but I'm unable to reproduce it on my end.
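For readers unfamiliar with this class of bug, here is a minimal sketch of how a wraparound can trip a comparison shaped like the one in the assertion. It assumes a circular log addressed by block offsets. The function names and the `size` and `max_pending` parameters are hypothetical and are not taken from Assise's replication.c. The thread does not describe the actual root cause or the fix, so this is an illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative only: a hypothetical circular log of `size` blocks.
 * None of these helpers exist in Assise. */

/* Naive ordering check on raw offsets. It holds while both offsets are
 * in the same "lap" of the log, but fails once start_digest wraps to
 * the beginning while remote_start is still near the end. */
bool in_order_naive(uint64_t remote_start, uint64_t start_digest)
{
    return remote_start <= start_digest;
}

/* Forward distance from `from` to `to` in a ring of `size` blocks. */
uint64_t ring_distance(uint64_t from, uint64_t to, uint64_t size)
{
    assert(size > 0 && from < size && to < size);
    return (to >= from) ? (to - from) : (size - from + to);
}

/* Wrap-aware check: start_digest counts as "at or ahead of"
 * remote_start if it is reachable by moving forward at most
 * `max_pending` blocks (an assumed bound on outstanding log data). */
bool in_order_wrapped(uint64_t remote_start, uint64_t start_digest,
                      uint64_t size, uint64_t max_pending)
{
    return ring_distance(remote_start, start_digest, size) <= max_pending;
}
```

With a 1024-block log, `in_order_naive(1000, 5)` is false even though offset 5 is only 29 blocks ahead of offset 1000 after wrapping, while `in_order_wrapped(1000, 5, 1024, 64)` accepts it.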

@rohankadekodi: It might be easier if I run iotest directly on your cluster. Can you share access instructions via email? I have cycles this Wednesday.

wreda commented 2 years ago

This should now be fixed!