Closed melrom closed 5 years ago
I agree that error handling is something we need to address. Currently many errors leave the system in an undefined state. This is not good, and we should handle errors more gracefully.
I do not think we should exit, for the reasons you mentioned and also because the application code may have fall back options it can use even in the presence of a catastrophic dataspaces error.
A function to dump the warp core makes sense to me, and I think we need to decide what our claims are going to be about the state of the system once we exit. Are we going to shift to a claim that the state of the system is defined on certain exit conditions? All errors? Some errors? What level of guarantee we are giving about the state of the system after a failure has occurred will have implications for what we do in the exit routine. I think this is actually a significant piece of the semantics, as it affects what our users are confident doing in their applications own error handling.
I suggest the name dspaces_panic()
. I think it conveys what is happening.
I agree Philip. It is just as important to clearly document what this is.
Manish
On Jan 26, 2018, at 10:14 PM, Philip Davis notifications@github.com<mailto:notifications@github.com> wrote:
I agree that error handling is something we need to address. Currently many errors leave the system in an undefined state. This is not good, and we should handle errors more gracefully.
I do not think we should exit, for the reasons you mentioned and also because the application code may have fall back options it can use even in the presence of a catastrophic dataspaces error.
A function to dump the warp core makes sense to me, and I think we need to decide what our claims are going to be about the state of the system once we exit. Are we going to shift to a claim that the state of the system is defined on certain exit conditions? All errors? Some errors? What level of guarantee we are giving about the state of the system after a failure has occurred will have implications for what we do in the exit routine. I think this is actually a significant piece of the semantics, as it affects what our users are confident doing in their applications own error handling.
I suggest the name dspaces_panic(). I think it conveys what is happening.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fphilip-davis%2Fdataspaces%2Fissues%2F34%23issuecomment-360955731&data=02%7C01%7Cparashar%40ored.rutgers.edu%7Ccbd597fd8ac443adedc108d565341ed5%7Cb92d2b234d35447093ff69aca6632ffe%7C1%7C0%7C636526196903840029&sdata=ngVN7ukjyGkwN7NeO%2FNz2QnkbasPjzkFKiNQ74FW2zA%3D&reserved=0, or mute the threadhttps://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAIvVmdyG22fmdUl9iolVUZH33ZCYUpQ4ks5tOpSkgaJpZM4Rqqda&data=02%7C01%7Cparashar%40ored.rutgers.edu%7Ccbd597fd8ac443adedc108d565341ed5%7Cb92d2b234d35447093ff69aca6632ffe%7C1%7C0%7C636526196903840029&sdata=63lIshT1NjQB7e%2Fn2AIBUdBYUTwW%2F%2BhR8McmZg2D2ls%3D&reserved=0.
In
common_dataspaces.c
, I would like to propose fail & exit for the following errors:They are all catastrophic errors that DataSpaces cannot recover from. My proposition is to do the following;
The only reason that I can imagine that we do not cause DataSpaces to fail in the event of these types of error is so that a simulation code that is coupled with DataSpaces would continue to run anyways even though the DataSpaces piece would not be working. Thus, it might generate some relevant file information even though whatever the sim was planning on doing with intermediate data would not work.
Either way, I think users would really rather have DataSpaces fail gracefully here and fail out their job rather than have everything run and not work. Making this a ticket because I was wondering if anyone has a different opinion for why we should continue running in the event of these catastrophic errors.