nikhilgupta10 / GridLAB-D

#622 Issue with g_assert, #2365

Closed nikhilgupta10 closed 8 years ago

nikhilgupta10 commented 8 years ago

Email sent separately about this issue. During PowerWorld testing, Forrest encountered an issue with either a read lock timeout, or an unhandled exception while trying to assert a particular value. Matt indicated this appears to be in the g_assert function and Dave is the best person to debug the macro-related calls (set-target was mentioned).

The attached file shows the two errors if an assert or a double_assert is utilized. It requires PowerWorld to run properly, so it may not be the best example.

nikhilgupta10 commented 8 years ago

nikhilgupta10 imported these comments from Sourceforge. The user dchassin does not exist anymore, so this issue is assigned to afisher1.

"ftuffner": * attachment _test_pwobjects.glm added

PowerWorld test file that shows the problem.

"dchassin": * status changed from new to accepted

I'm definitely going to need a simpler case to look at this (without PowerWorld).

This could very well be a deadlock problem. Deadlocks produce messages that are usually very clear, i.e., "wlock timeout" or lock timeout. There's a build option (LOCKTRACE) in core/lock.h that you can turn on to trace where the lock comes from, who is contending for it, and who gave up waiting.

If exception handling is behaving badly, that will express itself as a very long delay before the program aborts silently. I've seen that also when deadlocks occur, but the difference is that a deadlock results in the timeout messages mentioned above. Bad exception handling is completely silent. Even something as simple as indexing off the end of an array could cause such a silent abort. We've been trying to get rid of the exception popups in Windows, and the result may be this behavior. I'm still holding out for a "good" solution. Until we find that, we may have to decide which of these two bad behaviors is better. It probably depends on who you're talking to.

Having said that, I thought assert used exceptions only internally and those were caught before being passed back to the core as TS_INVALID sync times. So I'm not sure that would be the cause of such a problem, assuming everything is working right.

Just to be complete, one final possibility depends on how the COM link is done. You can get timeouts there too. If PowerWorld and GridLAB-D are each waiting for the other to send something, then at some point one of them gives up and times out. I would expect the message would be pretty clear and you most likely would know whether that's happening.

"dchassin": * cc mhauer, jfculler added

There were two problems.

1) All charXXXX members are classes now in 3.0 C++ modules, not typedefs to char[]. When passing charXXXX members of classes directly to vararg functions, you must use either (const char*) or .get_string() to obtain the pointer to the buffer, e.g., don't use printf("%s",member); use printf("%s",(const char*)member); or printf("%s",member.get_string());. The charXXXX implementations are documented in the charbuf template documents, but this issue is not obvious. A caveat section entry is recommended.

2) Init_deferred was coded such that it ignored PC_AUTOLOCK, which is set on assert objects. Hence, a write lock was taken out when init was called and, because PC_AUTOLOCK was set, the accessor was taking out a read lock too. When PC_AUTOLOCK is set, absolutely no whole-object write locks may be used by core calls prior to calling class member functions. When PC_AUTOLOCK is not set, absolutely no C++ API calls should be taking out read or write locks. The lack of clarity in the documentation related to object locking is a factor in this problem.
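
To illustrate the first problem, here is a minimal sketch of why a buffer class needs an explicit conversion before it goes through a vararg call. `CharBuf`, `format_with_cast`, and `format_with_getter` are hypothetical names made up for this example; the real charbuf template may differ in its members and behavior.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical stand-in for a fixed-size character buffer class.
// This is NOT the GridLAB-D charbuf template, only an illustration.
template <std::size_t N>
class CharBuf {
public:
    CharBuf(const char *s = "") {
        std::strncpy(buffer, s, N - 1);  // copy at most N-1 characters
        buffer[N - 1] = '\0';            // always terminate
    }
    // Conversion used by an explicit (const char *) cast
    operator const char *() const { return buffer; }
    // Named accessor, mirroring the get_string() call mentioned above
    const char *get_string() const { return buffer; }
private:
    char buffer[N];
};

// Format a member through a vararg function using the explicit cast.
std::string format_with_cast(const CharBuf<32> &member) {
    char out[64];
    // Writing snprintf(out, sizeof(out), "%s", member) would pass the object
    // itself through "...", so %s would not receive a char pointer.
    std::snprintf(out, sizeof(out), "%s", (const char *)member);
    return out;
}

// Same result using the named accessor instead of the cast.
std::string format_with_getter(const CharBuf<32> &member) {
    char out[64];
    std::snprintf(out, sizeof(out), "%s", member.get_string());
    return out;
}
```

The compiler cannot apply the conversion operator on its own here, because a vararg parameter carries no type to convert to. That is why the cast or the accessor has to be written out.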
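
The second problem can be sketched as a caller and an accessor both locking the same object. This is a simplified model under stated assumptions: it uses one exclusive lock instead of separate read and write locks, and `TimedLock`, `Object`, `get_value`, and `core_init` are hypothetical names, not GridLAB-D core functions. The `autolock` field stands in for the PC_AUTOLOCK class flag.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

using ms = std::chrono::milliseconds;

// Hypothetical non-recursive lock that gives up after a timeout.
class TimedLock {
public:
    bool lock_for(ms limit) {
        auto deadline = std::chrono::steady_clock::now() + limit;
        while (flag.exchange(true, std::memory_order_acquire)) {
            if (std::chrono::steady_clock::now() >= deadline)
                return false;  // this is where a "lock timeout" would be reported
            std::this_thread::yield();
        }
        return true;
    }
    void unlock() { flag.store(false, std::memory_order_release); }
private:
    std::atomic<bool> flag{false};
};

struct Object {
    TimedLock lock;
    bool autolock = false;  // stands in for PC_AUTOLOCK
    double value = 0.0;
};

// Accessor: locks the object itself only when autolock is set.
bool get_value(Object &obj, double &out, ms limit) {
    if (obj.autolock) {
        if (!obj.lock.lock_for(limit)) return false;
        out = obj.value;
        obj.lock.unlock();
        return true;
    }
    out = obj.value;
    return true;
}

// Core-side init call. respect_autolock = false models the reported bug:
// the core locks the whole object even though the accessor will lock it too.
bool core_init(Object &obj, bool respect_autolock, ms limit) {
    bool core_locks = !(respect_autolock && obj.autolock);
    if (core_locks && !obj.lock.lock_for(limit)) return false;
    double v = 0.0;
    bool ok = get_value(obj, v, limit);  // times out if both sides lock
    if (core_locks) obj.lock.unlock();
    return ok;
}
```

In this model, calling `core_init` on an autolock object with `respect_autolock` set to false times out inside the accessor, while the same call with `respect_autolock` set to true succeeds. Exactly one side takes the lock in each working case, which matches the rule stated above.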

"dchassin": Both the locking and charbuf documentation have been updated.
