mjrgh / PinballY

A table browser and launcher ("front end") for virtual pinball cabinets.
Other
47 stars 22 forks source link

Metafilter can cause startup crashes when same table exists in multiple systems #86

Closed byancey closed 4 years ago

byancey commented 5 years ago

Describe the bug

Enabling metafilters causes unexpected selection behavior on game startup when the same title exists in more than one system. For example, if you have "Fish Tales (Williams 1992)" in your library for both Visual Pinball and Future Pinball, and either of those titles is selected when you exit Pinball Y, on the next startup PinballY will always select the first table in the list, even if the other table was actually selected on exit. This only happens when metafilters are enabled, and the contents of the actual metafilter does not appear to be relevant. It will happen even if the metafilter always returns true (which should be a no-op).

This behavior seems pretty benign on the surface, however, if table audio files are used on the tables in question, and you begin applying filters that exclude the first table that shows up in the list, it can actually result in a crash on every startup. Here is what I have observed:

  1. If only the second table in the wheel list includes table audio, and only the second table is included in the active filter, and that second table is selected on exit, PinballY will start back up normally with the 2nd table selected, and play the 2nd tables audio file.

  2. If only the first table in the wheel list includes table audio, and only the second table is included in the active filter, and that second table is selected on exit, PinballY will start back up with the 2nd table selected, but it will play the table audio of the first file, rather than being silent, as expected. If you move the wheel to another title and back, the audio will be silent, so the anomaly of playing the audio file from the other table only happens on startup.

  3. If both tables include table audio, and only the second table is included in the active filter, and that second table is selected on exit, on the next startup, PinballY will start to play the audio, hang briefly, and then crash. The actual crash occurs in DShowAudioPlayer.cpp on Line 77 in DShowAudioPlayer's OnEvent() handler as the result of an Access Violation exception due to the pEventEx pointer being invalid.

In that final case where both tables include table audio, if both tables are included in the active filter (i.e. All Tables), and the second table is selected on exit, PinballY will simply select the first table on the next startup, and no crash will occur. Problems seem to occur when PinballY can't actually select the first table of the two due to the active filters on startup.

None of the above behavior above could be replicated with video files, only the table audio files. Also, if I simply disable metafilters, the correct tables are always selected, the correct audio files are always played, and PinballY does not crash even when all other conditions from above are replicated.

To Reproduce

I'm including the specific steps to reproduce the crash. Once the crash is reproduced, the other non-crash behaviors can be observed by removing one or the other table audio files. Also, note that when I use the term "select" here, I simply mean that the tile is in the center of the wheel (the game wheel icon is large). This does not require actually hitting the start button to go to the game menu.

  1. Ensure you have two tables with the same name setup for two different systems. Specifically, the tables must both have the same description name that is used for the PinballY media names. For example, you could have "Fish Tales (Williams 1992)" in both Visual Pinball and Future Pinball (and/or Pinball FX3).

  2. Both tables must have a table audio file in the table audio folder of their respective systems with the same name. It can be a copy of the same audio file, or two different audio files, doesn't matter. In the case of my example in step 1, that would be "Fish Tales (Williams 1992).mp3" in both "Visual Pinball\Table Audio" and "Future Pinball\Table Audio".

  3. You must have a metafilter enabled. It doesn't actually matter what the metafilter does, so long as it doesn't exclude either of the games you are using in this test. To eliminate issues with the metafilter logic itself, I'd suggest that it simply return true, which should be a no-op.

  4. Start Pinball Y, select the All Tables filter, and locate the two tables in the wheel. Those two tables should show up side-by-side in the wheel list. In my case, the Visual Pinball table was first in the wheel list, and the Future Pinball table was second in the list.

  5. Select the second of the two tables in the list (furthest right), and then select a filter that does not include the first. In my example, I selected the Future Pinball table, and then filtered on the System->"Future Pinball", so that the Visual Pinball table was no longer show, and only the Future Pinball table was visible. Note that it is important that you select the 2nd (rightmost) table in the wheel. The issue will not be replicated if you do the 1st (leftmost) table instead.

  6. With the 2nd, and only visible table of the two now selected, go ahead and quit PinballY.

  7. Open settings.txt and search for GameList.CurrentGame. And verify that GameList.CurrentGame shows the game for the system that was selected when you quit PinballY, and GameList.CurrentFilter is set to the filter that you had selected to filter out the other title. In my example, this would be the "GameList.CurrentGame = Fish Tales (Williams 1992).Future Pinball" and "GameList.CurrentFilter = System.Future Pinball". Do not change anything, just verify and close settings.txt.

  8. Start PinballY. If all of the above conditions are met, PinballY will stutter a bit on audio, hang, and then close to desktop. If you are running this test with the debugger, you should hit an Access Violation exception on line 77 of DShowAudioPlayer.cpp. Unfortunately, the stack trace from the exception didn't lead to an obvious root cause, though it did lead me to suspect the metafilters might be involved (which I subsequently confirmed by turning them off).

NOTE: I was able to reproduce this on both my pinball cabinet and the laptop I use for development and testing. I was also able to replicate the behavior using any set of tables with the same name across more than one system. It didn't matter which table I used, or which combination of systems (Visual Pinball, Future Pinball, Pinball FX3). So long as the conditions above are met, the crash will occur.

Expected behavior Table selected at exit is still selected at next startup with no crashes or audio playback anomalies.

Test files N/A

Screenshots N/A

Windows version Window 10.

byancey commented 5 years ago

Adding a few more details and observations to this issue.

In addition to the very specific scenario detailed in the bug description, which is 100% reproducible if the steps are replicated, I have also seen a much more intermittent and random crash that sometimes occurs when the wheel is transitioning from one table to the next. The latter crash is intermittent enough that I have not managed to catch it with the debugger, but I suspect the two may actually be related, as in both cases, you get an audio stutter before the crash, as if it's trying to play the audio...but then chokes and dies in the process.

While I have been able to work around the first crash by not using meta filters, the latter crash persists. If I just fire up the table for a quick game, it's unlikely I'll encounter it, but when I'm doing updates and the table is running for hours at a time, the odds of hitting it eventually are pretty good. Sometimes I see it while I'm manually cycling through the tables, sometimes it'll just go down while attract mode is running.

byancey commented 5 years ago

One other interesting finding worth mentioning...

I indicated in the initial write-up for the bug that the actual crash occurs on line 77 of DShowAudioPlayer's OnEvent() handler. I have no idea what exactly that event handler is supposed to be doing, but I noticed while debugging that the only time I seemed to hit breakpoints in that code was when the crash was about to occur. Doesn't seem PinballY triggers that event handler in normal operation.

On a whim I simply added a return at the beginning of that OnEvent() function call so that it never actually reaches the code where the crash occurs. With that change in place, instead of a crash, I simply get a very brief skip in the audio, and otherwise PinballY goes on happily as if nothing has occurred. I've had the frontend running for about 30 to 40 minutes with this change, and I haven't seen any obvious negative effects of simply bypassing that OnEvent() call completely.

So, this definitely eliminates the crash described in my first post. My plan is to run my front-end with this change in place for a few days and see if it also addresses the the more intermittent random crash I described in my second post. I'll post an update here once I've convinced myself that it either does, or does not address the second crash.

mjrgh commented 5 years ago

Good find!

Hey, try this: instead of returning at the beginning of DShowAudioPlayer::OnEvent, add this at the beginning of the method (line 74, right after the open brace):

// add a self-reference while we're working so that we can't be // deleted in the event callback RefPtr self(this, DoAddRef);

I'm thinking we have a simple case of 'this' getting deleted in the course of the event processing, which is in turn deleting this->pEventEx, which is where the crash is coming from.

Regards, Mike


From: byancey notifications@github.com Sent: Thursday, October 3, 2019 9:53 PM To: mjrgh/PinballY PinballY@noreply.github.com Cc: Subscribed subscribed@noreply.github.com Subject: Re: [mjrgh/PinballY] Metafilter can cause startup crashes when same table exists in multiple systems (#86)

One other interesting finding worth mentioning...

I indicated in the initial write-up for the bug that the actual crash occurs on line 77 of DShowAudioPlayer's OnEvent() handler. I have no idea what exactly that event handler is supposed to be doing, but I noticed while debugging that the only time I seemed to hit breakpoints in that code was when the crash was about to occur. Doesn't seem PinballY triggers that event handler in normal operation.

On a whim I simply added a return at the beginning of that OnEvent() function call so that it never actually reaches the code where the crash occurs. With that change in place, instead of a crash, I simply get a very brief skip in the audio, and otherwise PinballY goes on happily as if nothing has occurred. I've had the frontend running for about 30 to 40 minutes with this change, and I haven't seen any obvious negative effects of simply bypassing that OnEvent() call completely.

So, this definitely eliminates the crash described in my first post. My plan is to run my front-end with this change in place for a few days and see if it also addresses the the more intermittent random crash I described in my second post. I'll post an update here once I've convinced myself that it either does, or does not address the second crash.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://github.com/mjrgh/PinballY/issues/86?email_source=notifications&email_token=AB7DK3PN2KG7NUKBHZXSYHDQM3D4DA5CNFSM4I45ZK62YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEAKMPWA#issuecomment-538232792, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AB7DK3IXMLOG74J6V3RSBSTQM3D4DANCNFSM4I45ZK6Q.

byancey commented 5 years ago

Tried that. Still crashes. :(

For what it's worth, the handful of times I caught the crash in the debugger, the value of PEventEx was always 0XFFFFFFFFFFFFFFFF, or in other words, -1. Not sure if that helps or not.

Oh, on a side note, right before I saw your request, I also observed another instance of the random crash with my "return" hack in place, so it helped with one crash, but not the other. I probably need to start running a debug version on my cabinet for a while so that when I just happen to catch it, I'll have a stack trace.

Oh, also, just curious to know if DShowAudioPlayer get's invoked only for standalone audio (Table Audio Files), or does it also get invoked for Audio Streams embedded in playfield videos? I exclusively use Table Audio files, so that might be a reason why I might be running into these issues where others may not.

byancey commented 5 years ago

...and one other data point I should add just for completeness. Yesterday I migrated my personal build of PinballY up to XAudio 2.9. I don't plan to push that change back up to GitHub, but since I'm only running Windows 10, figured I'd eliminate the legacy DirectX from my build just to see if that made any difference...it didn't.

byancey commented 5 years ago

One more bit of information, and then I've gotta checkout for the day. :)

I've gone ahead and reverted my change that returns from the event handler. It was more a curiosity than anything, but I did notice that while it eliminated that one specific crash, it did indeed lead to other stability issues. Not terribly surprising. Interesting data point though.

mjrgh commented 5 years ago

but I did notice that while it eliminated that one specific crash, it did indeed lead to other stability issues

As you'd expect - if it's crashing in a line of code and you skip that code, you get rid of that crash. But if that code does something important (which this code does), you lose that important thing and cause other problems. So, indeed, the solution can't be "skip this code".

For what it's worth, the handful of times I caught the crash in the debugger, the value of PEventEx was always 0XFFFFFFFFFFFFFFFF, or in other words, -1.

Really? That's a curious value. Next time it happens, maybe check what the rest of this looks like. Let's see if this is getting corrupted or just pEventEx.

Mike

byancey commented 5 years ago

Agreed...would not propose that as a fix by any stretch, which is why I opted to remove my metafilter rather than running it on my cab.  Just threw it on last night out of curiosity to see if it made any difference with the other random crashes I was seeing (based on the theory they might be related),

The "metafilter" crash is easy enough to reproduce.  I'll see if I can get that data. Can probably post the stack trace as well.

byancey commented 5 years ago

Pretty safe to say that all of *this is corrupted. The address of this actually appears to be in a valid memory space, but it's contents are completely bogus. A high percentage of the time it ends up pointing to a memory location where everything is just populated with 0xDs. Pointers within the struct are all 0xDDDDDDDDDDDDDDDD, the booleans all evaluate to true (221) which is just 0xDD, and the other values follow suite. Sometimes it points to memory that's not populated entirely with 0XD, but it's still pretty clear in those instances that the contents are just a bunch of random stuff in memory being cast to a DShowPlayer object.

The previous call up the stack is the following:

    reinterpret_cast<DShowAudioPlayer*>(lParam)->OnEvent();

I would assume we have an invalid lParam here. If I follow lParam up that stack, it's just being passed unused through various OnAppMessage calls. About 7 levels of the tree the lparam is passed to BaseWin::StaticWndProc which makes the following call:

    // dispatch through the virtual window proc
    return self->WndProc(message, wParam, lParam);

Next level up, I can't see who calls StaticWndProc, as the call is in [External Code]. At that point, the trail goes cold.

mjrgh commented 5 years ago

0xDD fill means something specific with the Microsoft libraries: it means that the object has been deleted with free() or delete. The Microsoft debug libraries fill each memory block used in a free() or delete call with that value as a debugging aid.

So this confirms that we have do indeed have an out-of-sequence deletion. It must be getting deleted even before the call to this method, though, since you said the added self-reference didn't help. The added self-reference would have prevented the object from being deleted until the method returned, but only if the object wasn't already deleted.

You should be able to confirm that by adding a debug test at the entry to the method to see if 'this' has been deleted - something like this should work:

if (reinterpret_cast(pEventEx) == 0xDDDDDDDDDDDDDDDD)) DebugBreak();

I'd keep the added self-reference as well.

With that added (and running in the debugger), and the self-reference in place, I think it should be impossible for the crash to occur without hitting that breakpoint first. If that's not true, I'm really missing something.

Regards, Mike


From: byancey notifications@github.com Sent: Friday, October 4, 2019 3:28 PM To: mjrgh/PinballY PinballY@noreply.github.com Cc: mjrgh mjr_@hotmail.com; Comment comment@noreply.github.com Subject: Re: [mjrgh/PinballY] Metafilter can cause startup crashes when same table exists in multiple systems (#86)

Pretty safe to say that all of *this is corrupted. The address of this actually appears to be in a valid memory space, but it's contents are completely bogus. A high percentage of the time it ends up pointing to a memory location where everything is just populated with 0xDs. Pointers within the struct are all 0xDDDDDDDDDDDDDDDD, the booleans all evaluate to true (221) which is just 0xDD, and the other values follow suite. Sometimes it points to memory that's not populated entirely with 0XD, but it's still pretty clear in those instances that the contents are just a bunch of random stuff in memory being cast to a DShowPlayer object.

The previous call up the stack is the following:

    reinterpret_cast<DShowAudioPlayer*>(lParam)->OnEvent();

I would assume we have an invalid lParam here. If I follow lParam up that stack, it's just being passed unused through various OnAppMessage calls. About 7 levels of the tree the lparam is passed to BaseWin::StaticWndProc which makes the following call:

    // dispatch through the virtual window proc
    return self->WndProc(message, wParam, lParam);

Next level up, I can't see who calls StaticWndProc, as the call is in [External Code]. At that point, the trail goes cold.

— You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://github.com/mjrgh/PinballY/issues/86?email_source=notifications&email_token=AB7DK3OQGTONEGHWOEKAS63QM67R5A5CNFSM4I45ZK62YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEANBJNY#issuecomment-538580151, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AB7DK3NBHWZQTMZH2SWBME3QM67R5ANCNFSM4I45ZK6Q.

mjrgh commented 5 years ago

Okay, I see what's going on. It's a bit complicated, but it's the way DShow delivers events to listeners through a window. DShow must be queuing up an event just before the object is deleted, so the event sits in the queue while the object is being deleted, then gets delivered to the no-longer-extant object.

Let's see if there's a simple solution. Try adding this to DShowAudioPlayer::~DShowAudioPlayer():

if (pEventEx != nullptr) pEventEx->SetNotifyWindow(NULL, DSMsgOnEvent, 0);

If we're lucky, the DShow guys thought about this situation and will take care of deleting any pending, previously-queued messages when tearing down the notifier.

If that doesn't work, the next thing to try would be to add this:

if (pEventEx != nullptr) { // tell DShow to stop sending us event messages pEventEx->SetNotifyWindow(NULL, DSMsgOnEvent, 0);

// clear out any pending event messages, so that they don't // get routed to the soon-to-be-deleted object later on long eventCode; LONG_PTR lParam1, lParam2; while (SUCCEEDED(pEventEx->GetEvent(&eventCode, &lParam1, &lParam2, 0))) { // clean up the event parameters pEventEx->FreeEventParams(eventCode, lParam1, lParam2); } }

And if that doesn't work, which it might not, we'll need something more sophisticated to recognize the deleted object - probably a map of valid objects that we maintain the constructor and destructor, and which we consult in the message callback.

Regards, Mike


From: Mike Roberts mjr_@hotmail.com Sent: Friday, October 4, 2019 3:47 PM To: mjrgh/PinballY reply@reply.github.com Subject: Re: [mjrgh/PinballY] Metafilter can cause startup crashes when same table exists in multiple systems (#86)

0xDD fill means something specific with the Microsoft libraries: it means that the object has been deleted with free() or delete. The Microsoft debug libraries fill each memory block used in a free() or delete call with that value as a debugging aid.

So this confirms that we have do indeed have an out-of-sequence deletion. It must be getting deleted even before the call to this method, though, since you said the added self-reference didn't help. The added self-reference would have prevented the object from being deleted until the method returned, but only if the object wasn't already deleted.

You should be able to confirm that by adding a debug test at the entry to the method to see if 'this' has been deleted - something like this should work:

if (reinterpret_cast(pEventEx) == 0xDDDDDDDDDDDDDDDD)) DebugBreak();

I'd keep the added self-reference as well.

With that added (and running in the debugger), and the self-reference in place, I think it should be impossible for the crash to occur without hitting that breakpoint first. If that's not true, I'm really missing something.

Regards, Mike


From: byancey notifications@github.com Sent: Friday, October 4, 2019 3:28 PM To: mjrgh/PinballY PinballY@noreply.github.com Cc: mjrgh mjr_@hotmail.com; Comment comment@noreply.github.com Subject: Re: [mjrgh/PinballY] Metafilter can cause startup crashes when same table exists in multiple systems (#86)

Pretty safe to say that all of *this is corrupted. The address of this actually appears to be in a valid memory space, but it's contents are completely bogus. A high percentage of the time it ends up pointing to a memory location where everything is just populated with 0xDs. Pointers within the struct are all 0xDDDDDDDDDDDDDDDD, the booleans all evaluate to true (221) which is just 0xDD, and the other values follow suite. Sometimes it points to memory that's not populated entirely with 0XD, but it's still pretty clear in those instances that the contents are just a bunch of random stuff in memory being cast to a DShowPlayer object.

The previous call up the stack is the following:

    reinterpret_cast<DShowAudioPlayer*>(lParam)->OnEvent();

I would assume we have an invalid lParam here. If I follow lParam up that stack, it's just being passed unused through various OnAppMessage calls. About 7 levels of the tree the lparam is passed to BaseWin::StaticWndProc which makes the following call:

    // dispatch through the virtual window proc
    return self->WndProc(message, wParam, lParam);

Next level up, I can't see who calls StaticWndProc, as the call is in [External Code]. At that point, the trail goes cold.

— You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://github.com/mjrgh/PinballY/issues/86?email_source=notifications&email_token=AB7DK3OQGTONEGHWOEKAS63QM67R5A5CNFSM4I45ZK62YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEANBJNY#issuecomment-538580151, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AB7DK3NBHWZQTMZH2SWBME3QM67R5ANCNFSM4I45ZK6Q.

byancey commented 5 years ago

Neither of those suggestions helped, but I think something else might be going on here. When I was testing this earlier today, there was a high probability of hitting the 0xDD fill case, but not 100%...I'd say 60-70% of the time perhaps. The other 30-40% of the time the object was just populated with random garbage.

The last dozen or so times I've run this, I'm 0% on hitting 0xDD fill. Everything is just coming up random garbage. So, either the objects are being deleted, reallocated, and overwritten with data from another object before the event handler tries to process them, or perhaps the 0xDD was a red herring all along and I just happened to randomly hit at a memory location from another previously deleted object?

Needless to say, checking the address of pEventEx for 0xDDDDDDDDDDDDDDDD isn't going to catch the cases I'm seeing now. What I did instead was added a token ULONGLONG member to the DAudioShowPlayer class and initialized it to a known value. I check that value in the event handler, and if it doesn't match the known good value, I know we've hit the failure case. As before, if I simply return at this point, I get a brief audio stutter, and PinballY goes on normally, table audio and all.

....so, about that audio stutter. To this point the audio files I've been working with for the two tables (Fish Tales VPX & Fish Tales FX3) are actually just copies of the same mp3 in different system directories. On a hunch, I swapped out one of those two mp3 files with music from another table that sounds very different at the start. What's now clear is that the "stutter" in the audio, is actually PinballY attempting to play the table audio from the first table which is hidden by the active filter, before successfully playing the correct audio file for the second table.

Maybe PinballY gets ahead of itself and queues the first before realizing it's not selected by the active filter...and that somehow leads us down this road??

byancey commented 5 years ago

I've been stepping through the code from breakpoints in Playfieldview.cpp where the initial DShowAudioPlayer objects are allocated. I still can't quite see where things go off the rails, but I can at least confirm that the corrupted DShowAudioPlayer objects are the same objects (by address) that are allocated in Playfieldview.cpp...so it's not that the references themselves are corrupted, just the contents of the objects. I can also confirm that those initial DShowAudioPlayer objects are created for the VPX version of the game that's hidden by the active filter rathre than FX3 version that is actually selected.

mjrgh commented 5 years ago

So, either the objects are being deleted, reallocated, and overwritten with data from another object before the event handler tries to process them

Correct.

or perhaps the 0xDD was a red herring all along

Unlikely. Given the code flow here, I think an out-of-sequence deletion is pretty clearly what's going on.

Neither of those suggestions helped ...

I'm not surprised given the way the messages are handled, but I wanted to try those approaches first because they're simpler. The reason they didn't work is that the message is in the window message queue, which is separate from the DShow objects (ours and theirs). The errant message is getting queued, and it contains a pointer encoded in LPARAM form. Trying to stop the notifications didn't help because there was already a notification in the window queue, and apparently DShow doesn't try to remove that (not entirely surprising since Windows doesn't have a public API to do that). Trying to vacuum out pending notifications from the DShow pipeline didn't help because those aren't the queued notifications causing the problem; the queued window event message is.

So this apparently will take the more complex solution I was hoping wouldn't be necessary. I'll try to cook something up for you to try.

Regards, Mike


From: byancey notifications@github.com Sent: Friday, October 4, 2019 5:51 PM To: mjrgh/PinballY PinballY@noreply.github.com Cc: mjrgh mjr_@hotmail.com; Comment comment@noreply.github.com Subject: Re: [mjrgh/PinballY] Metafilter can cause startup crashes when same table exists in multiple systems (#86)

Neither of those suggestions helped, but I think something else might be going on here. When I was testing this earlier today, there was a high probability of hitting the 0xDD fill case, but not 100%...I'd say 60-70% of the time perhaps. The other 30-40% of the time the object was just populated with random garbage.

The last dozen or so times I've run this, I'm 0% on hitting 0xDD fill. Everything is just coming up random garbage. So, either the objects are being deleted, reallocated, and overwritten with data from another object before the event handler tries to process them, or perhaps the 0xDD was a red herring all along and I just happened to randomly hit at a memory location from another previously deleted object?

Needless to say, checking the address of pEventEx for 0xDDDDDDDDDDDDDDDD isn't going to catch the cases I'm seeing now. What I did instead was added a token ULONGLONG member to the DAudioShowPlayer class and initialized it to a known value. I check that value in the event handler, and if it doesn't match the known good value, I know we've hit the failure case. As before, if I simply return at this point, I get a brief audio stutter, and PinballY goes on normally table audio and all.

....so, about that audio stutter. To this point the audio files I've been working with for the two tables (Fish Tales VPX & Fish Tales FX3) are actually just copies of the same mp3 in different system directories. On a hunch, I swapped out one of those two mp3 files with music from another table that sounds very different at the start. What's now clear is that the "stutter" in the audio, is actually PinballY attempting to play the table audio from the first table which is hidden by the active filter, before successfully playing the correct audio file for the second table.

Maybe PinballY gets ahead of itself and queues the first before realizing it's not selected by the active filter...and that somehow leads us down this road??

— You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://github.com/mjrgh/PinballY/issues/86?email_source=notifications&email_token=AB7DK3PVSWPLDZHJMERUYMLQM7QKZA5CNFSM4I45ZK62YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEANGQOA#issuecomment-538601528, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AB7DK3NKFIFNFZDGKHCXA7TQM7QKZANCNFSM4I45ZK6Q.

byancey commented 5 years ago

Yes, your instincts are spot-on here. First time I've actually used the Visual Studio debugger, but I've gotten far enough where I can see that an object is allocated and queued for the VPX version of the table....and then subsequently released as a side effect of the correct FX3 object being allocated.

Question. Your suggestions to this point seem to be focused on dealing with cancelling the in-progress event for the VPX table, but it's not obvious to me why that event is even initiated in the first place. Is the fact that PinballY starts the ball rolling for the VPX table even though the FX3 table is actually selected expected behavior here? At this point my instinct would be to see what could be done to prevent the VPX table sound from even being started...but my familiarity with the overall PinballY architecture is still very limited.

mjrgh commented 5 years ago

At this point my instinct would be to see what could be done to prevent the VPX table sound from even being started..

That might be good to do, too - but the crash needs to be fixed independently, as this is undoubtedly just one way of triggering it. This loophole where an event can be pending for a deleted object is inherently problematic and needs to be closed.

Regards, Mike


From: byancey notifications@github.com Sent: Friday, October 4, 2019 7:31 PM To: mjrgh/PinballY PinballY@noreply.github.com Cc: mjrgh mjr_@hotmail.com; Comment comment@noreply.github.com Subject: Re: [mjrgh/PinballY] Metafilter can cause startup crashes when same table exists in multiple systems (#86)

Yes, your instincts are spot-on here. First time I've actually used the Visual Studio debugger, but I've gotten far enough where I can see that an object is allocated and queued for the VPX version of the table....and then subsequently released as a side effect of the correct FX3 object being allocated.

Question. Your suggestions to this point seem to be focused on dealing with cancelling the in-progress event for the VPX table, but it's not obvious to me why that event is even initiated in the first place. Is the fact that PinballY starts the ball rolling for the VPX table even though the FX3 table is actually selected expected behavior here? At this point my instinct would be to see what could be done to prevent the VPX table sound from even being started...but my familiarity with the overall PinballY architecture is still very limited.

— You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://github.com/mjrgh/PinballY/issues/86?email_source=notifications&email_token=AB7DK3P7TZZUB32ATIUVA7TQM736LA5CNFSM4I45ZK62YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEANIKCA#issuecomment-538608904, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AB7DK3NJT3YJRIGQWZHXWDLQM736LANCNFSM4I45ZK6Q.

mjrgh commented 5 years ago

...Maybe you can pursue if and why this other audio track gets started when it shouldn't while I try to come up with the crash fix.

Mike


From: Mike Roberts mjr_@hotmail.com Sent: Friday, October 4, 2019 8:11 PM To: mjrgh/PinballY reply@reply.github.com Subject: Re: [mjrgh/PinballY] Metafilter can cause startup crashes when same table exists in multiple systems (#86)

At this point my instinct would be to see what could be done to prevent the VPX table sound from even being started..

That might be good to do, too - but the crash needs to be fixed independently, as this is undoubtedly just one way of triggering it. This loophole where an event can be pending for a deleted object is inherently problematic and needs to be closed.

Regards, Mike


From: byancey notifications@github.com Sent: Friday, October 4, 2019 7:31 PM To: mjrgh/PinballY PinballY@noreply.github.com Cc: mjrgh mjr_@hotmail.com; Comment comment@noreply.github.com Subject: Re: [mjrgh/PinballY] Metafilter can cause startup crashes when same table exists in multiple systems (#86)

Yes, your instincts are spot-on here. First time I've actually used the Visual Studio debugger, but I've gotten far enough where I can see that an object is allocated and queued for the VPX version of the table....and then subsequently released as a side effect of the correct FX3 object being allocated.

Question. Your suggestions to this point seem to be focused on dealing with cancelling the in-progress event for the VPX table, but it's not obvious to me why that event is even initiated in the first place. Is the fact that PinballY starts the ball rolling for the VPX table even though the FX3 table is actually selected expected behavior here? At this point my instinct would be to see what could be done to prevent the VPX table sound from even being started...but my familiarity with the overall PinballY architecture is still very limited.

— You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://github.com/mjrgh/PinballY/issues/86?email_source=notifications&email_token=AB7DK3P7TZZUB32ATIUVA7TQM736LA5CNFSM4I45ZK62YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEANIKCA#issuecomment-538608904, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AB7DK3NJT3YJRIGQWZHXWDLQM736LANCNFSM4I45ZK6Q.

mjrgh commented 5 years ago

Here - give this a try.

Regards, Mike


From: byancey notifications@github.com Sent: Friday, October 4, 2019 7:31 PM To: mjrgh/PinballY PinballY@noreply.github.com Cc: mjrgh mjr_@hotmail.com; Comment comment@noreply.github.com Subject: Re: [mjrgh/PinballY] Metafilter can cause startup crashes when same table exists in multiple systems (#86)

Yes, your instincts are spot-on here. First time I've actually used the Visual Studio debugger, but I've gotten far enough where I can see that an object is allocated and queued for the VPX version of the table....and then subsequently released as a side effect of the correct FX3 object being allocated.

Question. Your suggestions to this point seem to be focused on dealing with cancelling the in-progress event for the VPX table, but it's not obvious to me why that event is even initiated in the first place. Is the fact that PinballY starts the ball rolling for the VPX table even though the FX3 table is actually selected expected behavior here? At this point my instinct would be to see what could be done to prevent the VPX table sound from even being started...but my familiarity with the overall PinballY architecture is still very limited.

— You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://github.com/mjrgh/PinballY/issues/86?email_source=notifications&email_token=AB7DK3P7TZZUB32ATIUVA7TQM736LA5CNFSM4I45ZK62YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEANIKCA#issuecomment-538608904, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AB7DK3NJT3YJRIGQWZHXWDLQM736LANCNFSM4I45ZK6Q.

byancey commented 5 years ago

Here - give this a try. Regards, Mike

Was there supposed to be something else attached to your last post?

byancey commented 5 years ago

Ok, I have a pretty good feel for what's going on with the game selection at startup. On a normal startup, when no metafilters are enabled, the playfield media setup for the initial game selection is triggered by a call to UpdateSelection() within the PlayfieldView's OnCommand() method. This is always invoked during the startup process.

When metafilters are enabled, there is an additional call to UpdateSelection() that occurs much earlier on in the startup process as part of the JavaScript initialization within the JsCreateMetaFilter() method. After a metafilter is added to the filter list, JsCreateMetaFilter calls RefreshFilter, which has some logic intended to select a lexically similar game in the event the current selected game is eliminated by the new filter rules. Here's where we run into problems.

First, this particular call to JsCreateMetaFilter happens early enough in the startup process that the default "All Tables" filter is still active. The incoming filter from the previous session that will be applied later in the startup process is still stored away in a pendingRestoredFilter member variable at this point.

Second, the algorithm that does the lexical match only takes the table name into consideration, not the system...and if there are multiple exact name matches, the first match will always be used. Since the All Tables filter is active, the call to RefreshFilter at this point in time will always results in the first table in the list being set as the current game, even if that table will subsequently be eliminated from the list by the pendingRestoredFilter. JsCreateMetaFilter() calls UpdateSelection() as soon as the filters have been refreshed, and this results in DShowAudioPlayer event being queued for the soon to be filtered table.

As the normal startup process proceeds, the pendingRestoredFilter becomes the active filter, the correct game is selected, and the subsequent call to UpdateSelection() within the the PlayfieldView's OnCommand() method causes the original DShowAudioPlayer object to be deleted. This all happens before that first event can be processed, and when it does, we have our crash.

I see a few of possibilities to address this.

  1. Ensure the pendingRestoreFilter is active before the metafilters are added during startup.
  2. Update the lexical matching algorithm to take system into consideration.
  3. Don't invoke UpdateSelection within JsCreateMetaFilters.

Option 3 is the most straight forward...just comment out the offending call to UpdateSelection in JsCreateMetaFilters (there's a call to UpdateAllStatusText that is probably also redundant). I've actually implemented and tested this, and can confirm that it eliminates this particular crash. As near as I can tell, my metafilters are all working fine as well. Not sure if there are any cases where jsCreateMetaFilters would be called outside of this startup sequence, but obviously UpdateSelection wouldn't be triggered in those cases either.

@mjrgh let me know your thoughts on your preferred solution. Also, if you have something you'd like me to test with regard to the handling of deleted events more generally, let me know. My fix is for a very specific scenario and obviously is not going to help with any of the other random crashes I've seen...but I suspect (hope) your fix might.

Thanks!

-- Bryce

mjrgh commented 5 years ago

There was a zip file attached. Mabye github doesn't pass the attachment through? Try here:

http://mjrnet.org/pinscape/downloads/dshow-test.ziop


From: byancey notifications@github.com Sent: Friday, October 4, 2019 10:47 PM To: mjrgh/PinballY PinballY@noreply.github.com Cc: mjrgh mjr_@hotmail.com; Comment comment@noreply.github.com Subject: Re: [mjrgh/PinballY] Metafilter can cause startup crashes when same table exists in multiple systems (#86)

Here - give this a try. Regards, Mike

Was there supposed to be something else attached to your last post?

— You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://github.com/mjrgh/PinballY/issues/86?email_source=notifications&email_token=AB7DK3MHTWO7ANGVEIKP3BLQNATAZA5CNFSM4I45ZK62YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEANK4EA#issuecomment-538619408, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AB7DK3MROMO6DF63UMO24K3QNATAZANCNFSM4I45ZK6Q.

byancey commented 5 years ago

Got it. I'll give it a try and let you know.

Note that I also went ahead and implemented an update to the lexical matching algorithm (my suggestion item 2 from my list above). I have verified that this also effectively fixes metafilter selection issue, as PinballY gets the selection correct the first time, and no second conflicting event is queued. It ended up being pretty straight forward...just appended the respective system display names to each of the 3 strings passed into the IsLexicallyCloser call. Of my 3 suggestions, I think this would be my preferred solution. I'll plan to submit a pull request with this change so that you can take a look.

byancey commented 5 years ago

Good news...your fix does indeed appear to have alleviated the crash caused by deleted objects in the event queue! Can't wait to try this out and see if it eliminates my other random crashes as well.

Regarding this specific scenario, with only your fix in place, you still sometimes hear a brief audio glitch as the first table audio starts playing and is then interrupted by the second, but PinballY does not crashe. Once my fix to RefreshFilter is added on top of that, the audio glitch is completely gone, as the correct game is selected on the first pass during startup, and the need to delete an extraneous event never comes into play.

I noted another behavior that my Refresh Filter update also addresses. When you have multiple tables of the same name across multiple systems, if you toggle between filters that exclude some of those tables and filters that don't, when the more inclusive filter is selected, the first of those games in the wheel list will always be selected even if the game of the same name from another system was previously selected and is still included in the new filter. With my refresh filter updates in place, the selection will not change if the current table is present in both filters.

I'm going to roll both of these changes out to my cabinet and just let it run in attract mode for a while today. I typically see one of those random crashes within a few hours...so we'll see.

UPDATE: Ok ,so I have hit a few of those random crashes even after pulling in these changes, so it looks like those are not the same root cause. Hopefully I can catch one with the debugger. Will open a separate issue once I can isolate it a bit more.

mjrgh commented 5 years ago

That's great - let me know how it goes with the further testing. It'd be a nice bonus if this was the source of the unknown other crashes as well. And it's wonderful that your name match change did the trick with the rest of the game-change audio glitching issue. I'll have to go back and think it through to make sure there's not some new problem that'll cause, but at first glance it seems reasonble.

Regards, Mike


From: byancey notifications@github.com Sent: Saturday, October 5, 2019 12:16 PM To: mjrgh/PinballY PinballY@noreply.github.com Cc: mjrgh mjr_@hotmail.com; Mention mention@noreply.github.com Subject: Re: [mjrgh/PinballY] Metafilter can cause startup crashes when same table exists in multiple systems (#86)

Good news...your fix does indeed appear to have alleviated the crash caused by deleted objects in the event queue! Can't wait to try this out and see if it eliminates my other random crashes as well.

Regarding this specific scenario, with only your fix in place, you still sometimes hear a brief audio glitch as the first table audio starts playing and is then interrupted by the second, but PinballY does not crashe. Once my fix to RefreshFilter is added on top of that, the audio glitch is completely gone, as the correct game is selected on the first pass during startup.

I noted another behavior that my Refresh Filter update also addresses. When you have multiple tables of the same name across multiple systems, if you toggle between filters that exclude some of those tables and filters that don't, when the more inclusive filter is selected, the first of those games in the wheel list will always be selected even if the game of the same name from another system was previously selected and is still included in the new filter. With my refresh filter updates in place, the selection will not change if the current table is present in both filters.

I'm going to roll both of these changes out to my cabinet and just let it run in attract mode for a while today. I typically see one of those random crashes within a few hours...so we'll see.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/mjrgh/PinballY/issues/86?email_source=notifications&email_token=AB7DK3OPIRZYFW2LNCWK3GDQNDRZNA5CNFSM4I45ZK62YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEANZZMY#issuecomment-538680499, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AB7DK3LJUXABVD6M5NFM4YLQNDRZNANCNFSM4I45ZK6Q.

byancey commented 5 years ago

My Random Crashes are still happening....but these may be self-inflicted.

I managed to catch one of these in the debugger, and while the actual exception happens in ntdll.dll in an RtlUserThread with no debug symbols for the entire stack trace, there are enough clues to be found by looking at the call stack and the other running threads to lead me to a theory.

I don't use the DMD High Scores feature. Early on I searched for an option in the settings to disable this feature, but as far as I know, no such option exists. I now believe this can probably be accomplished using JavaScript to override the high score events, but at the time I was not familiar with PinballY's JavaScript capabilities, so I made a quick source code change in my local branch to disable the DMD scores. This was a stop-gap solution, and I had intended to come back and implement a more robust solution tied into the settings dialog that could potentially be pushed into the baseline, but I never got back to that. In retrospect, I suspect the stop-gap approach I took was not entirely thread-safe.

I'm going to revert that change and see how long I can go without hitting the crash. Hopefully that's the problem and I can look at disabling this via JavaScript instead.

byancey commented 5 years ago

Removed my previous code changes that disable the DMD High Scores feature and replaced it with a JavaScript event handler to do the same. The cabinet has been running with these changes for about 4 hours now with no crashes. I think it's safe to say that was the root cause of my other random crashes.

byancey commented 5 years ago

No doubt the change in my method of disabling DMD High Scores is a better solution than my previous approach, but it appears I was a bit hasty declaring victory on my random crashes. I think I was running over 7 hours straight with no issues, but I've seen two crashes since, one without the debugger, and one just now with the debugger attached.

The crash was still in ntdll.dll and has a lot of similarities to the earlier crash I caught, but this time there was no evidence that the HighScore feature being active in any of the threads, and no symbols for any of the active threads to suggest a possible cause to me.

At the same time I've been making the changes in this thread, I've also been migrating my playfield media from a mix of 4K & 1080P, to a consistent 1440p across the board. It's possible that the fact that I have a lot less 4K videos accounted for the ability to run over 7 hours without hitting the crash (less than 2 hours has been typical). There were once again lots of call into vlccore in the offending thread, so it's possible the crash is less likely with 1440p media than with with 4K media. Also, I've never actually seen a PinballY crash on my laptop, which is quite a bit more powerful than my cabinet in both the CPU and GPU department.

For the record, if I can consistently get 7+ hours of runtime from PinballY, you won't hear me complaining....I rarely keep it on that long unless I'm refreshing tables and media.

In any case, since this crash isn't related to the problem this thread was opened for, if I'm able to collect any actionable data, I'll submit it as another issue to track it.

mjrgh commented 5 years ago

Rare crashes that seem to be down in the vlc layer could actually be libvlc. It's long enough since I updated the libvlc version included in the build that there have probably been a couple of dot releases since then, so it might be worth grabbing their latest and see if that helps. It's pretty easy to update your copy locally. See Notes/Libvlc Notes.txt in the repos for instructions if you want to give it a try.

Not that updating it will necessarily fix the problem even if it's in libvlc - there's no reason to assume they'd even know the bug exists, even if it were their bug. But libvlc is very widely used, so it gets tested pretty thoroughly over time. I had some mystery crashes in libvlc in the early releases that cleared up on subsequent libvlc updates, so it's not unprecedented.

Mike


From: byancey notifications@github.com Sent: Sunday, October 6, 2019 8:12 PM To: mjrgh/PinballY PinballY@noreply.github.com Cc: mjrgh mjr_@hotmail.com; Mention mention@noreply.github.com Subject: Re: [mjrgh/PinballY] Metafilter can cause startup crashes when same table exists in multiple systems (#86)

No doubt the change in my method of disabling DMD High Scores is a better solution than my previous approach, but it appears I was a bit hasty declaring victory on my random crashes. I think I was ran over 7 hours straight with no issues, but I've seen two crashes since, one without the debugger, and one just now with the debugger attached.

The crash was still in ntdll.dll and has a lot of similarities to the earlier crash I caught, but this time there was no evidence that the HighScore feature being active in any of the threads, and no symbols for any of the active threads to suggest a possible cause to me.

At the same time I've been making the changes in this thread, I've also been migrating my playfield media from a mix of 4K & 1080P, to a consistent 1440p across the board. It's possible that the fact that I have a lot less 4K videos accounted for the ability to run over 7 hours without hitting the crash (less than 2 hours has been typical). There were once again lots of call into vlccore in the offending thread, so it's possible the crash is less likely with 1440p media than with with 4K media. Also, I've never actually seen a PinballY crash on my laptop, which is quite a bit more powerful than my cabinet in both the CPU and GPU department.

For the record, if I can consistently get 7+ hours of runtime from PinballY, you won't hear me complaining....I rarely keep it on that long unless I'm refreshing tables and media.

In any case, since this crash isn't related to the problem this thread was opened for, if I'm able to collect any actionable data, I'll submit it as another issue to track it.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/mjrgh/PinballY/issues/86?email_source=notifications&email_token=AB7DK3KI25ADVQAVF2TC5JLQNKSJBA5CNFSM4I45ZK62YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEAO5CLQ#issuecomment-538825006, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AB7DK3MZ7OO5DRWKAPUWKKTQNKSJBANCNFSM4I45ZK6Q.

byancey commented 5 years ago

I've updated libvlc to 3.0.8 and my cab is up and running with the debugger...now we wait.

byancey commented 5 years ago

Mike, I think between your fix for handling of deleted objects for queued events, and the update I pushed up for RefreshFilter(), the issues this ticket was filed for have been addressed. Shall we close it now, or would you like to wait until you have a chance to pull those fixes into the master branch?

Oh, I'll also go ahead and push up my libvlc 3.0.8 update and submit a pull request. While it ended up not making a difference for my random crash issues, I did the work to update it, so I figure you may as well have the option of including it if you wish.

mjrgh commented 4 years ago

The changes are in the beta 15, so I'm going to consider this closed!