livingcomputermuseum / ContrAlto

This repository contains the source code for Living Computers: Museum+Labs's Xerox Alto emulator, ContrAlto.
GNU Affero General Public License v3.0
230 stars 25 forks source link

Suggestion: Performance optimization for Ethernet_Controller.cs #6

Closed marciot closed 7 years ago

marciot commented 7 years ago

Currently the inputPollEvent task runs continuously every 5400 nsec. It calls InputHandler, which only does useful work if _inputState is either InputState.ReceiverWaiting or InputState.Receiving.

My suggestion:

1) inputPollEvent should be scheduled to run in initializeReceiver 2) It should not be scheduled to run in ResetInterface. 3) InputHandler should only reschedule itself to run if inputState is either InputState.ReceiverWaiting or InputState.Receiving.

This would have the effect of starting the periodic polling task when the Alto is actively waiting or receiving data and would cause the periodic polling task to terminate when the receiver was finished.

Although the improvement in performance is likely negligible in the native Windows Contralto, it makes a huge difference in the JavaScript port.

If you want to take a look at what the changes would look like, please make me a contributor and I can create a branch and submit the suggestion for you to review.

Thank you,

-- Marcio

livingcomputermuseum commented 7 years ago

Sounds reasonable to me, and would likely be a decent performance gain on the Windows version as well. I don't believe I need to do anything for you to be able to submit changes to me, you should just be able to submit diffs to me. (Ask Seth, I've never tried submitting diffs to myself ;). Let me know if it doesn't work...

marciot commented 7 years ago

That should work. Diffs are done offline, so you'll need to apply those using the command line tools, rather than from the GitHub website.

GitHub has a notion of a pull request, which is done online using the GitHub website, but the only way for someone to create a pull request is to be made a collaborator on the original project so they can create a branch, or for them to fork the project and then create a branch. A lot of people end up doing the latter, which is really annoying, as GitHub ends up being a wasteland of forks of other projects.

livingcomputermuseum commented 7 years ago

Hm. Seth was never made a collaborator but was able to send me a pull request, and as far as I know, he didn't fork the project (but perhaps I am mistaken). At any rate, let me know what's convenient for you...

marciot commented 7 years ago

Update: Ignore this post. This works, but I'm still trying to make it better

I've attached a diff. I do not have C# development tools, so I have not compiled or tested this change under Windows. I recommend compiling it and testing Battleship or Maze War before accepting it as good.

However, I did test the equivalent changes to Seth's JavaScript port and it worked fine.

eth-performance-improvement.patch.txt

marciot commented 7 years ago

Update: Ignore this post. I scratched this idea.

On further examination. I think InputHandler could be simplified even further. Currently it does two separate things 1) when inputState == InputState.ReceiverWaiting, it polls the nextPacket queue each time to see whether a new packet is available 2) when inputState == InputState.Receiving, it moves a word into the FIFO queue every 5400 ns.

Of these two tasks, only the second task has anything to do with timing. The first task involves polling for something which can be known directly: a packet only appears in the queue when the host interface calls onHostPacketReceived to put one there. So, I'm thinking the code could be simplified as such:

1) InputHandler becomes responsible only for copying bytes to the FIFO every 5400 ns when the state is InputState.Receiving. 2) onHostPacketReceived checks to see whether the receiver is in the state InputState.ReceiverWaiting. If it is, it loads the packet into incomingPacket, sets the state to InputState.Receiving and starts the InputHandler. If it is not, it pushes the packet into the nextPackets queue. 3) InitializeReceiver checks to see if there is a packet in the nextPackets queue, if so, it sets the state to InputState.Receiving and starts the InputHandler. Otherwise, it sets the state to InputState.ReceiverWaiting.

The upshot of this would be that the periodic task would only run when a packet was actively being copied into the FIFO. This happens relatively rarely. The receiver spends most of the time in either InputState.ReceiverOff (for non-networking programs) or InputState.ReceiverWaiting (for network-enabled programs). There two states do not need polling because the host interface is already doing that work for us.

marciot commented 7 years ago

Rock on. I just found out I can build Contralto on my PC without installing any software. "msbuild" from the command line. It looks like Microsoft pre-loaded everything under Windows 10. Very cool. This gives me a way to test things out.

marciot commented 7 years ago

After attempting to implement my second idea I saw that onHostPacketReceived runs on a different thread, so it would be impossible for it to schedule inputHandler to run on the main thread without adding locking to the scheduler code, which would be bad. I can see why you were polling the queue.

I could get away with it under JavaScript, because there is only one thread running in the browser, but that would mean having a completely different implementation for JavaScript. Grrrr. Annoying.

livingcomputermuseum commented 7 years ago

Yeah, I don't like the polling method, but it was the only thing I could think of aside from actually adding thread-safety to the scheduler, which absolutely kills performance. I don't know if it's necessarily a bad thing for the JavaScript and C# code to diverge in places, if it's done for a good reason.

Thanks for the diffs, let me know if you have any further suggestions here, otherwise I'll work on merging those in.

marciot commented 7 years ago

Actually, don't merge them in quite yet. I'm trying out an additional tweak and when I am done I'll submit a new diff.

I think polling for the queue for packets can happen much less frequently than every 5400 ns, since acceptable network latency is in the order of milliseconds. I am making it so inputHandler is scheduled at longer intervals when the state is InputState.ReceiverWaiting and every 5400 ns when the state is InputState.Receiving and the interface is moving words.

It is a compromise that seems to achieve the objective of not having the polling slow things down, while still maintaining thread safety.

marciot commented 7 years ago

Oh dear. I just found one of those bugs in ContraltoJS that makes me wonder how anything at all is working. In your scheduler queue, you defined push() and pop() to act at the start of the list; in ContraltoJS, Seth mapped those calls to native JavaScript methods which act at the end of the list. So, tasks that are supposed to be scheduled later in time are actually running first. I have no clue how the emulator boots like this, but it may explain why it is sluggish or why it stutters sometimes.

Anyhow, I could modify ContraltoJS to match the behavior you have in your code, but I'll defend his mistake by saying that I find it misleading to have methods called push() and pop() acting at the start of a list. Would you object if I submit a patch either renaming those method (enqueue, dequeue seem better names), or reverse the order of the queue so that the action happens at the end?

livingcomputermuseum commented 7 years ago

Pop() always pulls the element from the head of the list. Push() does not insert to the head (unless the list is empty) – it puts the Event in the queue at the proper timestamp. I suppose the ordering doesn’t technically matter much, but I’m surprised that things work with events firing out of order.

I don’t feel too strongly about the naming convention; I suppose Enqueue/Dequeue makes more sense but it seems like splitting hairs to me.

From: Marcio T. [mailto:notifications@github.com] Sent: Friday, September 30, 2016 3:07 PM To: livingcomputermuseum/ContrAlto ContrAlto@noreply.github.com Cc: Josh Dersch JoshD@LivingComputerMuseum.org; Comment comment@noreply.github.com Subject: Re: [livingcomputermuseum/ContrAlto] Suggestion: Performance optimization for Ethernet_Controller.cs (#6)

Oh dear. I just found one of those bugs in ContraltoJS that makes me wonder how anything at all is working. In your scheduler queue, you defined push() and pop() to act at the start of the list; in ContraltoJS, Seth mapped those calls to native JavaScript methods which act at the end of the list. So, tasks that are supposed to be scheduled later in time are actually running first. I have no clue how the emulator boots like this, but it may explain why it is sluggish or why it stutters sometimes.

Anyhow, I could modify ContraltoJS to match the behavior you have in your code, but I'll defend his mistake by saying that I find it misleading to have methods called push() and pop() acting at the start of a list. Would you object if I submit a patch either renaming those method (enqueue, dequeue seem better names), or reverse the order of the queue so that the action happens at the end?

— You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://github.com/livingcomputermuseum/ContrAlto/issues/6#issuecomment-250863523, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ATnJhwKTwTOZBZYWdsiXeh7zrlRHPUPGks5qvYfqgaJpZM4KJlL6.

marciot commented 7 years ago

I'll submit a patch to Seth that fixes the queue ordering in ContraltoJS and one to you that renames those methods. It's a minor change, but I think it helps with clarity. Push/pop are generally used for stacks, not for queues.

Anyhow, changing the subject, I have a question about the Ethernet controller: Is the 16 word FIFO something the Alto actually has in hardware? I see no mention of it in the Alto Hardware Manual. There is a vague mention of a "interface buffer" but it does not specify the nature or size of it.

The reason I ask is that data currently gets read and written twice. The microcode calls WriteOutputFifo on the Ethernet controller to write a word, the Ethernet controller code pushes them into a FIFO, and then later those words are copied to the outputData packet buffer; it seems like WriteOutputFifo could write words directly into the outputData buffer. Likewise, ReadInputFifo could read words directly off the _incomingPacket buffer, without using an intermediate FIFO. Getting rid of the FIFO would not only eliminate the copy, but it also means not having to keep track of the FIFO count, which could simplify things quite a bit.

livingcomputermuseum commented 7 years ago

Yes, the 16 word fifo (described as "large" in the manual) is in the original hardware, and it's discussed in the hardware manual, although not in immense detail. See the diagram on page 53. The fifo count is required by the microcode, since it attempts to fill the FIFO before sending. It's likely possible to optimize this but I seriously doubt this is going to cause a major performance impact -- the Alto simply doesn't spend most of its time sending packets.


From: Marcio T. [notifications@github.com] Sent: Friday, September 30, 2016 4:59 PM To: livingcomputermuseum/ContrAlto Cc: Josh Dersch; Comment Subject: Re: [livingcomputermuseum/ContrAlto] Suggestion: Performance optimization for Ethernet_Controller.cs (#6)

I'll submit a patch to Seth that fixes the queue ordering in ContraltoJS and one to you that renames those methods. It's a minor change, but I think it helps with clarity. Push/pop are generally used for stacks, not for queues.

Anyhow, changing the subject, I have a question about the Ethernet controller: Is the 16 word FIFO something the Alto actually has in hardware? I see no mention of it in the Alto Hardware Manual. There is a vague mention of a "interface buffer" but it does not specify the nature or size of it.

The reason I ask is that data currently gets read and written twice. The microcode calls WriteOutputFifo on the Ethernet controller to write a word, the Ethernet controller code pushes them into a FIFO, and then later those words are copied to the outputData packet buffer; it seems like WriteOutputFifo could write words directly into the outputData buffer. Likewise, ReadInputFifo could read words directly off the _incomingPacket buffer, without using an intermediate FIFO. Getting rid of the FIFO would not only eliminate the copy, but it also means not having to keep track of the FIFO count, which could simplify things quite a bit.

— You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://github.com/livingcomputermuseum/ContrAlto/issues/6#issuecomment-250878042, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ATnJh_zc6GKbAYTx8yVk23h_0yKsWXpKks5qvaJvgaJpZM4KJlL6.

marciot commented 7 years ago

I believe you are correct in that the performance impact is minimal. It's just that the FIFO makes the code a bit hard to follow and I figured if it wasn't real hardware it could be simplified.

Anyhow, changing the subject again: I found a potential trap in the scheduler. Inside the ethernet_controller code, there are two places where the timestamp of an Event object is modified. It looks like this:

        _inputPollEvent.TimestampNsec = _inputReadPeriod;
        _system.Scheduler.Schedule(_inputPollEvent);

I believe this is an optimization, as creating new Event objects each time would be expensive. The trap is that this misbehaves badly if the previous Event had not yet run. The scheduler queue keeps references to events, not copies, and so modifying the timestamp on an event that is still in the queue misorders it, and rescheduling that event introduces a duplicate into the queue.

One way to fix this is to have code that is rescheduling an event follow this pattern:

        _system.Scheduler.CancelEvent(_inputPollEvent);
        _inputPollEvent.TimestampNsec = _inputReadPeriod;
        _system.Scheduler.Schedule(_inputPollEvent);

Perhaps there should be a new Reschedule method added to the Scheduler that takes a reference to an event, a new timestamp and performs these three steps.

I'm not sure whether this happens often enough to warrant a new method for doing it. Thoughts?

marciot commented 7 years ago

Okay, here is a debrief after weeks of tinkering.

I have two implementations of the Ethernet code now. One is a "full" implementation which is a line-for-line copy of yours. It does the scheduling and has a 16-bit FIFO. That code barely works in ContraltoJS -- sometimes the games connect, most of the times they do not. I know it works fine in the native Contralto, so I'm really at a loss as to why in ContraltoJS it performs so poorly. Perhaps it has to do with the slower execution in the browser; I don't believe it should fail just because of slowness, but it is the best explanation I have right now. The various optimizations I tried over the course of the week only helped when the scheduler queue was backwards in ContraltoJS and once that was fixed, my tweaks didn't change the overall performance that much and are probably not worth doing.

My other implementation of the Ethernet code is a highly abridged one. It's does no scheduling, has no FIFO and is fully synchronous. It lies to the microcode -- when receiving packets, it tells the microcode the FIFO is always full, so the microcode reads bytes continuously until the end of the packet; when sending packets, it lies by saying the FIFO is always empty, so the microcode writes bytes continuously until the packet is finished.

This abridged implementation causes the Ethernet microcode to work harder than it would on a real Alto, so it's probably not correct. But since fixing the scheduling queue in ContraltoJS, the abridged Ethernet code works well in both Battleship and Maze War, whereas the full implementation does not.

I'll go ahead and close this thread. My conclusion is that native Contralto should continue to use the full implementation it is using right now, for correctness, while ContraltoJS probably ought to use the abridged one since performance appears to be an issue.

livingcomputermuseum commented 7 years ago

Thanks for the investigation and work here! That seems like an interesting optimization, and if it works, then go for it. Like you say, in the main ContrAlto branch I'd prefer to leave it with the FIFO, to ensure the behavior is more like the real hardware (though the differences may be trivial).

For future reference, the program EDP ("Ethernet Diagnostic Program") on the diag.dsk pack can be used to send and receive packets manually; it's a good way to test things out.

Regarding the scheduler trap -- yes, it's a potential issue; however in all cases it's being done it's known that the event isn't currently scheduled (i.e. each event timestamp modification occurs during a callback for that event's last invocation -- if you are seeing a case where this isn't true, let me know). While it's somewhat dangerous and has to be used carefully, it reduces the amount of code that needs to execute to schedule a recurring event since it's already expensive enough.

marciot commented 7 years ago

Ooh. EDP is something I can definitely use. Thanks for the tip.