project-rig / rig

A collection of tools for developing SpiNNaker applications.
GNU General Public License v2.0

Use only unused cores #202

Closed by mossblaser 8 years ago

mossblaser commented 8 years ago

Introduction

In the beginning there was read and write and it was good. Except when multiple things wanted to share memory. Then it was very, very bad. And so on the 1.33rd day, S-t created SDRAM alloc and life was good. Multiple applications and even the monitor could share the chip-wide memory resources, there was much rejoicing about entering the 21st century and all was well.

...but all was not well in the world. All this time ReserveResourceConstraints were all that kept Rig applications from trampling the monitor; and nothing at all protected the poor, innocent applications already running on the machine!

Ahem... Anyway... It seems odd that we're so careful to allocate SDRAM in some sane way and yet do not take the same approach to application cores. As well as being a little unsightly from a purely perfectionist perspective, here are a few examples of where this behaviour makes life hugely unpleasant:

Either way, the current way of doing things is a bit short-sighted and I think fixing this warrants proper consideration. This issue describes a suggested fix for this.

Proposed fix

I've been thinking about this for a few days now and so far the nicest solution seems to be to introduce a get_resource_constraints() method on MachineController (maybe needs a better name...) which produces a whole load of ReserveResourceConstraints which rope off any cores which aren't idle. ST would be amenable to adding a bit-mask of idle cores to SC&MP to make this process more efficient than reading the process table.
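To make the shape of this concrete, here's a minimal sketch of how such a method might turn a per-chip idle-core bitmask into constraints. The `ReserveResourceConstraint` and `Cores` names below are stand-ins for Rig's real types, and the bitmask input format is an assumption about what SC&MP might report:

```python
from collections import namedtuple

# Stand-ins for Rig's types (hypothetical; the real classes live in
# rig.place_and_route). A constraint reserves a slice of a resource,
# optionally at one specific chip location.
ReserveResourceConstraint = namedtuple(
    "ReserveResourceConstraint", "resource reservation location")
Cores = "Cores"  # placeholder for the Cores resource sentinel


def busy_core_constraints(idle_masks, cores_per_chip=18):
    """Given a {(x, y): idle-core-bitmask} mapping (as a hypothetical
    SC&MP bitmask report might provide), produce constraints roping off
    every core that is *not* idle, one contiguous run at a time."""
    constraints = []
    for (x, y), idle in idle_masks.items():
        core = 0
        while core < cores_per_chip:
            if not (idle >> core) & 1:  # busy core: start a reserved run
                start = core
                while core < cores_per_chip and not (idle >> core) & 1:
                    core += 1
                constraints.append(ReserveResourceConstraint(
                    Cores, slice(start, core), (x, y)))
            else:
                core += 1
    return constraints


# Example: on chip (0, 0), cores 0 and 3 are busy and the rest idle,
# so two single-core reservations are produced.
cs = busy_core_constraints({(0, 0): 0b111111111111110110})
```

The nice thing about this shape is that the resulting constraints drop straight into the existing place-and-route flow with no other API changes.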

While we're at it we should really try and do something similar for SDRAM availability (e.g. if an instrumentation app has used 32 MByte of SDRAM per chip you probably can't afford to ignore this when placing memory-bound applications). Obviously this will require some changes to SC&MP, most likely with assistance or at least guidance from ST.

Pros of this method

A preemptive implementation of my proposal has been started in the only-use-idle-cores branch.

While thinking about all this I came up with the following rather yuck ideas. I think they're all significantly inferior to the proposed solution and are here mostly to allow people to laugh at me...

  1. get_machine's "Cores" vertex resource could be updated to report idle cores rather than working cores. Users would then proceed to do place-and-route as usual but without the ReserveResourceConstraint. Some post-processing step would then be needed to translate core numbers to the "correct" set of available cores. Obviously a number of these steps would be required for all the different data structures produced... Yuck...
  2. As above, but with post-processing applied after the allocation step to shift allocations into the correct core numbers. This fixes everything else but relies on nothing caring that allocations may not be within the range of the Machine's resource list, or the Machine object being patched up. Yuck...
  3. SC&MP could present a virtual core numbering scheme on an appid-by-appid basis (e.g. you always got core 0, 1, 2... for your app). Lots of "yuck" in there both in terms of potential for confusion, implementation effort, major (SC&MP) API changes and potential for bugs.
mossblaser commented 8 years ago

@mundya and @neworderofjamie, your feedback would be very welcome as always!

mundya commented 8 years ago

I think this is an excellent idea. I'm happy to do/prototype bits of the SC&MP work if ST is overloaded?

mossblaser commented 8 years ago

> I think this is an excellent idea. I'm happy to do/prototype bits of the SC&MP work if ST is overloaded?

Sorry I missed your message!

ST's plan for this is to introduce a new chip-global struct which includes this information in a form which can be SCP'd off quickly and efficiently, ideally also including such things as the total number of working cores and number of links. To truly exploit this get_machine and get_resource_constraints should really be combined in some way... ideally without being messy...

I'm not sure whether the current (slow) implementation is release-worthy or not. If you're willing to do some implementation you should speak to ST; I'm sure he'd be delighted and besides it would make the situation with releasing this a lot easier! 140 isn't making stunningly rapid progress...

neworderofjamie commented 8 years ago

I'm sure either @mossblaser or ST has considered this, but get_free_sdram() is probably not enough -- if you have applications with different lifetimes, your SDRAM heap is going to get fragmented. get_largest_free_sdram_block() would help a bit, but still would not be a guarantee.

mossblaser commented 8 years ago

> your SDRAM heap is going to get fragmented

Yeah; the question of "how much free RAM is there?" is not a straightforward question. The use-case for this is not general-purpose core-sharing in the machine (which would cause many more serious issues, e.g. routing table/key sharing etc.). Rather it is intended for easing the use of diagnostic software alongside your application. For example, the traffic-burstiness measuring script I wrote requires one core on each chip in the machine but is designed specifically to safely coexist with other apps by not using routing table resources etc. In these cases you'd load the diagnostic app then your app, and unload in the reverse order. This means no fragmentation and thus the two answers are kind-of the same.

Clearly having some foreknowledge of SDRAM availability is important (e.g. if your diagnostic app uses lots of RAM) but the answer being nothing more than a not-insane estimate is probably all that is needed. The "how much is available" question can be answered by a counter (which is updated by malloc and free) while finding the largest free block potentially means walking through the blocks, which may or may not be more effort... We'll see what ST does anyway!
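The counter-vs-walk distinction above can be illustrated with a toy heap model. This is hypothetical bookkeeping, not SC&MP's actual allocator, but it shows why "total free" is O(1) while "largest free block" needs a free-list walk, and how fragmentation makes the two answers diverge:

```python
class ToyHeap:
    """Toy model of the SDRAM heap bookkeeping discussed above
    (hypothetical; not SC&MP's real allocator). Total free space is
    kept in an O(1) counter updated on alloc/free, while the largest
    free block requires walking the free list."""

    def __init__(self, size):
        self.free_blocks = [(0, size)]  # (start, length) free list
        self.free_total = size          # counter: cheap to query

    def alloc(self, length):
        # First-fit: take the first free block big enough.
        for i, (start, blen) in enumerate(self.free_blocks):
            if blen >= length:
                remainder = (start + length, blen - length)
                self.free_blocks[i:i + 1] = (
                    [remainder] if remainder[1] else [])
                self.free_total -= length
                return start
        return None  # no single block is large enough (fragmentation!)

    def free(self, start, length):
        # Simplified: adjacent free blocks are not coalesced.
        self.free_blocks.append((start, length))
        self.free_total += length

    def largest_free_block(self):
        # O(n): walk the free list, as a real heap would have to.
        return max((length for _, length in self.free_blocks), default=0)


heap = ToyHeap(100)
a = heap.alloc(40)
b = heap.alloc(40)
heap.free(a, 40)
# 60 bytes are free in total, but the largest single block is only 40:
# a 50-byte request fails despite there being "enough" free memory.
```

This is exactly @neworderofjamie's point: neither number alone is a guarantee, which is why a not-insane estimate is all we should promise.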

mundya commented 8 years ago

(there is already a function for getting the largest free block in the heap... Sadly I can't remember what it's called...)

mossblaser commented 8 years ago

A function but sadly not an SCP command.

Chatting with ST; his plan for supporting this was just to make a new SCP command for SC&MP which gathered all this stuff (not actually creating a new struct like SV etc.). Should be fairly easy to add!


mundya commented 8 years ago

> Chatting with ST; his plan for supporting this was just to make a new SCP command for SC&MP which gathered all this stuff (not actually creating a new struct like SV etc.). Should be fairly easy to add!

I think this sounds like the sanest approach. The more host <-> SpiNNaker interactions that can be made into either a short command followed by a block read or a block write followed by a short command the better...

mossblaser commented 8 years ago

As a break between presentation run-throughs I'm going to attempt to put together an SCP command which returns everything you might want to know about a chip (e.g. num cores, largest SDRAM block, which links are working, which cores are idle.)


mossblaser commented 8 years ago

Fixed in #206