mossblaser closed this PR 8 years ago.
OK, since this PR is fully backward compatible with older versions of Rig, I'd suggest merging it in; I'll work on PRs for other projects in due course, since this PR won't break anything.
I still need to read the tests... sorry
Thanks very much for all the reviewing, take your time! I'll get on the corrections...
I've been thinking more about all this and have come to the conclusion that some aspects of the API coming out of this PR leave me feeling quite uncomfortable:
With the API exposed as it is here, you cannot trivially answer the question "how many idle cores are there in total?" without making your own get_chip_info calls, which is wasteful and doesn't feel right. As a result, I think the _get_all_chips_info method should be made public. It should probably get a better name too...
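As a rough illustration of why a public all-chips method helps, the sketch below answers the idle-core question from a single {(x, y): ChipInfo} mapping, with no further machine communication. The ChipInfo and AppState types are simplified, hypothetical stand-ins and not Rig's actual definitions.

```python
from collections import namedtuple
from enum import Enum


class AppState(Enum):
    """Hypothetical stand-in for the states a core can report."""
    idle = 0
    run = 1


# Hypothetical, simplified stand-in for ChipInfo: one AppState per core.
ChipInfo = namedtuple("ChipInfo", "core_states")


def count_idle_cores(all_chips_info):
    """Count idle cores across a {(x, y): ChipInfo} mapping.

    Everything comes from chip info gathered in one pass, so no further
    communication with the machine is needed.
    """
    return sum(
        1
        for info in all_chips_info.values()
        for state in info.core_states
        if state is AppState.idle
    )


chips = {
    (0, 0): ChipInfo([AppState.run, AppState.idle, AppState.idle]),
    (1, 0): ChipInfo([AppState.run, AppState.idle]),
}
print(count_idle_cores(chips))  # 3
```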
I do not like the fact that get_resource_constraints, a MachineController method, returns place-and-route constraints. These are a construct designed for a very specific application and have little general use outside that context. Additionally, if you already have a ChipInfo for every chip, there is no reason for get_resource_constraints to communicate with the machine, and thus no reason for it to be part of MachineController. Instead, I believe it should live with the other place-and-route utility functions.
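To make the point concrete: once a ChipInfo is available for every chip, building core reservations is a pure function of that data. The sketch below assumes hypothetical ChipInfo, AppState and CoreReservation types, which are not Rig's real constraint classes. It reserves every non-idle core and merges each run of consecutive busy cores into a single slice.

```python
from collections import namedtuple
from enum import Enum


class AppState(Enum):
    """Hypothetical stand-in for the states a core can report."""
    idle = 0
    run = 1


# Hypothetical, simplified stand-ins (not Rig's real classes).
ChipInfo = namedtuple("ChipInfo", "core_states")
CoreReservation = namedtuple("CoreReservation", "location cores")


def build_core_reservations(all_chips_info):
    """Reserve every non-idle core using only already-gathered chip info.

    Consecutive non-idle cores on a chip are merged into one slice so
    that the number of reservations stays small.
    """
    reservations = []
    for xy, info in sorted(all_chips_info.items()):
        start = None
        for core, state in enumerate(info.core_states):
            busy = state is not AppState.idle
            if busy and start is None:
                start = core  # a run of busy cores begins
            elif not busy and start is not None:
                reservations.append(CoreReservation(xy, slice(start, core)))
                start = None
        if start is not None:  # the run extends to the last core
            reservations.append(
                CoreReservation(xy, slice(start, len(info.core_states))))
    return reservations
```

For a chip whose cores are [run, idle, run, run], this yields reservations for slice(0, 1) and slice(2, 4) on that chip. Nothing in the function touches a machine connection.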
Perhaps more controversially, get_machine is essentially the same: once given a ChipInfo for every chip, it does not actually need to communicate with the machine at all. Though the Machine object is carefully designed to be quite useful outside of place-and-route, it still has plenty of quirks related to the place-and-route infrastructure (e.g. the concept of 'resources' and the Cores, SDRAM and SRAM sentinels). All this points towards get_machine being pulled out into a place-and-route utility function, and even towards moving the Machine object into the P&R namespace. Obviously, for backward compatibility, a deprecation-flagged wrapper must remain in MachineController, along with aliases for the old type name.
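A minimal sketch of the proposed split follows. It assumes a free function build_machine that needs no machine communication, plus a deprecated get_machine wrapper that delegates to it. The controller class and the dict returned by build_machine are hypothetical simplifications, not Rig's implementation. warnings.warn with DeprecationWarning is the standard-library mechanism for flagging the old entry point.

```python
import warnings


def build_machine(system_info):
    """Hypothetical P&R utility: derive a machine description from chip
    info without any machine communication (reduced here to live chips)."""
    return {"live_chips": set(system_info)}


class MachineController:
    """Heavily simplified, hypothetical stand-in for a machine controller."""

    def get_system_info(self):
        # A real controller would query the machine; this is faked.
        return {(0, 0): None, (1, 0): None}

    def get_machine(self):
        """Deprecated: kept only so that existing callers keep working."""
        warnings.warn(
            "get_machine() is deprecated; use "
            "build_machine(get_system_info()) instead.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller's line
        )
        return build_machine(self.get_system_info())
```

Old code keeps working unchanged but receives a warning. New code calls build_machine directly and never needs a controller for P&R data.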
Sadly, going from the Machine object to a bare dictionary of ChipInfos feels like a regression in high-level functionality in many ways. To relieve some of this, I'd propose making get_all_chips_info return a dict-like object with extra functionality beyond the usual lookup from chip to ChipInfo. The API should hopefully recreate the useful functions of the Machine object: easy iteration over (live) chip and link coordinates, easy testing for chip and link liveness, and direct access to the width/height of the network. Further functionality that wouldn't have been appropriate for Machine objects could include iterating over core coordinates, or even over cores in a particular state.
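One possible shape for such a dict-like object is sketched below. It is a dict subclass mapping (x, y) to ChipInfo, with width/height attributes, helpers for iterating over live chips, working links and cores in a given state, and membership tests for both chips and links. All names here (SystemInfo, ChipInfo, Link, AppState and the method names) are illustrative assumptions, not a finalised API.

```python
from collections import namedtuple
from enum import Enum


class AppState(Enum):
    """Hypothetical stand-in for the states a core can report."""
    idle = 0
    run = 1


class Link(Enum):
    """Hypothetical stand-in for the six link directions."""
    east = 0
    north_east = 1
    north = 2
    west = 3
    south_west = 4
    south = 5


# Hypothetical stand-in: per-core states plus the set of working links.
ChipInfo = namedtuple("ChipInfo", "core_states working_links")


class SystemInfo(dict):
    """A {(x, y): ChipInfo} dict with a few Machine-style conveniences."""

    def __init__(self, width, height, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.width = width  # network dimensions, dead chips included
        self.height = height

    def chips(self):
        """Iterate over the coordinates of live chips."""
        return iter(self)

    def links(self):
        """Iterate over (x, y, link) for every working link."""
        for (x, y), info in self.items():
            for link in info.working_links:
                yield (x, y, link)

    def cores(self, state=None):
        """Iterate over (x, y, p), optionally only for cores in `state`."""
        for (x, y), info in self.items():
            for p, core_state in enumerate(info.core_states):
                if state is None or core_state is state:
                    yield (x, y, p)

    def __contains__(self, item):
        """Support both `(x, y) in si` and `(x, y, link) in si`."""
        if len(item) == 3:
            x, y, link = item
            info = self.get((x, y))
            return info is not None and link in info.working_links
        return super().__contains__(item)
```

Subclassing dict keeps ordinary lookup (si[(x, y)]) working unchanged for code that only wants ChipInfos. The extra methods recover the Machine-style conveniences.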
So, what do you think?
...OK; following the above, the following things have happened:
- The get_system_info method replaces get_machine, get_resource_constraints and all that nonsense.
- A SystemInfo type has been introduced, which is a dict from (x, y) to ChipInfo along with some utility methods for iterating over things and checking membership. It should replace Machine objects in settings not related to P&R.
- get_machine is now deprecated (but still present for backward compatibility).
- build_machine and build_core_constraints have been provided, which use a SystemInfo object to build those data structures.
- place_and_route_wrapper now takes a SystemInfo as an argument.
I think some reorganisation still needs to happen with the Rig module hierarchy:
- The Machine type and associated sentinels should be moved into the place_and_route namespace.
- The Links type probably needs a better home than being the only member of rig/machine.py.
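The build_machine conversion listed above can be sketched as follows. The sketch assumes a simplified Machine that records only its dimensions, dead chips and dead links. Chip positions absent from the mapping are treated as dead, and links a live chip does not report as working are treated as dead. The Machine, ChipInfo and Links types here are hypothetical stand-ins rather than Rig's own classes.

```python
from collections import namedtuple
from enum import IntEnum


class Links(IntEnum):
    """Hypothetical stand-in for the six link directions."""
    east = 0
    north_east = 1
    north = 2
    west = 3
    south_west = 4
    south = 5


# Hypothetical, simplified stand-ins (not Rig's real classes).
ChipInfo = namedtuple("ChipInfo", "working_links")
Machine = namedtuple("Machine", "width height dead_chips dead_links")


def build_machine(width, height, system_info):
    """Build a P&R-style Machine from a {(x, y): ChipInfo} mapping."""
    # Any position in the width x height grid with no chip info is dead.
    dead_chips = {
        (x, y)
        for x in range(width)
        for y in range(height)
        if (x, y) not in system_info
    }
    # Any link a live chip does not report as working is dead.
    dead_links = {
        (x, y, link)
        for (x, y), info in system_info.items()
        for link in Links
        if link not in info.working_links
    }
    return Machine(width, height, dead_chips, dead_links)
```

The conversion only reads data already held in the mapping. It can therefore live with the P&R utilities rather than on the controller.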
...OK, I think this PR is now finally complete and ready for final review and merging! The comment at the top of the PR should hopefully enumerate the changes made. Sadly, due to the light internal reorganisation, this PR has touched more code than it would otherwise need to.
My read through the tests was cursory, but this looks mostly good to me. I am concerned that on big machines it'll generate a huge number of resource constraints, but I may just be misunderstanding that bit of code.
> I am concerned that on big machines it'll generate a huge number of resource constraints, but I may just be misunderstanding that bit of code.
See previous comment.
TODOs for me before I merge this, then, are:
- Enable generation of global constraints when possible.
Would an option be to allow constraints to be associated with multiple locations? That way you could modify just one set of constraints, and then mark the constraint as global (remove the location) if the difference between the overall set of locations and the set of locations the constraint applies to is just the set of dead chips.
new_location = None if (all_locations - old_locations == dead_locations) else old_locations
> Would an option be to allow constraints to be associated with multiple locations?
I don't think that would meaningfully save compute or storage costs, but it would make life more complex. I'll just stick to producing a minimal set of constraints as they stand now.
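Here is a sketch of one way to produce such a minimal set using only single-location or global constraints. It assumes a hypothetical CoreReservation type in which location None means global. A core range reserved on every live chip collapses into one global reservation, which is equivalent to checking that the only chips it misses are the dead ones. Ranges are keyed as (start, stop) tuples because slice objects are unhashable on Python versions before 3.12.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in; a location of None means "every chip".
CoreReservation = namedtuple("CoreReservation", "location cores")


def minimise_reservations(per_chip, live_chips):
    """Merge identical per-chip reservations into global ones where possible.

    `per_chip` maps each live chip (x, y) to a list of reserved core ranges,
    given as (start, stop) pairs. A range reserved on every live chip
    becomes a single global reservation; the rest stay chip-specific.
    """
    chips_by_range = defaultdict(set)
    for xy, ranges in per_chip.items():
        for core_range in ranges:
            chips_by_range[core_range].add(xy)

    live = set(live_chips)
    reservations = []
    for (start, stop), chips in sorted(chips_by_range.items()):
        if chips == live:
            # Dead chips never receive work, so covering every live chip
            # is equivalent to covering the whole machine.
            reservations.append(CoreReservation(None, slice(start, stop)))
        else:
            for xy in sorted(chips):
                reservations.append(CoreReservation(xy, slice(start, stop)))
    return reservations
```

For example, if core 0 (the monitor) is busy on every live chip, the sketch emits one global reservation instead of one per chip. On a large machine, that collapse removes most of the constraint count.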
OK; this awaits your final review. Feel free to merge if you're happy!
LGTM, I'll await the tests coming back
Are the changes to rig_par_diagram necessitated by this PR trivial to make?
Hopefully none whatsoever. Has it started failing?
On 3 Jan 2016 09:13, "Andrew Mundy" <notifications@github.com> wrote:
> Are the changes to rig_par_diagram necessitated by this trivial to make?
No, I'm stupid; I'd forgotten that Machine still existed. Sorry!
This PR began as an answer to #202 to attempt to automatically avoid using non-idle cores and has resulted in a few wider-scoped changes to make this process cleaner.
- get_machine has been deprecated, since having the MachineController directly handle P&R artefacts seems like a bad idea in general.
- get_chip_info uses a new SCP command to quickly return basic information about the health and state of a chip.
- get_system_info and an associated SystemInfo container have been added; these use get_chip_info to quickly gather the state of every chip, link and core in a SpiNNaker machine. The SystemInfo contains a superset of the information Machine did, and presents it in a far more explicit way. This call is also a little quicker than get_machine was.
- build_machine has been added, which builds a Machine object from a SystemInfo when you are doing place-and-route work.
- build_core_constraints has been added, which builds P&R constraints that reserve all non-idle cores in a machine (including monitors) using information in a SystemInfo object.
- … SystemInfo object in place of a Machine.
- … rig.place_and_route.machine and are aliased in rig.place_and_route.
- … rig.links.
Once this PR is merged and released, I'll submit patches for other projects.