project-rig / rig

A collection of tools for developing SpiNNaker applications.
GNU General Public License v2.0
4 stars 0 forks source link

Only use idle cores #206

Closed mossblaser closed 8 years ago

mossblaser commented 8 years ago

This PR began as an answer to #202 to attempt to automatically avoid using non-idle cores and has resulted in a few wider-scoped changes to make this process cleaner.

Once this PR is merged and released, I'll submit patches for other projects.

mossblaser commented 8 years ago

OK Since this PR is fully backward compatible with older versions of Rig I'd suggest merging it in and I'll work on PRs for other projects in due-course since this PR won't break anything.

mundya commented 8 years ago

I still need to read the tests... sorry

mossblaser commented 8 years ago

Thanks very much for all the reviewing, take your time! I'll get on the corrections...

I've been thinking more about all this and I've come to the conclusion some things about the API coming out of this PR leave me feeling quite uncomfortable:

With the API exposed as it is here you cannot trivially answer the question "how many idle cores are there in total" without doing your own get_chip_info which is wasteful and doesn't feel right. As a result I think the _get_all_chips_info method should be made public. It should probably get a better name too...

I do not like the fact that get_resource_constraints, a MachineController method, is returning place-and-route constraints, a construct designed for a very specific application and with little general use outside that context. Additionally, if you already have a ChipInfo for every chip there is no reason for get_resource_constraints to communicate with the machine and thus no reason for it to be part of MachineController. Instead I believe this should live with the other place-and-route utility functions.

Perhaps more controversially get_machine is essentially exactly the same in that once given a ChipInfo it does not actually need to communicate with the machine at all. Though the Machine object is carefully designed to actually be quite useful outside of place-and-route, it still has plenty of quirks related to the place-and-route infrastructure (e.g. the concept of 'resources' and the Cores, SDRAM and SRAM sentinels). All this points towards get_machine being pulled out and put into a place-and-route utility function and even the Machine object being pulled into the P&R namespace. Obviously a deprecation-flagged wrapper in MachineController must remain along with aliases for the old type name for backward compatibility.

Sadly going from the Machine object to a bare dictionary of ChipInfos feels like a regression in high-level functionality in many ways. To relieve some of this I'd propose making get_all_chips_info return a dict-like object with extra functionality beyond the usual lookup from chip to ChipInfo. The API should hopefully recreate the useful functions of the Machine object, namely easy iteration over (live) chip and link coordinates, easy testing for chip and link liveness, directly getting the width/height of the network. Further functionality which wouldn't have been appropriate for Machine objects would be things like iterating over core coordinates, or even cores in a particular state.

So, what do you think?

mossblaser commented 8 years ago

OK; following the above, the following things have happened:

I think some reorganisation still needs to happen with the Rig module hierarchy:

mossblaser commented 8 years ago

OK, I think this PR is now finally complete and ready for final review and merging! The comment at the top of the PR should hopefully enumerate the changes made. Sadly due to the light internal reorganisation this PR has touched more code than it would otherwise need to.

mundya commented 8 years ago

My read through the tests was cursory, but this looks mostly good to me. I am concerned that on big machines it'll generate huge number of resource constraints, but I may just be misunderstanding that bit of code.

mossblaser commented 8 years ago

I am concerned that on big machines it'll generate huge number of resource constraints, but I may just be misunderstanding that bit of code.

See previous comment.

TODOs for me before I merge this then are:

mundya commented 8 years ago

Enable generation of global constraints when possible

Would an option be allowing constraints to be associated with multiple locations? - That way you could just modify one set of constraints and then mark the constraint as global (remove the location) if the difference between the set of locations the constraint is applied to and the overall set of locations is just the set of dead chips.

new_location = None if (old_locations - all_locations == dead_locations) else old_locations

mossblaser commented 8 years ago

Would an option be allowing constraints to be associated with multiple locations?

I don't think that would meaningfully save you compute or storage costs but would make life more complex. I'll just stick to producing a minimal set of constraints as they stand now.

mossblaser commented 8 years ago

OK; this awaits your final review. Feel free to merge if you're happy!

mundya commented 8 years ago

LGTM, I'll await the tests coming back

mundya commented 8 years ago

Are the changes to rig_par_diagram necessitated by this trivial to make?

mossblaser commented 8 years ago

Hopefully none what so ever. Has it started failing? On 3 Jan 2016 09:13, "Andrew Mundy" notifications@github.com wrote:

Are the changes to rig_par_diagram necessitated by this trivial to make?

— Reply to this email directly or view it on GitHub https://github.com/project-rig/rig/pull/206#issuecomment-168480463.

mundya commented 8 years ago

No, I'm stupid, I'd forgotten that Machine still existed. Sorry!