Open lexelby opened 9 years ago
There are two ways this happens.
current_price
and raise an error. [1] https://github.com/klahnakoski/SpotManager/blob/dev/spot/spot_manager.py#L80
I didn't see the "who does it belong to" message, and doesn't Log.error throw an exception and bomb out?
I tried blowing away my prices file in case it was corrupted, but it rebuilt it and still had trouble. I loaded the prices file by hand and there are, indeed, valid prices for m3.large in us-east-1e. I guess maybe the processing code has a bug?
A bug is possible. Could you email me your prices file? I will try to replicate the problem.
doesn't Log.error throw an exception and bomb out?
Yes, but I do not know how else to handle it. There is a small delay between requesting the spot request and naming that request. I have had a few instances where SpotManager fails during that time (usually me during debugging) and does not have the chance to name the spot request. A spot request without a name is ambiguous: Does it belong to another SpotManager instance? (as happens in my case)
You can have all your warnings and errors emailed to you by adding an entry to the debug.logs array in your settings.json file:
{ "log_type": "email", "from_address": "me@example.com", "to_address": "me@example.com", "subject": "Problem with SpotManager" },
The full set of parameters can be seen in Emailer.__init__() [1]. A full example is at [2].
[1] https://github.com/klahnakoski/SpotManager/blob/dev/pyLibrary/env/emailer.py#L26
[2] https://github.com/klahnakoski/SpotManager/blob/e07a2e8a0f72664dc5f524172b4f897052b56fa8/examples/config/es_settings.json#L119
Please see if this commit fixes the problem. Apply the changes to your branch.
https://github.com/klahnakoski/SpotManager/commit/845cd40028af359168a4af2ec47b3ea2cd803541
If it works, then it is a better solution than failing outright:
NM, I will continue working on that branch to see where other coalesce calls are required.
Here are the specific changes to the SpotManager. The cause is the lack of definition for a particular instance type, which was then infecting the summary calculation.
I ended up going a similar route by throwing a coalesce call in there, but I thought I was just patching over some other bug. In what way was m3.large/us-east-1 not defined in the pricing data I emailed you? I thought I combed through the data in ipython and saw entries for that type+zone...
I believe you have an instance_type running on AWS that does not have an entry in the utility array. This is not necessarily m3.large, but probably some other instance type. If you have just one unknown instance type (and AZ combination), then this bug will show itself. In light of this, your previous fix of blacklisting utility entries is superior to my suggestion of just commenting out the entry: at least the pricing is still calculated for instance_types you are no longer interested in.
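To check for this, something like the following could list running instance types that have no utility entry. This is only a rough sketch: the `running` and `utility` structures and their field names (`instance_type`, `zone`) are assumptions for illustration, not SpotManager's actual data model.

```python
def find_unknown_types(running, utility):
    """Return sorted (instance_type, zone) pairs that are running
    but have no matching entry in the utility list.

    Both arguments are lists of plain dicts; the field names are
    hypothetical and chosen only for this example.
    """
    known = {u["instance_type"] for u in utility}
    unknown = {
        (r["instance_type"], r["zone"])
        for r in running
        if r["instance_type"] not in known
    }
    return sorted(unknown)


utility = [{"instance_type": "m3.large"}, {"instance_type": "c3.xlarge"}]
running = [
    {"instance_type": "m3.large", "zone": "us-east-1e"},
    {"instance_type": "t1.micro", "zone": "us-east-1a"},
]
print(find_unknown_types(running, utility))
```

If this returns anything, those are the entries whose missing definition could end up in the summary calculation.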
In an effort to be clearer: There are actually two independent problems you raised; neither affected the other. The first causes the error, and is fixed with coalescing calls. The second concerns the nulls found in the pricing grid, and is solved here: https://github.com/klahnakoski/SpotManager/commit/fcbbc25a191afa84985d584896b6a8cfb2b6bc0e
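For anyone following along, the idea behind the coalescing fix can be sketched like this. The `coalesce` below is a stand-in written for this example, not the library's own implementation, and `total_spending` with its `current_price` field is a hypothetical simplification of the summary calculation.

```python
def coalesce(*values):
    """Return the first argument that is not None, or None if all are."""
    for v in values:
        if v is not None:
            return v
    return None


def total_spending(requests):
    """Sum current prices, treating a missing price as zero so that
    one undefined entry cannot poison the whole total."""
    total = 0.0
    for r in requests:
        total += coalesce(r.get("current_price"), 0.0)
    return total


requests = [
    {"current_price": 0.03},
    {"current_price": None},  # instance type with no price definition
    {},                       # price field missing entirely
]
print(total_spending(requests))
```

Without the `coalesce` call, the second entry would raise (or, with a null-propagating object, silently turn the total into a null).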
Huh. Would you expect that the mystery instance would have a name tag that matches the prefix specified in settings? I'm sure I don't have any weird instance types with names like that, since SpotManager built my whole fleet.
Also, why is it that this mystery instance would have caused nulls in the pricing data for m3.large? I think you're suggesting that first, an active spot request for an instance of this weird type was processed, polluting current_spending with a dot-wrapped None, and then the next spot request to be processed actually caused the crash. However, through debug logging lines I saw that both spot requests were for m3.large/us-east-1e and the pricing data for that type did contain all Nulls. Does the commit you just mentioned fix that? I don't understand what you're changing, so I can't tell.
I'm getting this:
Apparently about.current_price is None, and then the second time around, current_spending is a dot-wrapped None, so I get this error.
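Here is a small sketch of how I understand the propagation. The `NullType` class is a toy I wrote for illustration, assuming that a dot-wrapped None absorbs arithmetic; it is not the real class from the library.

```python
class NullType:
    """Toy null object: any addition involving it yields the null again."""

    def __add__(self, other):
        return self

    __radd__ = __add__

    def __bool__(self):
        return False

    def __repr__(self):
        return "Null"


Null = NullType()


def run_total(prices):
    """Accumulate prices the way a running spending total might."""
    current_spending = 0
    for price in prices:
        current_spending = current_spending + price
    return current_spending


# First request has no price, so the total becomes Null and stays Null
# even though the second request has a perfectly valid price.
print(run_total([Null, 0.25]))
print(run_total([0.25, 0.5]))
```

So the request that trips the error is not necessarily the one with the missing price; it may just be the next one processed.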
I've added this in prices to see what's going on:
And got this:
I'm not sure why this ended up as Nulls.