klahnakoski / SpotManager

Find cheapest spot instance prices, bid, use, and teardown when done
Mozilla Public License 2.0

about.current_price is sometimes None #16

Open lexelby opened 9 years ago

lexelby commented 9 years ago

I'm getting this:

WARNING: Problem with spot manager
        File "spot/spot_manager.py", line 653, in main
        File "spot/spot_manager.py", line 709, in <module>
caused by
        ERROR: 'NoneType' object does not support item assignment
        File "/home/lex/SpotManager/pyLibrary/dot/nones.py", line 61, in __iadd__
        File "spot/spot_manager.py", line 94, in update_spot_requests
        File "spot/spot_manager.py", line 648, in main

Apparently about.current_price is None, and then the second time around, current_spending is a dot-wrapped None, so I get this error.

I've added this in prices to see what's going on:

     print [thing for thing in self.prices if thing.current_price == None]

And got this:

[Dict({u'count': 24, u'all_price': [Null, Null, Null, Null, Null, Null, Null, Null, Null, Null, Null, Null, Null, Null, Null, Null, Null, Null, Null, Null, Null, Null, Null, Null], u'availability_zone': u'us-east-1e', u'price_80': None, u'current_price': None, u'type': Dict({u'instance_type': u'm3.large', 'discount': 0, u'utility': 6.5}), u'max_price': Null})]

I'm not sure why this ended up as Nulls.

klahnakoski commented 9 years ago

There are two ways this happens.

[1] https://github.com/klahnakoski/SpotManager/blob/dev/spot/spot_manager.py#L80

lexelby commented 9 years ago

I didn't see the "who does it belong to" message, and doesn't Log.error throw an exception and bomb out?

I tried blowing away my prices file in case it was corrupted, but it rebuilt it and still had trouble. I loaded the prices file by hand and there are, indeed, valid prices for m3.large in us-east-1e. I guess maybe the processing code has a bug?

klahnakoski commented 9 years ago

A bug is possible. Could you email me your prices file? I will try to replicate the problem.

klahnakoski commented 9 years ago

doesn't Log.error throw an exception and bomb out?

Yes, but I do not know how else to handle it. There is a small delay between making the spot request and naming that request. I have had a few cases where SpotManager failed during that window (usually my own doing, while debugging) and did not get the chance to name the spot request. A spot request without a name is ambiguous: does it belong to another SpotManager instance (as happens in my case)?

klahnakoski commented 9 years ago

You can have all your warnings and errors emailed to you by adding this to the debug.logs array in your settings.json file:

{ "log_type": "email", "from_address": "me@example.com", "to_address": "me@example.com", "subject": "Problem with SpotManager" },

The full set of parameters can be seen in the Emailer.__init__() [1]. Full example at [2]

[1] https://github.com/klahnakoski/SpotManager/blob/dev/pyLibrary/env/emailer.py#L26
[2] https://github.com/klahnakoski/SpotManager/blob/e07a2e8a0f72664dc5f524172b4f897052b56fa8/examples/config/es_settings.json#L119

klahnakoski commented 9 years ago

Please see if these changes fix the problem. Apply them to your branch.

https://github.com/klahnakoski/SpotManager/commit/845cd40028af359168a4af2ec47b3ea2cd803541

If it works, then it is a better solution than failing outright:

https://github.com/klahnakoski/SpotManager/commit/282abec26b414c6f83e16980731c1b8729ebaff8#diff-1738a9cd16a8df8d87ddf66c3acaf8b4R80

klahnakoski commented 9 years ago

NM, I will continue working on that branch to see where other coalesce calls are required

klahnakoski commented 9 years ago

Here are the specific changes to the SpotManager. The cause is the lack of definition for a particular instance type, which was then infecting the summary calculation.

https://github.com/klahnakoski/SpotManager/commit/1efb0b72ecd02c19e34e44820b7077de09d8fc7e#diff-1738a9cd16a8df8d87ddf66c3acaf8b4L80

lexelby commented 9 years ago

I ended up going a similar route by throwing a coalesce call in there, but I thought I was just patching over some other bug. In what way was m3.large/us-east-1 not defined in the pricing data I emailed you? I thought I combed through the data in ipython and saw entries for that type+zone...

klahnakoski commented 9 years ago

I believe you have an instance_type running on AWS that does not have an entry in the utility array. This is not necessarily m3.large, but probably some other instance type. If you have just one unknown instance type (and AZ combination), then this bug will show itself. In light of this, your previous fix of blacklisting utility entries is superior to my suggestion of just commenting out the entry: at least the pricing was still calculated for instance_types you are no longer interested in.
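
One way to test this theory is to compare the instance types currently running against the utility array. The following is only a rough sketch, not SpotManager code: `find_undefined_types` is a hypothetical helper, the entry shape is inferred from the `type` dict printed earlier in this thread, and the `c3.xlarge` and `r3.large` values are invented for illustration.

```python
def find_undefined_types(running_instance_types, utility):
    """Return running instance types that have no entry in the utility array.

    Hypothetical helper for illustration; it is not part of SpotManager.
    """
    known = {entry["instance_type"] for entry in utility}
    return sorted(set(running_instance_types) - known)


# Illustrative data only: the entry shape follows the `type` dict printed
# earlier in the thread; the c3.xlarge values are made up.
utility = [
    {"instance_type": "m3.large", "utility": 6.5, "discount": 0},
    {"instance_type": "c3.xlarge", "utility": 13, "discount": 0},
]

# Instance types reported as running (hypothetical list).
running = ["m3.large", "m3.large", "r3.large"]

missing = find_undefined_types(running, utility)
print(missing)
```

If the list is non-empty, at least one running instance has no utility definition, which is the condition described above.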

klahnakoski commented 9 years ago

In an effort to be clearer: There are actually two independent problems you raised; neither affected the other. The first causes the error, and is fixed with coalescing calls. The second concerns the nulls found in the pricing grid, and is solved here: https://github.com/klahnakoski/SpotManager/commit/fcbbc25a191afa84985d584896b6a8cfb2b6bc0e
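
For readers unfamiliar with coalescing, here is a minimal sketch of the idea, assuming a plain-Python stand-in. The `coalesce` below is not the pyLibrary implementation (which also has to deal with its own Null objects), and the request data is invented. It only shows how a missing price can be replaced with a default before it reaches a running total.

```python
def coalesce(*values):
    """Return the first argument that is not None, or None if all are None.

    Plain-Python stand-in for illustration; not the pyLibrary function.
    """
    for value in values:
        if value is not None:
            return value
    return None


# Hypothetical spot requests; one has no known price.
requests = [
    {"type": "m3.large", "current_price": 0.02},
    {"type": "unknown.type", "current_price": None},
    {"type": "m3.large", "current_price": 0.03},
]

current_spending = 0.0
for request in requests:
    # Without coalesce, adding None here would raise a TypeError in plain
    # Python; with it, the missing price simply counts as zero.
    current_spending += coalesce(request["current_price"], 0)

print(current_spending)
```

The trade-off is that a missing price is silently treated as zero spending, so it is worth logging such cases separately.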

lexelby commented 9 years ago

Huh. Would you expect that the mystery instance would have a name tag that matches the prefix specified in settings? I'm sure I don't have any weird instance types with names like that, since SpotManager built my whole fleet.

Also, why is it that this mystery instance would have caused nulls in the pricing data for m3.large? I think you're suggesting that first, an active spot request for an instance of this weird type was processed, polluting current_spending with a dot-wrapped None, and then the next spot request to be processed actually caused the crash. However, through debug logging lines I saw that both spot requests were for m3.large/us-east-1e and the pricing data for that type did contain all Nulls. Does the commit you just mentioned fix that? I don't understand what you're changing, so I can't tell.