zerovm / zerocloud

Swift middleware for Zerocloud
Apache License 2.0
53 stars 14 forks source link

Fix RestController 500 errors #151

Closed larsbutler closed 10 years ago

larsbutler commented 10 years ago

This problem exists in both the RestController and the ApiController.

*Controller.load_config was returning too early if the cluster_config was retrieved from memcached. Later in the controller call chain, if cluster_config is set, config_path is expected to be set as well, but the load_config call was returning prematurely before config_path could be set.

This was resulting in intermittent generic 500 errors when exercising the /api/ URL REST features. (Basically, a reqest would work the first time, then throw errors until the cache expires--about 60 seconds later--and then start working again.)

This patch fixes that and adds tests to make sure the proper instance attributes are being set.

larsbutler commented 10 years ago

Don't merge yet; I still need to add another test for ApiController.

larsbutler commented 10 years ago

This breaks some of the caching. I need to fix that as well.

larsbutler commented 10 years ago

Okay, the caching should be fixed now.

pkit commented 10 years ago

Thanks for finding the bug. Looks almost ok, but it will fall apart on some other errors. For example if the size of config object is zero, it will fail in the NOT_SATISFIABLE case. Probably we can do try:... finally: and set config_path there.

larsbutler commented 10 years ago

Hmm, it seems test coverage is missing for the latter half of ApiController.load_config: https://github.com/larsbutler/zerocloud/blob/9a4b1923c6b3e3a1f7a7756bbab6cb58f0bffdb5/zerocloud/proxyquery.py#L1835-1855

larsbutler commented 10 years ago

@pkit I just noticed something else: https://github.com/larsbutler/zerocloud/blob/9a4b1923c6b3e3a1f7a7756bbab6cb58f0bffdb5/zerocloud/proxyquery.py#L1828

Why is zerovm_maxconfig being multiplied by 2?

pkit commented 10 years ago

Because the length of the file is <= maxconfig, but not the bytes in tar, tar have headers of unknown length (depends on path length and some other things), so, "when something can be bigger - always multiply by 2, if smaller - always subtract 1", is usually a good strategy.

larsbutler commented 10 years ago

I'm confused. Why is zerovm_maxconfig being applied to something which is not a JSON job description? The tarball != the job description.

pkit commented 10 years ago

system.map is a job description and it is a first file in zapp/tar image :)

larsbutler commented 10 years ago

How can we know for sure that it's the first file?

larsbutler commented 10 years ago

(Also, please try to comment stuff like this in the future. The intention here is not at all obvious.)

pkit commented 10 years ago

This is a requirement. This is how zpm builds zapps, this is how python.tar is created and so on. We cannot allow to intermediately store any data when processing images, everything is streamed and materializes only on the final object server.

larsbutler commented 10 years ago

Okay, that's fine, but this requirement isn't documented anywhere, AFAIK.

Also, zapps are not first class citizens in ZeroCloud. We have system images, we have ZeroVM images, and we have zapps, and all of them are a little bit different.

pkit commented 10 years ago

They are a little different, but essentially the same. I.e. their meaning is different, but their format is similar (and in most cases 100% same) and the idea is to happily "chew" on any of these. If you look under the hood of Linux executables (ELF format) you will find that they are also not 100% similar and sometimes differ a lot, but still are consumed easily.

larsbutler commented 10 years ago

In general that's fine, but I don't like the fact that we have to pack runtime information (file paths, etc.) inside the executable in the boot/system.map. I would like to fix this somehow.

larsbutler commented 10 years ago

@pkit Anyway, what needs to be fixed here for this to be merge-worthy?

larsbutler commented 10 years ago

@pkit

I think instead of setting cluster_config before each return We can just wrap everything in try: finally: and assign cluster_config in finally block.

cluster_config or config_path?

pkit commented 10 years ago

Maybe both, but probably setting config_path only will be ok. Maybe there is also a better way, config path is used to fetch url for x-zerovm-source header, maybe we can just create a new method for that, and then config path does not need to be set...

larsbutler commented 10 years ago

@pkit That would be different logic from what is implemented now, I think.

Let's step back for a second: when we handle_request, what sort of things can happen? What are the error conditions? Under what conditions are config_path and cluster_config instance vars populated?

I think a lot of confusion is coming from this pattern:

error = self.load_config(req, swift_path)
if error:                                
    return error                         

We using the return val to capture errors, instead of just raising an exception.

pkit commented 10 years ago

Yes, unfortunately instance of a Response class is not an exception, so it cannot be raised. Only if the instance was created using the HTTP* class methods it will be inherited from both Response and Exception. Unfortunately not all Response instances with status_int >= 300 are Exceptions... Therefore we cannot use raise error_response semantics here, I've tried...

larsbutler commented 10 years ago

load_config sets cluster_config and config_path instance vars. For how many requests are these used? Just once, or are RestController and ApiController reused somehow? (Or for that matter, ClusterController, the superclass.)

pkit commented 10 years ago

I think cluster_config is used by ClusterController (inside self.post_job()) and config_path is used only before calling self.post_job()

larsbutler commented 10 years ago

Are the controller instances used to serve a single request, or is there some persistence between requests?

pkit commented 10 years ago

Each controller is used only once per request. Middleware and Parser objects are reused, but not Controller.

larsbutler commented 10 years ago

https://github.com/zerovm/zerocloud/pull/155 is a better fix. I'm abandoning this.