skruger / Surrogate

Proxy server written in erlang. Supports reverse proxy load balancing and forward proxy with http (including CONNECT), socks4, socks5, and transparent proxy modes.

testing filter_host and getting transaction abort. #2

Closed ljackson closed 13 years ago

ljackson commented 13 years ago

See the last commit on my fork, which adds a test script and a create_tables module. I am only working with forward proxy, as that is all I need right now.

skruger commented 13 years ago

I was just looking at your commit. I just need to rewrite the filter hosts import. I can't claim that I have done a particularly good job with my file reading, let alone my function naming. To me filter_host is a reference design of how to implement a module with my filter_check behaviour. What are your thoughts on how you would like it to operate? Also, take note that filtering is optional.

Shaun

skruger commented 13 years ago

The good news is that I'm getting the same error. The bad news is that I don't know why. I'll keep at it.

ljackson commented 13 years ago

The filter hosts import seems fine to me; it appears to do what I believe you intended. I think the problem is the nested transaction, or the transaction including the wipe of the mnesia table.

As far as implementation goes, I will eventually need pluggable request and response filtering, where the eventual code will need to call multiple filters at the same time and either get a cumulative decision or take whatever it has when a timeout x expires, etc.
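One way to picture the "call several filters at once, take what you have at the timeout" idea is the minimal sketch below. It is not part of Surrogate: the module name `filter_fanout_sketch`, the `allow | deny` return values, and the "any deny wins" rule are all assumptions made for illustration.

```erlang
-module(filter_fanout_sketch).
-export([decide/3]).

%% Hypothetical: each filter is a fun((Request) -> allow | deny).
%% Runs all filters concurrently. Returns deny as soon as any filter
%% says deny; returns allow once all have answered or the deadline passes.
decide(Filters, Request, TimeoutMs) ->
    Parent = self(),
    Ref = make_ref(),
    Pids = [spawn(fun() -> Parent ! {Ref, self(), F(Request)} end)
            || F <- Filters],
    Deadline = erlang:monotonic_time(millisecond) + TimeoutMs,
    Result = collect(Ref, length(Pids), Deadline),
    %% Stop stragglers. A reply sent just before the kill may remain in
    %% the mailbox; it carries a unique Ref, so nothing else matches it.
    [exit(P, kill) || P <- Pids],
    Result.

collect(_Ref, 0, _Deadline) ->
    allow;
collect(Ref, N, Deadline) ->
    Wait = max(0, Deadline - erlang:monotonic_time(millisecond)),
    receive
        {Ref, _Pid, deny} -> deny;
        %% Any other answer is treated as "no objection" in this sketch.
        {Ref, _Pid, _Other} -> collect(Ref, N - 1, Deadline)
    after Wait ->
        %% Timeout: go with what has been collected so far.
        allow
    end.
```

Whether a timeout should default to allow or deny is a policy choice; the sketch picks allow only to keep the example short.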

Do you see response filtering possible in the current architecture?

skruger commented 13 years ago

mnesia:clear_table() cannot be called within a transaction. See my code for my changes. I merged your fork, so we should be on the same version.

That was a good catch. I never noticed that it wasn't reloading the files properly. I assumed it was working because the mnesia table entries were still there each time I tested it.
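For anyone hitting the same abort, the shape of the fix can be sketched roughly as follows. This is not the actual Surrogate code: the module name `filter_reload_sketch` and the `reload/2` function are hypothetical, and the sketch assumes the table already exists and mnesia is running.

```erlang
-module(filter_reload_sketch).
-export([reload/2]).

%% Hypothetical helper: wipe Tab, then load Records into it.
%% mnesia:clear_table/1 runs its own transaction internally, so it is
%% called here *before* (not inside) the transaction that writes rows.
reload(Tab, Records) ->
    {atomic, ok} = mnesia:clear_table(Tab),
    mnesia:transaction(
      fun() ->
              lists:foreach(fun(R) -> mnesia:write(Tab, R, write) end,
                            Records)
      end).
```

Calling `mnesia:clear_table(Tab)` from inside the `mnesia:transaction/1` fun instead is what produces the aborted transaction reported in this issue.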

skruger commented 13 years ago

I do think that response filtering is a possibility. I have been pondering a system of hooks that will allow processing and modification of the stream at various times. Most of what I have been thinking has been on the request side.

If you want to think about response filtering then I think we need to look at proxy_pass.erl (gen_fsm).

proxy_pass assumes you already have a client socket and a set of headers received from said client. After creating a proxy_pass instance by calling proxy_pass:start(#proxypass{...}) it waits in proxy_start state. I send it an event which includes the client socket, or the client socket and a host to connect to (reverse_proxy mode). The connection is opened to the real server and we enter the client_send state. Everything that was received with the headers is sent and anything else up until content-length is read from the client and then sent. At the end of this we go to server_start_recv state which gets the headers and possibly some of the response body. This is the point where we could hand off response processing to a module that does filtering or add some hooks. The one thing that it doesn't do yet is store the whole response before sending it on to the client. This is something we would need to do in order to filter on response body.

Perhaps I could provide an interface with a module behaviour that will allow you to implement a function that you pass all of the response text through one block at a time. Anything that is returned gets passed to the client. The module then has the option of not delivering anything until the end. At the end another call can be made to tell the module to flush anything it didn't already send. This way the module can change the response after receiving all of the data.
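The block-at-a-time interface described above could look something like this minimal sketch. Nothing here exists in Surrogate: the module name `response_filter_sketch`, the callback names `init/1`, `handle_block/2`, and `flush/1`, and the `run/3` driver are all assumptions. The example implementation in the same module withholds everything until the end and then rewrites the whole body.

```erlang
-module(response_filter_sketch).
-export([run/3, init/1, handle_block/2, flush/1]).

%% Hypothetical behaviour callbacks (names are assumptions).
-callback init(Args :: term()) -> State :: term().
-callback handle_block(Block :: binary(), State :: term()) ->
    {Out :: iodata(), NewState :: term()}.
-callback flush(State :: term()) -> Out :: iodata().

%% Driver: pass each block through Mod. Whatever Mod returns is what
%% would be relayed to the client; flush/1 is called once at the end.
run(Mod, Blocks, Args) ->
    {Sent, State} =
        lists:foldl(
          fun(Block, {Acc, S0}) ->
                  {Out, S1} = Mod:handle_block(Block, S0),
                  {[Acc, Out], S1}
          end, {[], Mod:init(Args)}, Blocks),
    iolist_to_binary([Sent, Mod:flush(State)]).

%% Example implementation: deliver nothing until the end, then replace
%% every occurrence of From with To in the complete body.
init({From, To}) ->
    {From, To, []}.

handle_block(Block, {From, To, Buf}) ->
    {[], {From, To, [Buf, Block]}}.

flush({From, To, Buf}) ->
    binary:replace(iolist_to_binary(Buf), From, To, [global]).
```

Buffering until `flush/1` lets the filter catch a match that is split across two blocks, at the cost of holding the whole response in memory. A pass-through filter would return each block from `handle_block/2` and an empty list from `flush/1`.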

Does this sound at all interesting? I know some of it will be easier if I make a few more changes first, but I want to see what you think of it as a direction to go in.

ljackson commented 13 years ago

Yes this sounds like what it will need.

A few notes to ponder....

The proxy cannot assume that content-length is valid at any time; however, it could perhaps be used as a hint.

If we can support chunked content for the response filtering, we would be set, in that each chunk would be added to the whole request and sent to the response filter system.

If we can go ahead and support HTTP/1.1, we will be way ahead of the other systems out there in being able to parse and include HTTP streaming. This doesn't feel easy to do with your current framework, but it would be amazing.

The concept of having the response subsystem module take the response and be able to modify it should also extend to the request subsystem, which should be able to generate and/or modify the response to the client.

Thanks for your consideration.

skruger commented 13 years ago

I found that content-length is critical to proper operation. If I do not respect it I leave hanging connections and things grind to a halt. I avoided chunked encoding by forwarding requests with http/1.0 as my requested protocol mainly because it was easier. Lame reason, I know. It got me to an initial working version though.

I'm fairly confident that I can create a special set of states for handling chunked encoding when detected. I could also do as you suggest and transfer control over to a response module that can read the response and relay the data to the client.
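To make the chunked-encoding work concrete, here is a minimal sketch of decoding a chunked body that is already in memory. It is not Surrogate code: the module name `chunked_sketch` and the `{ok, Body, Rest} | more` return shape are assumptions, and it does not parse trailers or tolerate whitespace around the chunk size.

```erlang
-module(chunked_sketch).
-export([decode/1]).

%% Decode an HTTP/1.1 chunked body held in a binary.
%% Returns {ok, Body, Rest} once the terminating zero-size chunk is seen
%% (Rest holds any trailers plus the final CRLF, left unparsed), or the
%% atom 'more' when the input ends in the middle of a chunk.
decode(Bin) ->
    decode(Bin, []).

decode(Bin, Acc) ->
    case binary:split(Bin, <<"\r\n">>) of
        [SizeLine, Rest] ->
            %% Ignore chunk extensions after ';'. The size is hexadecimal.
            [Hex | _] = binary:split(SizeLine, <<";">>),
            Size = binary_to_integer(Hex, 16),
            case {Size, Rest} of
                {0, _} ->
                    {ok, iolist_to_binary(lists:reverse(Acc)), Rest};
                {_, <<Chunk:Size/binary, "\r\n", Tail/binary>>} ->
                    decode(Tail, [Chunk | Acc]);
                _ ->
                    more
            end;
        [_] ->
            %% No complete size line yet.
            more
    end.
```

In a state machine the `more` result would correspond to staying in the chunk-reading state and reading again from the server socket; a production version would keep the partial state rather than re-parsing from the start.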

Before doing any of this though, I feel like I need to make a module to abstract the difference between ssl and gen_tcp sockets. Right now I'm doing some interesting things in proxy_pass to deal with the fact that I have clients using both socket types. It will be problematic long term. I think I'm going to try and address that tomorrow and perhaps that will help simplify things with implementing chunked encoding and other filters.
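A socket abstraction of the kind mentioned here might look like the minimal sketch below. The module name `proxy_socket_sketch` and the tagged-tuple representation are assumptions, not the module that ended up in Surrogate; the underlying `gen_tcp`, `inet`, and `ssl` calls are the standard OTP ones.

```erlang
-module(proxy_socket_sketch).
-export([send/2, recv/3, setopts/2, close/1]).
-export_type([socket/0]).

%% Hypothetical representation: tag each socket with its transport so
%% callers such as proxy_pass never need to know which kind they hold.
-type socket() :: {gen_tcp, inet:socket()} | {ssl, ssl:sslsocket()}.

-spec send(socket(), iodata()) -> ok | {error, term()}.
send({gen_tcp, S}, Data) -> gen_tcp:send(S, Data);
send({ssl, S}, Data)     -> ssl:send(S, Data).

-spec recv(socket(), non_neg_integer(), timeout()) ->
    {ok, term()} | {error, term()}.
recv({gen_tcp, S}, Len, Timeout) -> gen_tcp:recv(S, Len, Timeout);
recv({ssl, S}, Len, Timeout)     -> ssl:recv(S, Len, Timeout).

%% Plain TCP socket options are set through inet, not gen_tcp.
-spec setopts(socket(), list()) -> ok | {error, term()}.
setopts({gen_tcp, S}, Opts) -> inet:setopts(S, Opts);
setopts({ssl, S}, Opts)     -> ssl:setopts(S, Opts).

-spec close(socket()) -> ok | {error, term()}.
close({gen_tcp, S}) -> gen_tcp:close(S);
close({ssl, S})     -> ssl:close(S).
```

With this in place, a caller would wrap the socket once when it is accepted or connected, for example `{gen_tcp, Sock}`, and use `proxy_socket_sketch:send/2` everywhere after that.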