python / cpython

The Python programming language
https://www.python.org
Other
62.77k stars 30.08k forks source link

xmlrpc.client.ServerProxy needs timeout parameter #58342

Open 69774fc2-f4d0-4baa-bd53-ffa381c2a3f1 opened 12 years ago

69774fc2-f4d0-4baa-bd53-ffa381c2a3f1 commented 12 years ago
BPO 14134
Nosy @loewis, @florentx, @berkerpeksag, @demianbrecht, @mprahl
PRs
  • python/cpython#21908
  • python/cpython#21909
  • Files
  • example-of-changes.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = None created_at = labels = ['type-feature', 'library'] title = 'xmlrpc.client.ServerProxy needs timeout parameter' updated_at = user = 'https://bugs.python.org/polymorphm' ``` bugs.python.org fields: ```python activity = actor = 'mprahl' assignee = 'none' closed = False closed_date = None closer = None components = ['Library (Lib)'] creation = creator = 'polymorphm' dependencies = [] files = ['24675'] hgrepos = [] issue_num = 14134 keywords = ['patch'] message_count = 21.0 messages = ['154409', '154580', '160950', '160970', '161015', '232648', '232651', '232671', '232705', '232709', '232718', '232719', '232724', '232726', '232727', '232731', '232780', '232784', '232804', '232805', '233476'] nosy_count = 8.0 nosy_names = ['loewis', 'flox', 'mcjeff', 'polymorphm', 'python-dev', 'berker.peksag', 'demian.brecht', 'mprahl'] pr_nums = ['21908', '21909'] priority = 'normal' resolution = None stage = 'patch review' status = 'open' superseder = None type = 'enhancement' url = 'https://bugs.python.org/issue14134' versions = ['Python 3.5'] ```

    69774fc2-f4d0-4baa-bd53-ffa381c2a3f1 commented 12 years ago

    good day!

    "xmlrpc.client.ServerProxy()" -- not has "timeout"-parameter

    "xmlrpc.client.Transport()" and "xmlrpc.client.SafeTransport()" -- not has "timeout"-parameter too

    but "http.client.HTTPConnection()" and http.client.HTTPSConnection() -- has "timeout"-parameter

    69774fc2-f4d0-4baa-bd53-ffa381c2a3f1 commented 12 years ago

    in this subject -- I think about like this changes (see file "example-of-changes.patch")

    8c8022e7-f081-4d49-be03-926cedb53a7a commented 12 years ago

    I would think it might make more sense just to make the change to the Transport object. Since there's an argument for a transport on ServerProxy already, that seems more straightforward and keeps the network layer isolated.

    Otherwise, it seems slightly ambiguous to me. Consider that maybe I passed in a transport and a timeout, why wasn't my timeout honored? Though, I guess use_datetime already behaves that way.

    69774fc2-f4d0-4baa-bd53-ffa381c2a3f1 commented 12 years ago

    Jeff McNeil (mcjeff)> I would think it might make more sense just to make the change to the Transport object. Since there's an argument for a transport on ServerProxy already, that seems more straightforward and keeps the network layer isolated.

    in theoretical-side -- this layer isolation may be good and clean.

    but in practical-side -- situation is next:

    there are 3 alternative-variants of using timeout parameter in XMLRPC-Client:

    situation 1. programmer (who makes script or program) -- using XMLRPC-Client *WITH* timeout parameter, because timeout parameter should be using in his program. program runs in regular environment.

    situation 2. programmer (who makes script or program) -- using XMLRPC-Client *WITHOUT* timeout parameter, because XMLRPC-connection runs in localhost environment.

    situation 3. programmer (who makes script or program) -- using XMLRPC-Client *WITHOUT* timeout parameter, because he makes mistake.

    "situation 1" -- very often. (or must be very often).

    "situation 2" -- very rare.

    "situation 3" -- leads to possible cases of freezing program/script or resource-leak.

    if we will try to hide timeout parameter (in other layer), then "situation 3" will be more than "situation 1"

    # p.s.: sorry for my bad english

    8c8022e7-f081-4d49-be03-926cedb53a7a commented 12 years ago

    Yeah, that's a good point too. I still personally favor the transport encapsulation and related unit testing, but I think that's a call for someone with a snake icon next to their tracker name.

    Your English is just fine. =)

    bb8bd63d-cf82-41f3-a63e-9703d695cb16 commented 9 years ago

    I'm a -1 to adding the timeout parameter to the ServerProxy implementation for pretty much the same reasons Jeff mentioned, but mainly because of the ambiguity that is introduced between the timeout and transport parameters (who should win in the case that they're both used?). I'm also a -1, although a little less strongly, on adding the timeout parameter to the transport layer.

    Instead, what I would /like/ to see is to have a connectionfactory parameter added to Transport.\_init__ that takes a host parameter (i.e. what's passed into HTTPConnection here: https://hg.python.org/cpython/file/e301ef500178/Lib/xmlrpc/client.py#l1231) which would default to a local function or lambda that does exactly what it's doing now. There are a few benefits to doing this:

    1. Adding a timeout is just slightly less trivial than the current proposal
    2. Encapsulation of the HTTPConnection object isn't broken
    3. By exposing the ability to override the lower level HTTPConnection, connection_factory can result in a client that not only allows for setting changes such as timeouts, but can also result in any connection object that adheres to the expected HTTPConnection interface.

    For me, the 3rd point is the biggest selling feature. Not only can tweaks such as the timeout be made, but if other features are required such as customized logging, exponential back-off and such, they can easily be implemented in a child class of HTTPConnection. You could even write a (possibly) trivial adapter layer to interface with a 3rd party HTTP library if you so chose to. So, instantiating a ServerProxy object with an HTTPConnection with a custom timeout in its transport layer might look like this:

        def updated_timeout(host):
            return HTTPConnection(host, timeout=42)
    
        proxy = ServerProxy('http://example.com/gateway/', transport=Transport(connection_factory=updated_timeout))

    While the use cases that you've used as examples can tend to happen, I firmly believe that those should be solved by enhanced documentation and not by implementation changes.

    69774fc2-f4d0-4baa-bd53-ffa381c2a3f1 commented 9 years ago

    @demian.brecht , your example code-fragment is too big. :-)

    too many lines -- just only for adding "timeout". it is uncomfortably.

    most people will not using that: most likely they just will forget about "timeout" (but in *MOST* situations not using "timeout" -- it is mistake).

    bb8bd63d-cf82-41f3-a63e-9703d695cb16 commented 9 years ago

    + loewis as he's listed as the xmlrpc expert

    If you're worried about the number of lines, turn the function into a lambda:

        proxy = ServerProxy('http://example.com/gateway/', transport=Transport(
            connection_factory=lambda h: HTTPConnection(h, timeout=42)))

    I think that the problem with the way that you're looking at the problem:'just only for adding "timeout"', when what you're fundamentally after is to modify the attribute of an object two levels removed by composition.

    I /do/ agree that this is slightly more complex than simply setting a timeout parameter, but I also think that it's actually quite a bit more flexible and practically useful.

    Borrowing from PEP-20: "There should be one-- and preferably only one --obvious way to do it.". Having a timeout at the top level ServerProxy object introduces ambiguity and therefore doesn't conform. Should the connection_factory concept be used, having a timeout parameter at the Transport level also introduces ambiguity. Setting the timeout through a custom HTTPConnection instantiated through connection_factory is an obvious way to do it (especially if documented) and is marginally more code.

    If you /only/ care about the timeout and really don't want to be bothered with the connection_factory, you can always set the global socket timeout for the given request with:

        socket.setdefaulttimeout([timeout])
    69774fc2-f4d0-4baa-bd53-ffa381c2a3f1 commented 9 years ago

    @demian.brecht , socket.setdefaulttimeout([timeout]) -- it is bad practice, because setting this global varible we may spoil other cases. example "TCP keepalive" [ s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, true) ]

    and global variables is bad practice for other things.

    and again -- compare what shorter (and what more clear):

            proxy = ServerProxy('http://example.com/gateway/', transport=Transport(
                connection_factory=lambda h: HTTPConnection(h, timeout=42)))
    or
            proxy = ServerProxy('http://example.com/gateway/', timeout=42)

    There should be one-- and preferably only one --obvious way to do it.". Having a timeout at the top level ServerProxy object introduces ambiguity and therefore doesn't conform

    if you NOT point timeout in "RPC-client" -- you program will freeze or will maked resource leak (with small probability).

    "RPC-client" and "timeout" -- these two concepts are inseparable!

    you are allowed *NOT_using "timeout" in "RPC-client" -- *ONLY in *localhost* operations!

    69774fc2-f4d0-4baa-bd53-ffa381c2a3f1 commented 9 years ago

    ok, let's go to other side of this problem:

    question: why default transport (xmlrpc.client.Transport()) is not setting value of timeout?``

    answer: because *unknown* which value need to using by default.

    in various cases programmer need various timeout. default value of timeout for default transport -- what may be this number?

    may be value "300" for timeout of default transport (xmlrpc.client.Transport()) may be normal in *most_cases*. but this will brake legacy soft that using socket.setdefaulttimeout(...)

    bb8bd63d-cf82-41f3-a63e-9703d695cb16 commented 9 years ago

    socket.setdefaulttimeout([timeout]) -- it is bad practice

    I'm not really arguing this. It solves the problem, but definitely not in the best of ways. My point in referencing setdefaulttimeout is that if /all/ you care about is the timeout and you're horribly opposed to using an API as I suggested, you still have the option of using setdefaulttimeout. To be clear, it's not something that I'd advocate.

    because setting this global varible we may spoil other cases. example "TCP keepalive" [s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, true) ]

    I'm afraid you've somewhat lost me here. Calling setdefaulttimeout simply sets the value of defaulttimeout in socketmodule.c, which newly created sockets use as their default value until explicitly overridden via settimeout().

    and again -- compare what shorter (and what more clear):

    Sure, your example is clear for this /specific/ use case. To illustrate my point about ambiguity though, it's unclear what the behaviour should be here based on your patch:

        proxy = ServerProxy('http://example.com/gateway/', timeout=42, transport=Transport(2))

    Should the value in the Transport instance be used or the one in the ServerProxy parameter list? It's not obvious and therefore not a sound decision from an API design standpoint.

    Ignoring my suggestion of a connection_factory altogether, limiting the timeout to the Transport class is a little more sane and doesn't risk ambiguity:

        proxy = ServerProxy('http://example.com/gateway/', transport=Transport(timeout=2))

    Should my suggestion be tossed out, I do think that this is the most sane path to go down.

    Now, re-introducing my thoughts on a connection factory (which I still believe is /much/ more flexible and extensible than simply passing in a timeout parameter for the reasons I previously mentioned), should the timeout parameter still be accepted at the Transport level, you run into the same level of ambiguity:

        transport = Transport(timeout=2, connection_factory=lambda h: HTTPConnection(h, timeout=42))

    So now the argument in my mind becomes: Should the connection attributes be assignable through Transport instantiation or should the user have accessibility to create their own instance of a connection class that adheres to the expected API and pass that in? To me, the flexibility and benefits of the latter far outweighs the additional complexity:

        transport = Transport(timeout=2)
        transport = Transport(connection_factory=lambda h: HTTPConnection(h, timeout=42))

    if you NOT point timeout in "RPC-client" -- you program will freeze or will maked resource leak (with small probability).

    Assuming a lack of concurrency, your program will indeed freeze until the system timeout has been reached. I'm not sure about a leak. If you have an example of such a case, it would likely be a good candidate for a new issue.

    answer: because *unknown* which value need to using by default.

    No, the default should be set to socket._GLOBAL_DEFAULT_TIMEOUT (the same default used by HTTPConnection). When unset, the timeout on the socket is unspecified resulting in a value of -1, which then defaults to the system-specific timeout.

    Hopefully I've clarified things a little better. Of course, that doesn't mean that you'll agree, in which case we'll just have to agree to disagree :)

    69774fc2-f4d0-4baa-bd53-ffa381c2a3f1 commented 9 years ago

    > if you NOT point timeout in "RPC-client" -- you program will freeze or will maked resource leak (with small probability).

    Assuming a lack of concurrency, your program will indeed freeze until the system timeout has been reached. I'm not sure about a leak. If you have an example of such a case, it would likely be a good candidate for a new issue.

    I do not know how behavior in Microsoft Windows...

    in GNU/Linux "system timeout has been reached" -- means that system timeout will *never* reached.

    you may easy to test this (to make this test -- we need using: "client-computer" and "network-router-computer"):

    step 1. run next code on "client-computer" (GNU/Linux):

        $ python3
        >>> from xmlrpc.client import ServerProxy
        >>> server = ServerProxy("http://betty.userland.com")
        >>> for _ in range(100): print(server.examples.getStateName(41))

    step 2: to broke network in "network-router-computer".

    step 3: wait some minutes. example 60 minutes.

    step 4: to repear netework in "network-router-computer".

    step 5: we will see, that program on "client-computer" will freeze *forever. system timeout will *never reached. even after some days -- system timeout will not reached. :-)

    it would likely be a good candidate for a new issue.

    yes, may be we need new issue-ticket?

    bb8bd63d-cf82-41f3-a63e-9703d695cb16 commented 9 years ago

    in GNU/Linux "system timeout has been reached" -- means that system timeout will *never* reached.

    That's quite likely because the system limits may be very large. For example, on my OSX box:

    --- ~ » sysctl net.inet.tcp.keepinit
    net.inet.tcp.keepinit: 75000

    According to Apple developer docs, this is in seconds. Meaning for your example to run all 100 iterations, you'd be looking at an inordinate amount of time to finish a loop that timed out at each connection attempt and deferred to system defaults for the timeout value. Not exactly "never", but far from a reasonable time frame. Of course, this can be tuned to a more reasonable limit.

    69774fc2-f4d0-4baa-bd53-ffa381c2a3f1 commented 9 years ago

    I just will write next code-fragment:

        import socket
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP)
        s.connect(('python.org', 80))
        print(
                'is my operation system using (by default) "tcpkeepalive"-algorithm for testing broken-connection? answer:',
                s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
                )
        # answer is 0 (false) -- for all GNU/Linux

    my previous code-example has 100-iteration -- only for we could catch the right-moment when testing (and for nothing else).

    69774fc2-f4d0-4baa-bd53-ffa381c2a3f1 commented 9 years ago

    > in GNU/Linux "system timeout has been reached" -- means that system timeout will *never* reached.

    That's quite likely because the system limits may be very large.

    I tested system-timeout GNU/Linux (on various computers). I waited more then 5 days. system-timeout works on GNU/Linux -- only if was custom-set tcpkeepalive, else (by default): even after 5 days system-timeout was not reached.

    69774fc2-f4d0-4baa-bd53-ffa381c2a3f1 commented 9 years ago

    @demian.brecht , for high probably to catch *infinite_freeze* (at GNU/Linux) -- if we may will run requests of "xmlrpc.client.ServerProxy" -- parallely:

    (when running next code -- need to make some network-disconnections on "network-router-computer")

    #!/usr/bin/env python3
        import threading
        from xmlrpc.client import ServerProxy
    
        ITERATION_COUNT = 100
        THREAD_COUNT = 100
    
        def thread_func(thread_name):
            for i in range(ITERATION_COUNT):
                try:
                    server = ServerProxy("http://betty.userland.com")
                    rpc_result = server.examples.getStateName(41)
                    print('{}/iter_{} {!r}'.format(thread_name, i, rpc_result))
                except Exception as e:
                    print('{}/iter_{} error: {} {}'.format(thread_name, i, type(e), str(e)))
    
        def main():
            print('***** testing begin *****')
    
            thread_list = []
    
            for i in range(THREAD_COUNT):
                thread_name = 'thread_{}'.format(i)
                thread_list.append(threading.Thread(target=thread_func, args=(thread_name,)))
    
            for thr in thread_list:
                thr.start()
    
            for thr in thread_list:
                thr.join()
    
            print('***** testing end *****')
    
        if __name__ == '__main__':
            main()
    bb8bd63d-cf82-41f3-a63e-9703d695cb16 commented 9 years ago

    I think we've started to venture into system-level territory that the standard library itself shouldn't have to account for. If TCP on systems are configured by default to allow for infinite timeouts, then it should likely be an issue for those distros. I don't think accounting for such things should be the responsibility of the standard library. The socket module behaves appropriately in handing off timeouts to be managed by the system level if left undefined. I think that any further discussion around timeouts should be taken up on distro mailing lists, or even perhaps python-list as it's not directly relevant to this issue.

    As for this specific issue, unless someone beats me to it, I'll get a patch together introducing the session_factory as soon as I can.

    bb8bd63d-cf82-41f3-a63e-9703d695cb16 commented 9 years ago

    On another note, running a simple test with against a non-routable IP yields that OSX's default timeout is 75 seconds and not 7500 seconds as the developer docs lead me to believe.

    bb8bd63d-cf82-41f3-a63e-9703d695cb16 commented 9 years ago

    I've attached a test-less patch with my suggested approach. If there's no opposition to this change, I'll put some work into getting tests done for it as well.

    69774fc2-f4d0-4baa-bd53-ffa381c2a3f1 commented 9 years ago

    good patch (bpo-14134.patch) ! thanks!

    bb8bd63d-cf82-41f3-a63e-9703d695cb16 commented 9 years ago

    I withdraw my patch as (I just discovered), it is already possible to effect changes to the underlying connection. What /should/ be done is:

    transport = Transport()
    con = transport.make_connection([host])
    con.timeout = 2
    
    proxy = ServerProxy([url], transport=transport)

    As the connection will not be created until an RPC method is called, the timeout value assigned to the connection will be observed on socket creation. Making such modifications to the underlying transport should be documented.

    That said, what is a little less than optimal is that the transport attribute of a ServerProxy object is inaccessible (without hackery) after instantiation, so socket level changes are impossible if not using a custom transport. What would be nice would be to add an accessor for ServerProxy.__transport. Likewise, Transport._connection is marked as part of the private interface. Adding public accessors would allow for something like this:

    proxy = ServerProxy([url])
    # pre-connect timeout
    proxy.transport.connection.timeout = 2
    proxy.system.listMethods()

    or

    proxy = ServerProxy([url])
    proxy.system.listMethods()
    # socket is available as connection has been established
    proxy.transport.connection.sock.settimeout(2)