python / cpython

Use getaddrinfo() in urllib2.py for IPv6 support #44672

Open 4b3f5a83-4ce9-4a50-bae6-abef890b8f44 opened 17 years ago

4b3f5a83-4ce9-4a50-bae6-abef890b8f44 commented 17 years ago
BPO 1675455
Nosy @facundobatista, @orsenthil, @ned-deily
Files
  • urllib2-getaddrinfo.patch: Patch for Lib/urllib2.py replacing gethostbyname() calls with getaddrinfo() calls
  • urllib2-getaddrinfo.patch
  • test_urllib2-getaddrinfo.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:
    ```python
    assignee = 'https://github.com/orsenthil'
    closed_at = None
    created_at =
    labels = ['type-feature', 'library']
    title = 'Use getaddrinfo() in urllib2.py for IPv6 support'
    updated_at =
    user = 'https://bugs.python.org/dcantrell-rh'
    ```

    bugs.python.org fields:
    ```python
    activity =
    actor = 'Ramchandra Apte'
    assignee = 'orsenthil'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation =
    creator = 'dcantrell-rh'
    dependencies = []
    files = ['7833', '8464', '8465']
    hgrepos = []
    issue_num = 1675455
    keywords = ['patch']
    message_count = 12.0
    messages = ['52082', '52083', '56125', '69209', '78716', '78717', '78750', '78754', '78759', '84810', '116619', '165931']
    nosy_count = 7.0
    nosy_names = ['facundobatista', 'jjlee', 'orsenthil', 'dcantrell-rh', 'dmorr', 'ned.deily', 'Ramchandra Apte']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue1675455'
    versions = ['Python 3.2']
    ```

    4b3f5a83-4ce9-4a50-bae6-abef890b8f44 commented 17 years ago

    A number of base Python modules use gethostbyname() when they should be using getaddrinfo(). The big limitation hit when using gethostbyname() is the lack of IPv6 support.

    This first patch is for urllib2.py. It replaces all uses of gethostbyname() with getaddrinfo(). getaddrinfo() returns a list of 5-tuples, so each replaced gethostbyname() call needs a little extra code to unpack the result. It should still be pretty simple to read.
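
For illustration only, here is a minimal sketch of the kind of wrapping described above. It is not the code from the attached patch; the helper name `resolve_first_address` is hypothetical. It shows how one entry of the `getaddrinfo()` result can be unpacked to get a single address string, which is what `gethostbyname()` callers expect.

```python
import socket


def resolve_first_address(host):
    """Hypothetical IP-version-neutral stand-in for socket.gethostbyname()."""
    # getaddrinfo() returns a list of 5-tuples:
    # (family, type, proto, canonname, sockaddr)
    infos = socket.getaddrinfo(host, None, socket.AF_UNSPEC, socket.SOCK_STREAM)
    family, socktype, proto, canonname, sockaddr = infos[0]
    # sockaddr is (address, port) for IPv4 and
    # (address, port, flowinfo, scope_id) for IPv6; the address comes first in both.
    return sockaddr[0]
```

Calling `resolve_first_address("127.0.0.1")` returns the same string back, since numeric addresses need no lookup. For a real hostname, the order of the results depends on the platform resolver.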

    I'd like to see this patch added to the next stable release of Python, if at all possible. I am working up patches for the other modules I see in the Lib/ subdirectory that could use getaddrinfo() instead of gethostbyname().

    f4b234af-798f-43ea-a7ef-0a8972e5a2cb commented 17 years ago
    orsenthil commented 17 years ago

    Hi, the attached patch required a complete rewrite. I am attaching the modified patch, which simply replaces socket.gethostbyname with a function gethost_addrinfo that uses getaddrinfo internally and takes care of both IPv4 and IPv6 address translation.

    jjlee, skip: let me know your comments on this.

    One thing we have to keep in mind is testing with IPv6 addresses. For example, /etc/hosts on my system contains:

    10.98.1.6 goofy.goofy.com
    #fe80::219:5bff:fefd:6270 localhost
    127.0.0.1 localhost

    test_urllib2 will PASS with the above. But if I uncomment the IPv6 address, opening the local file fails. I am not sure how local file access is done with IPv6, or whether urllib2 (the local file opening function) itself needs to be modified. I shall look into that in the next version.

    facundobatista commented 16 years ago

    What I don't understand here is... if gethostbyname() lacks IPv6 support, why not add the functionality to that same function instead of creating a new one?

    Right now gethostbyname() is implemented in C. What would be the drawback of making it a Python function?

    02595635-67bd-49b1-bb7e-0decc940a690 commented 15 years ago

    Senthil,

    I don't think your gethost_addrinfo() function will work. On a v6-enabled machine, it will only return v6 or v4 names. Shouldn't it return both (since a machine could have both v4 and v6 addresses)? For example, on my machine, I have the following addresses for "localhost": ::1, fe80::1%lo0, 127.0.0.1.

    Also, why is the AI_CANONNAME flag set? The canonname field isn't used. And you only appear to take the last IP address returned (sa[0]). Shouldn't you return all the addresses?
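
As a rough sketch of the alternative raised here (returning every address rather than a single one), a helper could look like the following. The name `resolve_all_addresses` is hypothetical and this is not the code from either patch. It passes no AI_CANONNAME flag because the canonical name is not used.

```python
import socket


def resolve_all_addresses(host):
    """Hypothetical helper: every address getaddrinfo() reports for host."""
    addresses = []
    infos = socket.getaddrinfo(host, None, socket.AF_UNSPEC, socket.SOCK_STREAM)
    for family, socktype, proto, canonname, sockaddr in infos:
        address = sockaddr[0]
        # Keep the resolver's ordering, but drop duplicates.
        if address not in addresses:
            addresses.append(address)
    return addresses
```

On a dual-stack machine, `resolve_all_addresses("localhost")` would be expected to include both IPv4 and IPv6 entries, but the exact contents depend on the local resolver configuration.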

    02595635-67bd-49b1-bb7e-0decc940a690 commented 15 years ago

    Question: Why does FTPHandler.ftp_open() try to resolve the hostname()? The hostname will be passed into connect_ftp(), then into urllib.ftpwrapper(), and eventually into ftplib.FTP.connect(), which is IPv6-aware.

    orsenthil commented 15 years ago

    Derek,

    This patch was written along the lines that, when an IPv6 address is present, it returns the first address, which I assumed to be the active address and which would make urllib2 work.

    I am not sure whether returning all the addresses would help. How would we define which address to use?

    As for the AI_CANONNAME flag, I don't remember it accurately now, but I had encountered issues when testing on IPv4 systems without it.

    I now have a different opinion on this issue.

    First, following Facundo's comment, this functionality could go into gethostbyname() itself and be implemented in C.

    Second, the wrapper function and a suitable way of using it need to be defined.

    I am sorry, I fail to understand the question of why ftp_open does hostname resolution. Do you mean that without it, if we pass the hostname to ftplib.FTP.connect(), it would work for an IPv6 address?

    02595635-67bd-49b1-bb7e-0decc940a690 commented 15 years ago

    My understanding is that the FileHandler checks if the file:// URL contains the hostname or localhost IP of the local machine (isn't that what FileHandler.names is for?). So, shouldn't the following URLs all open the same file:

    file:///foo.txt file://localhost/foo.txt file://127.0.0.1/foo.txt file://[::1]/foo.txt

    If that is the case, then doesn't FileHandler.names need to have all of those values in it?
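
To make the question concrete, here is a small sketch of a locality check that accepts all four URL forms above. It is not urllib2's actual FileHandler logic. It uses Python 3's `urllib.parse`, and the names `local_names` and `is_local_file_url` are hypothetical.

```python
import socket
from urllib.parse import urlsplit


def local_names():
    """Hypothetical: names and addresses that refer to this machine."""
    # Assumption: the loopback names are always treated as local.
    names = {"localhost", "127.0.0.1", "::1"}
    for name in ("localhost", socket.gethostname()):
        try:
            infos = socket.getaddrinfo(name, None)
        except OSError:
            continue  # name does not resolve; skip it
        names.update(info[4][0] for info in infos)
    return names


def is_local_file_url(url, names=None):
    """Return True if a file:// URL has no host or names this machine."""
    host = urlsplit(url).hostname  # brackets around IPv6 literals are stripped
    if host is None:
        return True
    if names is None:
        names = local_names()
    return host in names
```

With this sketch, `file:///foo.txt`, `file://localhost/foo.txt`, `file://127.0.0.1/foo.txt` and `file://[::1]/foo.txt` are all treated as local, because the set holds both the IPv4 and the IPv6 loopback addresses.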

    I am a little confused by this though. It looks like FileHandler.file_open() checks if there is a hostname in the URL, and if so, uses FTPHandler instead. So why does FileHandler.open_local_file check the hostname value?

    For your other points, gethostbyname() in libc can only handle IPv4 addresses. The IETF defined the getaddrinfo() interface as an IP version neutral replacement. I would recommend using getaddrinfo().

    Yes, FTPHandler creates a urllib.ftpwrapper object. That object calls into ftplib, which is already IPv6-capable. So, I don't think we need to do hostname resolution in FTPHandler.

    orsenthil commented 15 years ago

    > I am a little confused by this though. It looks like FileHandler.file_open() checks if there is a hostname in the URL, and if so, uses FTPHandler instead. So why does FileHandler.open_local_file check the hostname value?

    You are right. I had observed this too, but did not dispute it. Let me look into the history to see why it is so. Perhaps it needs to change.

    > For your other points, gethostbyname() in libc can only handle IPv4 addresses. The IETF defined the getaddrinfo() interface as an IP version neutral replacement. I would recommend using getaddrinfo().
    >
    > Yes, FTPHandler creates a urllib.ftpwrapper object. That object calls into ftplib, which is already IPv6-capable. So, I don't think we need to do hostname resolution in FTPHandler.

    Thanks for the info. I shall look into both in the next revision of the patch: 1) using getaddrinfo() for an IP version neutral call; 2) passing the hostname directly to ftplib (I am not sure of the consequences and need to investigate).

    ned-deily commented 15 years ago

    Note also bpo-5625 - any work for IPv6 should keep in mind that local hosts may have more than one IP address.

    83d2e70e-e599-4a04-b820-3814bbdb9bef commented 14 years ago

    @Senthil should this be assigned to your good self?

    918f67d7-4fec-4a8d-93e3-6530aeb1e57e commented 12 years ago

    Bump.