Closed GoogleCodeExporter closed 9 years ago
@Adewale: This is a duplicate of issue 221.
@flytwokites: From the discussion in issue 221:
import socket
socket.setdefaulttimeout(<timeout in floating seconds>)
By default Python's timeout is `None`, so I recommend setting this to a more
reasonable value for your software.
Original comment by kurtmckee
on 3 Jan 2011 at 1:03
But, I need a small timeout to parse feeds, e.g. 5 seconds, and I also use the
socket lib to connect to other important service like remote ssh/smtp..., set a
global timeout too small may cause other codes that use socket lib be unstable.
Original comment by flytwoki...@gmail.com
on 3 Jan 2011 at 1:12
That makes sense, assuming that your application is running a web crawler, SMTP
client/server, and an SSH client/server all in the same interpreter instance.
If you're running all of that in separate Python interpreter instances, setting
the default timeout in one won't affect the others.
Meanwhile, I've grepped through the Python 2.4 and 2.5 standard library on my
system, scoured search engines for hints, and have found absolutely nothing for
supporting timeouts on systems other than Python 2.6 and up. In issue 221 I
wrote why I strongly disagree with adding a `timeout` argument to `parse()`,
but adding a module-level variable that doesn't actually work in 2.4 and 2.5
without causing thread-safety issues isn't an option either.
There are some other options available:
1) You can manually add a timeout argument to feedparser, in function
`_open_resource()`, at what is currently line 2878. Look for the following line
and change it to:
BEFORE: return opener.open(request)
AFTER: return opener.open(request, timeout=<whatever>)
2) If you're using 2.4 or 2.5, you have to use the code I mentioned above. If
you're not using a multi-threaded application, I expect you can set the default
timeout as I mentioned above and then change it back after calling feedparser's
`parse()` function. You could also use a separate URL-fetching library (such as
Twisted) to get the headers and data, and then run the raw data through
`parse()`.
3) Again, if you're not running a feed aggregator, SSH client, and SMTP client
all in the same program, setting the default timeout won't have any affect from
one program to the next.
Original comment by kurtmckee
on 3 Jan 2011 at 5:49
Thank you for your reply.
I using python 2.7, so I make a local copy of feedparser to add timeout
parameter.
What about add timeout to feedparser, and note in document that it needs python
>= 2.6 otherwise will raise an exception.
Original comment by flytwoki...@gmail.com
on 3 Jan 2011 at 8:44
I'm marking this as a duplicate of issue 221.
Feedparser's code already has too many version-specific parts. I don't believe
that timeout is important enough to justify adding another one. Moreover it
would be horribly mis-leading to people using earlier versions of Python to
suggest (via the parser() signature that we support timeout when we don't.
However I'd change my mind if you came up with a design based on wrapping the
existing code with something which explicitly said that it only supports Python
>= 2.6 and is thread-safe.
Original comment by adewale
on 3 Jan 2011 at 8:00
> I don't believe that timeout is important enough
If you'd like the module to be considered a toy not suitable for serious work I
agree. As mentioned before, setting it globally is not an answer for many.
> Feedparser's code already has too many version-specific parts.
Unfortunately, that's the reality of supporting multiple versions of python.
If it's becoming unbearable then perhaps it's time to drop < 2.6 instead.
> would be horribly mis-leading to people using earlier versions
So, the potential of misleading the minority (on < 2.6) is more important than
fixing an important issue for the majority? As long as they get a short
explanation in the docs and a useful error that should be plenty.
They won't lose anything but a minute of potential disappointment however the
masses using 2.6+ will gain an important capability.
I might understand your position if it broke things for earlier versions but
since it doesn't find it hard to justify.
Original comment by geekad...@mgmiller.net
on 22 Jan 2012 at 8:36
Hi Mike, thank you for contributing to the discussion. There are a few points
I'd like to address.
First, feedparser isn't an HTTP client library. I expect that you and the other
four developers who've written comments in the two related bug reports will
find that a true HTTP client library will be better suited for larger, more
comprehensive projects.
Second, feedparser isn't an HTTP client library. The version-specific parts
that Ade was referring to have to do primarily with changes in the XML
libraries and sgmllib, and are there so that nobody experiences a loss of
functionality. It's not unbearable at all!
Third, feedparser isn't an HTTP client library. Its `parse()` function will not
include a `timeout` argument.
Original comment by kurtmckee
on 22 Jan 2012 at 9:10
[deleted comment]
Sincerely intended... if you won't do it right, why do it at all?
A parse function shouldn't take an url as argument, yet it does. To run 98% of
the way to the finish line, yet refuse to cross only frustrates. If it's not
an http client library, better off removing that functionality than misleading
with a toy implementation that increases complexity.
Original comment by geekad...@mgmiller.net
on 23 Jan 2012 at 5:09
Original issue reported on code.google.com by
flytwoki...@gmail.com
on 2 Jan 2011 at 4:07