Closed GoogleCodeExporter closed 9 years ago
This seems to be an unnecessary optimization. It's unlikely to create
significant performance gains for most feeds. I recommend closing this bug.
Original comment by kurtmckee
on 4 Dec 2010 at 7:36
I'm going to leave this open for the moment.
David: do you have any performance stats about the impact of this patch?
Original comment by adewale
on 4 Dec 2010 at 11:12
This actually wasn't a performance optimization, but an attempt to fix
correctness (an only a partial one at that). We were getting a small number of
feeds where the wrong date format was detected. This change used the earlier
correct failures to prevent later issues. I can construct a sample feed for
this.
Original comment by EpsilonP...@hotmail.com
on 6 Dec 2010 at 9:51
Okay, I think I'm on the same page (and I'm going to reach a conclusion based
on that, but please let me know if I'm mistaken).
You're looking at the possibility of having dates with the day first, right? As
mentioned above, you might not know in advance if it's M-D-Y or D-M-Y, which
would give any software fits if it's given a list of dates like:
12/6/2010
4/5/2010
3/12/2010
But if you suddenly discover one of the following dates in the same feed, you'd
instantly know which format was in use:
12/31/2010 (m-d-y)
31/12/2010 (d-m-y)
If you're having to check for D-M-Y, I'm certain that you're having to modify
feedparser's built-in date parsers, or you're registering your own using
registerDateHandler, since the built-in date parsers don't appear to parse date
strings that could overlap -- I'm pretty sure they're unambiguous.
Is this something that can be fixed in your application without having to
modify feedparser to introduce what looks like a thread-unsafe global
dictionary? There's already a bug report open complaining about feedparser
having a thread-safety issue, and while I'm looking forward to tackling that
soon, I'm concerned that this patch will introduce another thread-safety
problem.
You could use a class to store state, and add a __call__() and a reset() method
to it so that, once instantiated, it can be called like a function, but can
also store state and be reset for the next time parse() is called. For instance:
class CustomDateTimeParser:
def __init__(self):
self.state = 0 # or a list of date-time parser functions
print self.state
def __call__(self, date):
self.state += 1 # or modify state only when a date-time function fails
print self.state
def reset(self):
self.state = 0 # or restore the full list of date-time functions
print self.state
Then you can run code like:
>>> a = CustomDateTimeParser() # self.state = 0
0
>>> a('mdy') # increment
1
>>> a('mdy') # increment
2
>>> a('mdy') # increment
3
>>> a.reset() # reset
0
Callable class instantiations are a popular design pattern in the Django
community from what I've read, and they work great! This would allow you to try
M-D-Y first, and switch to D-M-Y if that fails, without having to add a global
dictionary to feedparser.
Again, this is predicated on my interpretation of what you wrote above. If
everything above was correct, then I don't think that adding a global
dictionary is the right way to handle this.
Original comment by kurtmckee
on 7 Dec 2010 at 5:26
Your approach provides more correctness (correcting all the dates if at least
one can be found that was wrong). I like that. What I don't like is that
everyone doesn't get it for free. Is it acceptable to put this logic in at the
top level? Alternatively it could be provided as a wrapper (or optional
wrapper) at the top level.
I agree about thread safety - this was not the way to do it if you need that.
Original comment by EpsilonP...@hotmail.com
on 7 Dec 2010 at 1:05
Please you add some examples of feeds on the web that exhibit this bug.
Original comment by adewale
on 22 Dec 2010 at 9:51
I think that unhandled feed dates can be handled by developers, and that this
particular patch will introduce thread-safety problems.
Original comment by kurtmckee
on 26 Apr 2011 at 6:44
Original issue reported on code.google.com by
EpsilonP...@hotmail.com
on 3 Dec 2010 at 10:50Attachments: