kurtmckee / feedparser

Parse feeds in Python
https://feedparser.readthedocs.io
Other

datetimes/rfc822: support non-standard format #415

Closed: urain39 closed this 10 months ago

urain39 commented 10 months ago
Example: 'Fri,24 Nov 2023 18:28:36 -0000'
         ~~~~^~~ No space after the comma
The standard form is 'Fri, 24 Nov 2023 18:28:36 -0000'.

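A minimal sketch of how a parser could tolerate the missing space, assuming a numeric time zone, an hh:mm:ss clock, and exactly five fields after the optional day name. `parse_rfc822_lenient` and `MONTHS` are hypothetical names for illustration, not feedparser's `_parse_date_rfc822`.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical month lookup: 'jan' -> 1, ..., 'dec' -> 12.
MONTHS = {name: number for number, name in enumerate(
    ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
     'jul', 'aug', 'sep', 'oct', 'nov', 'dec'], start=1)}

def parse_rfc822_lenient(date: str) -> datetime:
    """Hypothetical sketch: parse 'Fri,24 Nov 2023 18:28:36 -0000' as well as
    the standard 'Fri, 24 Nov 2023 18:28:36 -0000' into an aware datetime.

    Assumes a numeric zone (+hhmm / -hhmm) and an hh:mm:ss clock.
    """
    # Insert a space after every comma so 'Fri,24' becomes 'Fri, 24';
    # str.split() with no argument then collapses any doubled whitespace.
    parts = date.replace(',', ', ').split()
    # Discard an optional leading day-of-week token such as 'Fri,'.
    if parts and parts[0].endswith(','):
        parts = parts[1:]
    day, month, year, clock, zone = parts
    hour, minute, second = (int(field) for field in clock.split(':'))
    # '-0000' is simply treated as a zero offset (UTC) in this sketch.
    sign = -1 if zone.startswith('-') else 1
    offset = sign * timedelta(hours=int(zone[1:3]), minutes=int(zone[3:5]))
    return datetime(int(year), MONTHS[month[:3].lower()], int(day),
                    hour, minute, second, tzinfo=timezone(offset))

print(parse_rfc822_lenient('Fri,24 Nov 2023 18:28:36 -0000'))
# 2023-11-24 18:28:36+00:00
```

Normalizing the comma before splitting keeps the tokenizer a single `str.split()` call, which is the same trade-off discussed below.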
urain39 commented 10 months ago

Can you test it again? I don't think it is much faster than the regex approach...

urain39 commented 10 months ago

In my tests over 10,000 iterations, using re.split is about 15% slower than using str.split.

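```python
import time

import rfc822     # str.split-based version of the module
import rfc822_re  # alternative version of the same module that uses re.split

date = 'Fri,24 Nov 2023 18:28:36 -0000'

def timeit(fn):
    """Return the wall-clock seconds taken to call fn(date) 10,000 times."""
    begin = time.time()
    for _ in range(10000):
        fn(date)
    return time.time() - begin

print(f'rfc822: {timeit(rfc822._parse_date_rfc822)}')
print(f'rfc822_re: {timeit(rfc822_re._parse_date_rfc822)}')
```

Output:

```
rfc822: 0.48220133781433105
rfc822_re: 0.5581390857696533
```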
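For a self-contained comparison that does not depend on local copies of the module, here is a sketch using the standard-library `timeit.repeat`. `tokens_str_split` and `tokens_re_split` are hypothetical stand-ins for the two tokenizing strategies, not the code from this PR, and absolute timings will vary by machine.

```python
import re
import timeit

DATE = 'Fri,24 Nov 2023 18:28:36 -0000'

# Precompiled pattern: split on any run of commas and/or whitespace.
_SPLIT_RE = re.compile(r'[,\s]+')

def tokens_str_split(date: str) -> list:
    """Hypothetical str.split-based tokenizer: commas become spaces."""
    return date.replace(',', ' ').split()

def tokens_re_split(date: str) -> list:
    """Hypothetical re.split-based tokenizer (drops empty edge tokens)."""
    return [token for token in _SPLIT_RE.split(date) if token]

# Both strategies must agree before their speed is worth comparing.
assert tokens_str_split(DATE) == tokens_re_split(DATE)

if __name__ == '__main__':
    for fn in (tokens_str_split, tokens_re_split):
        # Best of 5 runs of 10,000 calls each, to reduce noise from other processes.
        best = min(timeit.repeat(lambda: fn(DATE), number=10_000, repeat=5))
        print(f'{fn.__name__}: {best:.4f} s')
```

Taking the minimum of several repeats, rather than a single `time.time()` interval, tends to make a difference of around 15% easier to separate from scheduler noise.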
kurtmckee commented 10 months ago

Don't worry about the macOS CI failure; it's caused by broken symlinks in the cached tox environments. I'll delete the CI caches to resolve it, but the root cause needs to be addressed with more robust cache busting.

kurtmckee commented 10 months ago

@urain39 Thank you very much for this contribution!

@Rongronggg9 Thank you for reviewing this PR!

kurtmckee commented 10 months ago

#420 made CI more resilient by using better cache busting, and it is significantly faster.