scrapinghub / dateparser

python parser for human readable dates
BSD 3-Clause "New" or "Revised" License

Importing dateparser taking a significant time #253

Closed nicktacular closed 4 years ago

nicktacular commented 7 years ago

We've replicated this on both OS X and Ubuntu 16.04. Importing dateparser takes around 0.5 to 0.7 seconds. The cause is unclear, but it is a significant performance concern.

OS X

$ for i in $(seq 30); do python -c 'import time; s = time.time(); import dateparser ; print(time.time()-s)' 2> /dev/null | tail -1; done
0.692351102829
0.694715023041
0.675757169724
0.690068006516
0.692388057709
0.691253900528
0.677401065826
0.707463026047
0.680037021637
0.727914094925
0.696306943893
0.698905944824
0.689168214798
0.709505796432
0.685596942902
0.711586952209
0.682478904724
0.670440912247
0.698837995529
0.685151100159
0.682590961456
0.695116996765
0.690932035446
0.718989133835
0.697949886322
0.732755184174
0.803722143173
0.836079120636
0.805291891098
0.740805149078
$ python --version
Python 2.7.10
$ pip freeze
appnope==0.1.0
backports.shutil-get-terminal-size==1.0.0
dateparser==0.5.0
decorator==4.0.10
enum34==1.1.6
ipython==5.1.0
ipython-genutils==0.1.0
jdatetime==1.8.1
pathlib2==2.1.0
pexpect==4.2.1
pickleshare==0.7.4
prompt-toolkit==1.0.9
ptyprocess==0.5.1
Pygments==2.1.3
python-dateutil==2.6.0
pytz==2016.7
regex==2016.11.21
ruamel.ordereddict==0.4.9
ruamel.yaml==0.13.1
simplegeneric==0.8.1
six==1.10.0
traitlets==4.3.1
typing==3.5.2.2
tzlocal==1.3
umalqurra==0.2
wcwidth==0.1.7
$ sysctl -n hw.ncpu
4
$

Ubuntu 16.04

$ for i in $(seq 30); do python -c 'import time; s = time.time(); import dateparser ; print(time.time()-s)' 2> /dev/null | tail -1; done

0.608508110046
0.591820955276
0.583191156387
0.595532894135
0.58847284317
0.596559762955
0.597602844238
0.596135854721
0.588882923126
0.581357955933
0.587655067444
0.599339008331
0.59353518486
0.59841299057
0.594044923782
0.589542150497
0.583790063858
0.590340137482
0.5944480896
0.590999126434
0.587160110474
0.588365793228
0.58954501152
0.589763879776
0.597408056259
0.587062120438
0.600353002548
0.588448047638
0.587937116623
0.596489906311
$ cat /proc/cpuinfo | grep processor | wc -l
16
$ python --version
Python 2.7.12

eliasdorneles commented 7 years ago

This is because dateparser reads the language configuration files at import time, which is why it takes a little while to load. However, this only happens the first time you import it within a process.
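To illustrate the "only the first time" point, here is a minimal sketch that times a first and a repeated import in the same process. It assumes Python 3 (the report above used Python 2.7), the helper name `timed_import` is made up for this example, and `colorsys` is only a stand-in for a module that has not been imported yet. Substitute `"dateparser"` to reproduce the measurement from this issue.

```python
import importlib
import sys
import time


def timed_import(module_name):
    """Import a module by name and return (module, seconds elapsed)."""
    start = time.perf_counter()
    module = importlib.import_module(module_name)
    return module, time.perf_counter() - start


# "colorsys" is a stand-in; replace it with "dateparser" to measure that package.
first_module, first_seconds = timed_import("colorsys")
second_module, second_seconds = timed_import("colorsys")

# The second call is served from the sys.modules cache, so the module body
# (and any file reading it does at import time) is not executed again.
print("first import:  %.6f s" % first_seconds)
print("second import: %.6f s" % second_seconds)
print("same module object:", first_module is second_module)
```

The cache lives in `sys.modules` and is per process, so every new `python` invocation pays the import cost again. On Python 3.7 and later, `python -X importtime -c 'import dateparser'` prints a per-module breakdown of where the time goes.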

Why is this a concern for you?

nicktacular commented 7 years ago

@eliasdorneles it's a concern because any Python script that runs frequently now runs at least 0.5 seconds slower on every invocation. If the script is expected to respond quickly because it does something simple, waiting 0.5 seconds longer seems odd.

Is there a way to specify not to load any language configs that I won't be using? I'm only interested in parsing dates in English.

waqasshabbir commented 7 years ago

@nicktacular it's a valid concern and can be resolved by a combination of lazy and on-demand loading of language data.

> Is there a way to specify not to load any language configs that I won't be using? I'm only interested in parsing dates in English.

This will be possible after we implement said idea. Will push a fix soon.
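For readers unfamiliar with the idea, lazy, on-demand loading could look roughly like the sketch below. This is not dateparser's implementation: `LazyLanguageData` and `fake_loader` are hypothetical names, and the loader only pretends to read a language configuration file.

```python
class LazyLanguageData:
    """Load per-language data on first access and cache it afterwards."""

    def __init__(self, loader):
        self._loader = loader  # callable: language code -> language data
        self._cache = {}

    def get(self, language):
        # Nothing is loaded at construction (i.e. import) time; each
        # language is loaded the first time somebody asks for it.
        if language not in self._cache:
            self._cache[language] = self._loader(language)
        return self._cache[language]


load_log = []


def fake_loader(language):
    # Hypothetical stand-in for reading a language configuration file.
    load_log.append(language)
    return {"language": language}


languages = LazyLanguageData(fake_loader)
english = languages.get("en")
english_again = languages.get("en")

print(load_log)  # ['en']: the loader ran once, and only for English
```

With this shape, a caller who only parses English dates never triggers loading for any other language, and repeated lookups reuse the cached data.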

nicktacular commented 7 years ago

Excellent, thank you.

asya-bergal commented 7 years ago

Any updates on this? I'd love for this to be faster.

atultherajput commented 7 years ago

@waqasshabbir I would love to work on this issue. I am a newbie so please guide me to fix this issue.

asadurski commented 6 years ago

In version 0.7.0, import time went down to less than 0.20 s. Is that within acceptable boundaries, @nicktacular?

nicktacular commented 6 years ago

@asadurski cool, I will try using this again.

Gallaecio commented 4 years ago

Shall we close this?

eliasdorneles commented 4 years ago

I vote to close the issue, yes, as I don't see a strong need to improve this much further, given that it happens only once at import time. Users who only need basic English date parsing and want better performance can use `dateutil.parser.parse` directly.
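A minimal sketch of that alternative, assuming the python-dateutil package is installed (it appears in the `pip freeze` output above). The input strings are made-up examples.

```python
from datetime import datetime

from dateutil import parser as dateutil_parser

# dateutil.parser.parse handles common English and ISO-style date strings.
iso_style = dateutil_parser.parse("2017-03-05 10:30")
english_style = dateutil_parser.parse("March 5, 2017 10:30")

print(iso_style)
print(english_style)
print(iso_style == english_style == datetime(2017, 3, 5, 10, 30))
```

This only helps when the inputs are in formats dateutil understands; it is not a drop-in replacement for everything dateparser accepts.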