python / cpython

The Python programming language
https://www.python.org
Other
62.51k stars 30.01k forks source link

Enhancement: relax os.path.common input requirements #99961

Open Kentzo opened 1 year ago

Kentzo commented 1 year ago

Feature or enhancement

os.path.commonpath is needlessly strict requiring its paths argument to be a Sequence. The underlying algorithm does not require it and the restriction is an implementation artifact.

Pitch

  1. Faster code: current implementation goes over its input while, in principle, one iteration is sufficient
  2. Relaxed argument requirement: Sequence is replaced with Iterable

Previous discussion

https://discuss.python.org/t/os-path-commonpath-doesnt-need-a-sequence/21543/13

Linked PRs

serhiy-storchaka commented 1 year ago

Please show the results of microbenchmarks which you used to ensure that the new implementation is faster.

Kentzo commented 1 year ago

Benchmark results:

ntpath

                EXEC    PEAK MEM
0 common parts:
    worst case  -6.8%   -99.6%
    best case   -6.7%   -99.6%
1 common parts:
    worst case  -6.4%   -99.6%
    best case   -6.5%   -99.6%
2 common parts:
    worst case  -6.6%   -99.6%
    best case   -6.8%   -99.6%
3 common parts:
    worst case  -6.6%   -99.6%
    best case   -6.5%   -99.6%
4 common parts:
    worst case  -6.3%   -99.6%
    best case   -6.4%   -99.6%
5 common parts:
    worst case  -5.5%   -99.6%
    best case   -5.8%   -99.6%
6 common parts:
    worst case  -5.5%   -99.6%
    best case   -5.9%   -99.6%
7 common parts:
    worst case  -5.4%   -99.6%
    best case   -5.8%   -99.6%
8 common parts:
    worst case  -5.1%   -99.6%
    best case   -5.0%   -99.6%
9 common parts:
    worst case  -4.7%   -99.6%
    best case   -4.2%   -99.6%
10 common parts:
    worst case  -4.5%   -99.6%
    best case   -3.9%   -99.6%
11 common parts:
    worst case  -4.6%   -99.6%
    best case   -4.2%   -99.6%
12 common parts:
    worst case  -4.6%   -99.6%
    best case   -4.3%   -99.6%
13 common parts:
    worst case  -3.4%   -99.6%
    best case   -3.1%   -99.6%
14 common parts:
    worst case  -3.9%   -99.6%
    best case   -3.7%   -99.6%
15 common parts:
    worst case  -3.7%   -99.6%
    best case   -3.4%   -99.6%
16 common parts:
    worst case  -3.4%   -99.7%
    best case   -3.3%   -99.7%  

posixpath

                EXEC    PEAK MEM
0 common parts:
    worst case  -9.4%   -99.7%
    best case   -9.1%   -99.7%
1 common parts:
    worst case  -8.4%   -99.7%
    best case   -8.9%   -99.7%
2 common parts:
    worst case  -8.1%   -99.7%
    best case   -8.8%   -99.7%
3 common parts:
    worst case  -8.2%   -99.7%
    best case   -8.7%   -99.7%
4 common parts:
    worst case  -8.4%   -99.7%
    best case   -9.2%   -99.7%
5 common parts:
    worst case  -6.8%   -99.6%
    best case   -7.9%   -99.6%
6 common parts:
    worst case  -6.8%   -99.6%
    best case   -7.7%   -99.6%
7 common parts:
    worst case  -6.6%   -99.6%
    best case   -7.6%   -99.6%
8 common parts:
    worst case  -5.9%   -99.7%
    best case   -6.1%   -99.7%
9 common parts:
    worst case  -5.5%   -99.6%
    best case   -5.1%   -99.6%
10 common parts:
    worst case  -5.5%   -99.6%
    best case   -4.8%   -99.6%
11 common parts:
    worst case  -5.6%   -99.6%
    best case   -5.3%   -99.6%
12 common parts:
    worst case  -5.4%   -99.7%
    best case   -4.9%   -99.7%
13 common parts:
    worst case  -5.1%   -99.6%
    best case   -4.5%   -99.6%
14 common parts:
    worst case  -4.9%   -99.6%
    best case   -4.5%   -99.6%
15 common parts:
    worst case  -4.9%   -99.6%
    best case   -5.0%   -99.6%
16 common parts:
    worst case  -4.6%   -99.7%
    best case   -4.4%   -99.7%   

Note that the peak memory of the current implementation is O(N), while new implementation is O(1). On my machine I measured that the new implementation consumes 2.8 / N of the current implementation.

encukou commented 5 months ago

@barneygale, this might be interesting to you