Closed advikkabra closed 6 months ago
Advik, I recognize and appreciate the effort you have put in to find and resolve this issue. I fear, we have increased the time instead.
string1: str = "a"
string2: str = string1 * 10**6
(base) saurabh-kumar@Awadh:~/Projects/System/lpython$ time ./src/bin/lpython ./examples/example.py
^C
real 1m43.633s
user 1m43.501s
sys 0m0.048s
The operation took around 30s before. I request you to look into it.
I think you might have mistakenly tested the above using CPython.
It works fine on my machine, and after profiling more than 99% of the time used was in the strlen
calls in the strcpy
function.
advik@advik-Inspiron-7560:~/dev/oss/lpython$ time src/bin/lpython ../test.py
1000000
real 0m0.082s
user 0m0.058s
sys 0m0.020s
advik@advik-Inspiron-7560:~/dev/oss/lpython$ cat ../test.py
string1: str = "a";
string2: str = string1 * 10**6
print(len(string2))
Great work Advik! It works properly on my machine now too! That's an outright 300x improvement! :clap:
(lp) saurabh-kumar@Awadh:~/Projects/System/lpython$ time ./src/bin/lpython ./examples/example.py
real 0m0.110s
user 0m0.071s
sys 0m0.039s
The reason it did not work the previous time was because I was on an older branch. The new changes from LFortran have done an amazing job with increasing the speed.
It does not fail when s_len == 1, but memset is very optimized for this usecase.
Did you trying checking this if the else
works for s_len == 1
? It is fine to have a separate case for s_len == 1
, but I just want to be sure that the else works perfectly (for example it works for odd numbers like 1
).
Let's merge this after we are sure that the else
case works for odd numbers like 1
.
This test is already there in test_str_01.py
in integration_tests.
def test_str_repeat():
a: str
a = "Xyz"
assert a*3 == "XyzXyzXyz"
assert a*2*3 == "XyzXyzXyzXyzXyzXyz"
assert 3*a*3 == "XyzXyzXyzXyzXyzXyzXyzXyzXyz"
assert a*-1 == ""
This test is already there in test_str_01.py in integration_tests.
Do we have a test with a large n
like (10^6 or so)? If so, then it is good.
I think we should add a test for this with a large `n` (so that we are sure that the time taken does not break in future). We can do it in a separate PR.
Originally posted by @Shaikh-Ubaid in https://github.com/lcompilers/lpython/pull/2634#pullrequestreview-1975689401
How would you suggest I add this as a test? Is there a way to check if something is taking too long?
How would you suggest I add this as a test? Is there a way to check if something is taking too long?
I think not yet in LPython (for LFortran we have cpu_time
). For now, by adding a test we are just trying to ensure that the test is not taking too long.
Previously (as of the main
branch) the time complexity of _lfortran_strcpy()
was O(n^2)
. That means, for n = 10^6
the time taken could be in orders of n^12
, which would never practically complete at the CI.
If we add a test for n = 10^6
and the test completes in a finite time (I am assuming it will take < 1s
. If it take more than 1s
then it is not much helpful and we should reduce n
), then it verifies/ensures that the time complexity of _lfortran_strcpy()
has improved.
For a robust test, we should support something like time.time()
. But it should not be part of this PR and can be done later.
Once we have support for time.time()
, we can later come back to the above n = 10^6
test case and update it to add a check for the time taken.
I have added a test for a 10**6 repetition. This would take >1 minute in the previous code, so it would be easy to catch if it goes wrong. Right now it completes it in 20ms on my machine, so it is comfortable with the new implementation. I think the PR is ready now.
Fixes #2616
memset
could be used for single char strings, and binary exponentiation could be used for multi char strings. This is faster than copying one byte at a time._lfortran_strcpy
was callingstrlen
in every loop iteration, which was slowing down the program significantly.After this change, a 1e6 repetition as shown in the issue is 0.04s, which is comparable to CPython.