hjjhh6 commented 11 months ago

7 孙老师你好，经过尝试，具体可见model.py的修改，非代码相关github不熟练望理解

hjjhh6 commented 11 months ago

该部分主要是将multi.processing修改为joblib，解决了并行化反而变慢的问题，相较不并行化耗时为原来的33.2%，需要注意的是，线程锁依然是最耗时的原因，如果解决线程锁的问题将会在并行处理的基础上使得大量数据的计算成为可能

以下为大于1s的耗时任务，其中95.5%耗时集中在joblib的并行处理中，的MGTWR中主要耗时任务为cal_aic，但耗时仅为1.293s，获取线程锁的0.0068。

time cost: 0:03:40.609 12395230 function calls (12381086 primitive calls) in 220.610 seconds

Ordered by: internal time

ncalls tottime percall cumtime percall filename:lineno(function) 902982 187.731 0.000 187.731 0.000 {method 'acquire' of '_thread.lock' objects} 1245 22.471 0.018 215.907 0.173 d:\anaconda\Lib\site-packages\joblib\parallel.py:960(retrieve) 403539 2.014 0.000 192.678 0.000 d:\anaconda\Lib\concurrent\futures_base.py:428(result) 1220 1.293 0.001 215.684 0.177 C:\Users\34456\AppData\Roaming\Python\Python311\site-packages\mgtwr\model.py:450(cal_aic) 395462 1.281 0.000 189.926 0.000 d:\anaconda\Lib\threading.py:288(wait)

sunkun1997 commented 11 months ago

我用的进程并行处理，也会有线程锁吗？

hjjhh6 commented 11 months ago

我用的进程并行处理，也会有线程锁吗？

有的，不确定是哪一个位置出现的这个问题。以下是1.01的代码运行后的报告

Original(thread=15) time cost: 0:24:29.669 6022657 function calls (6019718 primitive calls) in 1469.671 seconds Ordered by: internal time ncalls tottime percall cumtime percall filename:lineno(function) 23663 1397.118 0.059 1397.118 0.059 {method 'acquire' of '_thread.lock' objects}

这是询问copilot的回复

在Python的multiprocessing库中，尽管它是基于进程的并行处理，但在其内部实现中，仍然会使用线程和线程锁。这主要是因为multiprocessing.Pool在处理任务结果的收集和分发时，会使用一个内部的线程。

当你在主进程中创建一个multiprocessing.Pool时，Python会为每个子进程创建一个线程，用于监听和处理来自子进程的结果。这个线程会使用一个线程锁来同步数据的接收和处理，这就是你在性能分析中看到的{method 'acquire' of '_thread.lock' objects}。

如果你的任务主要是IO密集型的，那么这个线程锁可能不会成为性能瓶颈。但是，如果你的任务主要是CPU密集型的，那么这个线程锁可能会导致性能下降，因为它会阻塞主进程，使得主进程无法立即处理新的任务结果。

为了避免这个问题，你可以尝试使用multiprocessing.Pool.imap或multiprocessing.Pool.imap_unordered，这两个方法返回一个迭代器，可以在计算结果时立即处理，而不需要等待所有任务完成。

hjjhh6 commented 11 months ago

我用的进程并行处理，也会有线程锁吗？

有的，不确定是哪一个位置出现的这个问题。以下是1.01的代码运行后的报告 2. Original(thread=15) time cost: 0:24:29.669 6022657 function calls (6019718 primitive calls) in 1469.671 seconds Ordered by: internal time ncalls tottime percall cumtime percall filename:lineno(function) 23663 1397.118 0.059 1397.118 0.059 {method 'acquire' of '_thread.lock' objects}

这是询问copilot的回复

在Python的multiprocessing库中，尽管它是基于进程的并行处理，但在其内部实现中，仍然会使用线程和线程锁。这主要是因为multiprocessing.Pool在处理任务结果的收集和分发时，会使用一个内部的线程。

当你在主进程中创建一个multiprocessing.Pool时，Python会为每个子进程创建一个线程，用于监听和处理来自子进程的结果。这个线程会使用一个线程锁来同步数据的接收和处理，这就是你在性能分析中看到的{method 'acquire' of '_thread.lock' objects}。

如果你的任务主要是IO密集型的，那么这个线程锁可能不会成为性能瓶颈。但是，如果你的任务主要是CPU密集型的，那么这个线程锁可能会导致性能下降，因为它会阻塞主进程，使得主进程无法立即处理新的任务结果。

为了避免这个问题，你可以尝试使用multiprocessing.Pool.imap或multiprocessing.Pool.imap_unordered，这两个方法返回一个迭代器，可以在计算结果时立即处理，而不需要等待所有任务完成。

我用的进程并行处理，也会有线程锁吗？

有的，不确定是哪一个位置出现的这个问题。以下是1.01的代码运行后的报告 2. Original(thread=15) time cost: 0:24:29.669 6022657 function calls (6019718 primitive calls) in 1469.671 seconds Ordered by: internal time ncalls tottime percall cumtime percall filename:lineno(function) 23663 1397.118 0.059 1397.118 0.059 {method 'acquire' of '_thread.lock' objects}

这是询问copilot的回复

在Python的multiprocessing库中，尽管它是基于进程的并行处理，但在其内部实现中，仍然会使用线程和线程锁。这主要是因为multiprocessing.Pool在处理任务结果的收集和分发时，会使用一个内部的线程。

当你在主进程中创建一个multiprocessing.Pool时，Python会为每个子进程创建一个线程，用于监听和处理来自子进程的结果。这个线程会使用一个线程锁来同步数据的接收和处理，这就是你在性能分析中看到的{method 'acquire' of '_thread.lock' objects}。

如果你的任务主要是IO密集型的，那么这个线程锁可能不会成为性能瓶颈。但是，如果你的任务主要是CPU密集型的，那么这个线程锁可能会导致性能下降，因为它会阻塞主进程，使得主进程无法立即处理新的任务结果。

为了避免这个问题，你可以尝试使用multiprocessing.Pool.imap或multiprocessing.Pool.imap_unordered，这两个方法返回一个迭代器，可以在计算结果时立即处理，而不需要等待所有任务完成。

孙老师你好，我又尝试了一下2.03更新后的mgtwr方法，并且考虑到我的cpu是10核的因此将thread=6，可以看到并行化处理的速度依然更慢，并没有解决问题。

result report: 1.orginal(thread=1)(大于10s的结果） time cost: 0:11:7.401 636804654 function calls (613136772 primitive calls) in 667.402 seconds

Ordered by: internal time

ncalls tottime percall cumtime percall filename:lineno(function) 2151360 170.420 0.000 227.291 0.000 d:\anaconda\Lib\site-packages\scipy\linalg_basic.py:40(solve) 19328500/15022891 37.441 0.000 84.094 0.000 {built-in method numpy.core._multiarray_umath.implement_array_function} 2151360 23.911 0.000 23.911 0.000 C:\Users\34456\AppData\Roaming\Python\Python311\site-packages\mgtwr\kernel.py:48(_kernel_funcs) 2151360 18.553 0.000 18.553 0.000 {built-in method scipy.spatial._distance_pybind.cdist_euclidean} 2108160 18.015 0.000 651.187 0.000 C:\Users\34456\AppData\Roaming\Python\Python311\site-packages\mgtwr\model.py:470(_search_local_fit) 2151360 13.885 0.000 13.885 0.000 d:\anaconda\Lib\site-packages\pandas\core\roperator.py:18(rmul) 2151360 13.407 0.000 341.448 0.000 C:\Users\34456\AppData\Roaming\Python\Python311\site-packages\mgtwr\kernel.py:86(cal_distance) 101122810/96820090 12.785 0.000 23.840 0.000 {built-in method builtins.isinstance} 2151360 12.406 0.000 232.541 0.000 d:\anaconda\Lib\site-packages\pandas\core\arraylike.py:244(array_ufunc) 2151360 11.355 0.000 120.866 0.000 d:\anaconda\Lib\site-packages\pandas\core\frame.py:7599(_dispatch_frame_op) 4302721 10.630 0.000 36.559 0.000 d:\anaconda\Lib\site-packages\scipy_lib_util.py:206(_asarray_validated) 4306421 10.112 0.000 10.112 0.000 {method 'reduce' of 'numpy.ufunc' objects}

2.orginal(thread=6)（大于1s的结果）

time cost: 0:15:32.734 2911907 function calls (2908985 primitive calls) in 932.735 seconds

Ordered by: internal time

ncalls tottime percall cumtime percall filename:lineno(function) 23664 901.759 0.038 901.759 0.038 {method 'acquire' of '_thread.lock' objects} 147824 14.803 0.000 14.803 0.000 {built-in method _winapi.WaitForSingleObject} 7470 9.672 0.001 9.672 0.001 {built-in method _winapi.CreateProcess}

sunkun1997 commented 11 months ago

学习了

hjjhh6 commented 11 months ago

另，anaconda默认的joblib库为1.2.0版本，现joblib最新为1.3.2版本，一定程度上的优化了线程锁的问题，但并没有完全解决，然而速度的确有一定程度的提升，经过测试没有和MGTWR工具冲突，如需要可安装pip install --upgrade joblib

ncalls tottime percall cumtime percall filename:lineno(function) 12152 132.889 0.011 132.889 0.011 {built-in method time.sleep}

sunkun1997 / mgtwr

关于大量数优化issue的并行处理优化 #8

7 孙老师你好，经过尝试，具体可见model.py的修改，非代码相关github不熟练望理解