jumormt / DeepWukong

DeepWukong: Statically Detecting Software Vulnerabilities Using Deep Graph Neural Network
MIT License
89 stars, 23 forks

KeyError: 'code_sym_token' #8

Closed HS2021GO closed 1 year ago

HS2021GO commented 1 year ago

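Hi, a question about step 3: have you run into this problem? Some nodes are missing the 'code_sym_token' attribute. I hit it with the CWE119 data you provided, and with my own data as well. I'd really appreciate any guidance, thanks.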

```
File "/home/dell/hu/DeepWukong/src/utils.py", line 150, in unique_xfg_sym
    ln_md5 = getMD5(str(xfg.nodes[ln]["code_sym_token"]))
KeyError: 'code_sym_token'
```

HS2021GO commented 1 year ago

It happens when running src/preprocess/dataset_generator.py to generate the dataset.

jumormt commented 1 year ago

This shouldn't happen. code_sym_token is written during processing at https://github.com/jumormt/DeepWukong/blob/master/src/preprocess/dataset_generator.py#L122
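If the attribute is supposed to be written at that point, a quick way to narrow things down is to list which nodes of a saved graph lack it. The sketch below assumes the XFG is a networkx graph, as the `xfg.nodes[ln]["code_sym_token"]` access in the traceback suggests; `nodes_missing_attr` is a hypothetical helper, not part of DeepWukong.

```python
from typing import Any, Hashable, List, Mapping


def nodes_missing_attr(
    nodes: Mapping[Hashable, Mapping[str, Any]], attr: str = "code_sym_token"
) -> List[Hashable]:
    """Return the ids of nodes whose attribute dict lacks `attr`.

    `nodes` may be a plain {node: attrs} dict or a networkx NodeView
    (xfg.nodes), which is also a Mapping from each node to its attribute dict.
    """
    return [node for node, data in nodes.items() if attr not in data]


# e.g. for a loaded XFG: missing = nodes_missing_attr(xfg.nodes)
```

An empty result for every graph would point away from the symbolization step; a non-empty one identifies exactly which graphs were saved before the attribute was added.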

HS2021GO commented 1 year ago


Yes, I saw the write there, which is why this is so strange. Did you ever run into the following "connection refused" error while running it?

```
/home/dell/anaconda3/envs/dwk/bin/python /home/dell/hu/DeepWukong/src/preprocess/dataset_generator.py
/home/dell/anaconda3/envs/dwk/lib/python3.8/site-packages/pytorch_lightning/metrics/__init__.py:43: LightningDeprecationWarning: `pytorch_lightning.metrics.*` module has been renamed to `torchmetrics.*` and split off to its own package (https://github.com/PyTorchLightning/metrics) since v1.3 and will be removed in v1.5
  rank_zero_deprecation(
Global seed set to 7
testcases:  90%|█████████ | 18253/20175 [03:06<00:19, 97.81it/s]
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/dell/anaconda3/envs/dwk/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/dell/hu/DeepWukong/src/preprocess/dataset_generator.py", line 131, in process_parallel
    queue.put(QueueMessage(xfg, xfg_path))
  File "<string>", line 2, in put
  File "/home/dell/anaconda3/envs/dwk/lib/python3.8/multiprocessing/managers.py", line 835, in _callmethod
    kind, result = conn.recv()
  File "/home/dell/anaconda3/envs/dwk/lib/python3.8/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/home/dell/anaconda3/envs/dwk/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/home/dell/anaconda3/envs/dwk/lib/python3.8/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/dell/hu/DeepWukong/src/preprocess/dataset_generator.py", line 178, in <module>
    add_symlines(config.dataset.name, config.data_folder, config.split_token)  # false
  File "/home/dell/hu/DeepWukong/src/preprocess/dataset_generator.py", line 158, in add_symlines
    testcaseids_done: List = [
  File "/home/dell/hu/DeepWukong/src/preprocess/dataset_generator.py", line 158, in <listcomp>
    testcaseids_done: List = [
  File "/home/dell/anaconda3/envs/dwk/lib/python3.8/site-packages/tqdm/std.py", line 1133, in __iter__
    for obj in iterable:
  File "/home/dell/anaconda3/envs/dwk/lib/python3.8/multiprocessing/pool.py", line 868, in next
    raise value
EOFError
Traceback (most recent call last):
  File "/home/dell/anaconda3/envs/dwk/lib/python3.8/multiprocessing/util.py", line 300, in _run_finalizers
    finalizer()
  File "/home/dell/anaconda3/envs/dwk/lib/python3.8/multiprocessing/util.py", line 224, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/home/dell/anaconda3/envs/dwk/lib/python3.8/multiprocessing/pool.py", line 692, in _terminate_pool
    cls._help_stuff_finish(inqueue, task_handler, len(pool))
  File "/home/dell/anaconda3/envs/dwk/lib/python3.8/multiprocessing/pool.py", line 674, in _help_stuff_finish
    inqueue._reader.recv()
  File "/home/dell/anaconda3/envs/dwk/lib/python3.8/multiprocessing/connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
  File "/home/dell/anaconda3/envs/dwk/lib/python3.8/multiprocessing/managers.py", line 959, in RebuildProxy
    return func(token, serializer, incref=incref, **kwds)
  File "/home/dell/anaconda3/envs/dwk/lib/python3.8/multiprocessing/managers.py", line 1007, in AutoProxy
    proxy = ProxyType(token, serializer, manager=manager, authkey=authkey,
  File "/home/dell/anaconda3/envs/dwk/lib/python3.8/multiprocessing/managers.py", line 809, in __init__
    self._incref()
  File "/home/dell/anaconda3/envs/dwk/lib/python3.8/multiprocessing/managers.py", line 863, in _incref
    conn = self._Client(self._token.address, authkey=self._authkey)
  File "/home/dell/anaconda3/envs/dwk/lib/python3.8/multiprocessing/connection.py", line 502, in Client
    c = SocketClient(address)
  File "/home/dell/anaconda3/envs/dwk/lib/python3.8/multiprocessing/connection.py", line 630, in SocketClient
    s.connect(address)
ConnectionRefusedError: [Errno 111] Connection refused

Process finished with exit code 1
```

I kept getting disconnects while running it. When a run finally got through, I found that some nodes had not been written. I think the cause is multiprocessing.Pool: the dropped connections mean some results never get written. I haven't solved it yet and would appreciate any pointers.

jumormt commented 1 year ago

If it's caused by the multiprocessing, you can switch to a single-process version, which also makes debugging easier.
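One common way to keep both options is a switch that either runs the per-item function in a pool or calls it in a plain loop. The sketch below is a generic illustration; `run_tasks` and `n_workers` are hypothetical names, not DeepWukong's code, and it has each worker return its result rather than send it through a Manager queue as the traceback above shows.

```python
from multiprocessing import Pool
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_tasks(func: Callable[[T], R], items: Iterable[T], n_workers: int = 1) -> List[R]:
    """Apply func to every item, using a Pool only when n_workers > 1.

    With n_workers <= 1 everything runs in the calling process, so a failing
    item raises its own traceback instead of an IPC error, and breakpoints work.
    """
    items = list(items)
    if n_workers <= 1:
        # Single-process path: results come back in input order.
        return [func(item) for item in items]
    with Pool(processes=n_workers) as pool:
        # imap_unordered yields results as workers finish, so order is not preserved.
        # The list is fully consumed before the with-block terminates the pool.
        return list(pool.imap_unordered(func, items))
```

For the pooled path, func must be picklable (a module-level function); the single-process path accepts any callable.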

HS2021GO commented 1 year ago

OK, thanks!

250444444 commented 1 year ago

I'm hitting the code_sym_token problem too. Did you manage to solve it?

HS2021GO commented 1 year ago


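In the end I solved it by running everything in a single process. I couldn't fix the disconnect errors in the multiprocessing part.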

250444444 commented 1 year ago

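Switching to a single process fixed it, thanks! One more question: how do I use my own dataset? It only has code and labels. How should I change the code, e.g. getCodeIDtoPathDict?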

HS2021GO commented 1 year ago


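I'd suggest two approaches:
1. As in the original project, generate the corresponding manifest.xml file, which annotates the vulnerable line numbers.
2. Write code that parses your dataset and labels, then use DeepWukong's pipeline to generate the corresponding slice .pkl files.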
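For the first approach, a manifest can be generated with the standard library. This is a sketch under a stated assumption: the element and attribute names (container, testcase, file path, flaw line/name) follow the layout of SARD manifest files as I understand it, and should be checked against what getCodeIDtoPathDict actually reads. `build_manifest` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_manifest(testcases: Dict[str, Dict[str, List[int]]], cwe_name: str = "CWE-119") -> str:
    """Build a SARD-style manifest string.

    testcases maps a test case id to {source file path: [vulnerable line numbers]}.
    ASSUMPTION: tag/attribute names mirror SARD manifests; verify against the parser.
    """
    container = ET.Element("container")
    for tc_id, files in testcases.items():
        testcase = ET.SubElement(container, "testcase", id=tc_id)
        for path, lines in files.items():
            file_el = ET.SubElement(testcase, "file", path=path)
            for line in sorted(lines):
                # One <flaw> element per vulnerable line.
                ET.SubElement(file_el, "flaw", line=str(line), name=cwe_name)
    return ET.tostring(container, encoding="unicode")
```

This only helps when vulnerable line numbers are known; file-level labels alone do not fill the flaw entries.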

250444444 commented 1 year ago

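My dataset only has code and labels, with no vulnerable line numbers, so the first approach probably won't work. How did you implement it? Could we exchange contact details to discuss?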

mcf20 commented 3 months ago


Hi, could you explain how to set up the environment and get this project running, especially the Joern part?

664730 commented 2 months ago


How did you switch to a single process? Is it by changing for xfg_path in tqdm(xfg_path_list, total=len(xfg_path_list), desc="xfgs: "): to for xfg_path in xfg_path_list:? I tried that, but I still get the KeyError.
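Wrapping an iterable in tqdm(...) only adds a progress bar, so removing it does not change how the work is scheduled. In the traceback earlier in this thread, the pool is consumed in add_symlines (dataset_generator.py, line 158) and the worker process_parallel sends each XFG through a Manager queue (line 131). Below is a hedged sketch of a single-process equivalent that swaps the Manager queue for the standard-library `queue.Queue`; `QueueMessage`, `process_testcase`, and `run_single_process` here are hypothetical stand-ins, not DeepWukong's code.

```python
import queue
from typing import Callable, Iterable, List, NamedTuple


class QueueMessage(NamedTuple):
    """Hypothetical stand-in: one generated graph plus the path it should be saved to."""
    xfg: dict
    xfg_path: str


def process_testcase(testcase_id: int, out_q: "queue.Queue[QueueMessage]") -> int:
    """Hypothetical worker: build one graph and hand it to the writer through the queue."""
    xfg = {testcase_id: {"code_sym_token": ["VAR1", "=", "0"]}}  # placeholder graph data
    out_q.put(QueueMessage(xfg, f"xfg/{testcase_id}.pkl"))
    return testcase_id


def run_single_process(
    testcase_ids: Iterable[int], write: Callable[[QueueMessage], None]
) -> List[int]:
    """Run every test case in the current process; no Pool and no Manager are involved."""
    out_q: "queue.Queue[QueueMessage]" = queue.Queue()  # plain in-process queue, no sockets
    done = []
    for testcase_id in testcase_ids:
        done.append(process_testcase(testcase_id, out_q))
        # Drain after each test case so every queued graph is written before moving on.
        while not out_q.empty():
            write(out_q.get())
    return done
```

For example, run_single_process([1, 2], written.append) with written = [] leaves two messages in written. Because nothing crosses a process boundary, an exception in any test case surfaces directly with its own traceback.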

664730 commented 2 months ago


How did you switch to a single process? Is it by changing for xfg_path in tqdm(xfg_path_list, total=len(xfg_path_list), desc="xfgs: "): to for xfg_path in xfg_path_list:? I tried that, but I still get the KeyError.

mcf20 commented 1 month ago


Could you tell me how you switched it to a single process?

mcf20 commented 1 month ago


Hey, how did you switch it to a single process? Could you send me your version?