Closed dafang closed 3 months ago
The Dockerfile:
FROM python:3.10-slim
RUN apt-get clean && apt-get update && apt-get install -y gcc pkg-config libseccomp-dev wget xz-utils
RUN apt-get install -y gcc-multilib
# copy main binary to /main
COPY main /main
COPY requirements.txt /requirements.txt
COPY conf/config.yaml /conf/config.yaml
RUN rm -rf /var/lib/apt/lists/* \
&& chmod +x /main \
&& pip3 install -i https://pypi.tuna.tsinghua.edu.cn/simple jinja2 requests httpx PySocks httpx[socks] \
&& pip3 install -i https://pypi.tuna.tsinghua.edu.cn/simple -r requirements.txt \
&& wget -O /opt/node-v20.11.1-linux-x64.tar.xz https://npmmirror.com/mirrors/node/v20.11.1/node-v20.11.1-linux-x64.tar.xz \
&& tar -xvf /opt/node-v20.11.1-linux-x64.tar.xz -C /opt \
&& ln -s /opt/node-v20.11.1-linux-x64/bin/node /usr/local/bin/node \
&& rm -f /opt/node-v20.11.1-linux-x64.tar.xz
ENTRYPOINT ["/main"]
The requirements.txt:
aiohttp==3.8.6 ; python_version >= "3.10" and python_version < "4.0"
aiohttp[speedups]==3.8.6 ; python_version >= "3.10" and python_version < "4.0"
click==8.1.7 ; python_version >= "3.10" and python_version < "4.0"
markdown==3.5.2 ; python_version >= "3.10" and python_version < "4.0"
pypdf==3.17.4 ; python_version >= "3.10" and python_version < "4.0"
numpy==1.23.5 ; python_version >= "3.10" and python_version < "4.0"
I guess you need to add this shared library to the list in https://github.com/langgenius/dify-sandbox/blob/main/internal/static/config_default_amd64.go, since numpy depends on this C extension but it has not been copied into the isolation environment.
BTW, do the other libraries work well?
This partially works.
After adding "/usr/lib/x86_64-linux-gnu/libgcc_s.so.1", I can import numpy, but it fails with a permission error. Scanning through the code, it seems to be blocked by seccomp: after I temporarily disabled seccomp, it works. (We run the code interpreter in a serverless environment, so seccomp is not necessary, but I will continue to figure out which syscalls numpy requires...)
Another lib I added is "/usr/lib/x86_64-linux-gnu/librt.so.1"; this is the shared object that pydantic depends on.
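For anyone trying to work out which shared libraries need to be added to the list, here is a minimal sketch that extracts resolved library paths from `ldd` output. It is an illustration, not part of dify-sandbox: the helper names `parse_ldd` and `shared_libs_of` are made up for this example, and the `SAMPLE` text is hand-written rather than captured output. Note that `ldd` only reports link-time dependencies, so a library that is loaded at runtime with `dlopen` would not show up this way.

```python
import subprocess
from typing import List


def parse_ldd(output: str) -> List[str]:
    """Return the resolved absolute library paths found in ldd-style output."""
    paths = []
    for line in output.splitlines():
        if "=>" not in line:
            continue  # e.g. the linux-vdso line has no resolved path
        target = line.split("=>", 1)[1].strip()
        if target.startswith("/"):
            # Drop the trailing load address, e.g. " (0x00007f...)"
            paths.append(target.split(" (", 1)[0])
    return paths


def shared_libs_of(extension_path: str) -> List[str]:
    """Run ldd on a compiled extension (hypothetical helper, needs ldd installed)."""
    result = subprocess.run(
        ["ldd", extension_path], capture_output=True, text=True, check=True
    )
    return parse_ldd(result.stdout)


# Hand-written sample in the usual ldd layout, for illustration only.
SAMPLE = """\
    linux-vdso.so.1 (0x00007ffd00000000)
    libgcc_s.so.1 => /usr/lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f0000000000)
    librt.so.1 => /usr/lib/x86_64-linux-gnu/librt.so.1 (0x00007f0000100000)
    libmissing.so => not found
"""

print(parse_ldd(SAMPLE))
```

Calling `shared_libs_of` on each `.so` file inside an installed package would give a candidate list to compare against the libraries already copied into the sandbox.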
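- Write a test Python file, for example test_numpy.py, containing a single line that imports numpy:
import numpy as np
- Use strace to log all the syscalls:
strace -o strace_output.txt -e trace=all python test_numpy.py
- Then use awk and sort to print all the syscalls:
awk '{print $1}' strace_output.txt | sed 's/[(].*//' | sort | uniq -c | sort -nr
Then I got the list of syscalls, diffed it against the allowed list, and added the missing ones: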
831 stat
418 fstat
393 read
337 lseek
278 openat
250 close
215 mmap
180 ioctl
68 rt_sigaction
60 mprotect
54 getdents64
42 brk
35 futex
18 pread64
17 munmap
7 clone
6 lstat
4 readlink
3 uname
3 dup
2 shmget
2 getuid
2 getgid
2 geteuid
2 getegid
2 getcwd
2 arch_prctl
1 sysinfo
1 shmdt
1 shmat
1 set_tid_address
1 set_robust_list
1 sched_getaffinity
1 rt_sigprocmask
1 prlimit64
1 gettid
1 fcntl
1 exit_group
1 execve
1 epoll_create1
1 access
But unfortunately, after adding all the syscalls, I still got this error response:
{"code":0,"message":"success","data":{"error":"OpenBLAS WARNING - could not determine the L2 cache size on this system, assuming 256k error: operation not permitted exit_string: signal: bad system call\n","stdout":""}}
The "bad system call" text is an error message that I added myself...
Still digging into the reason.
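The counting and "diff and add" steps above can also be done in a few lines of Python. This is only a sketch under assumptions: the function names are invented for the example, the sample lines are hand-written in strace's usual `name(args) = result` layout, and the `allowed` set is a placeholder rather than the sandbox's real allow-list.

```python
import re
from collections import Counter
from typing import Iterable, List, Set

# Matches "name(" at the start of a line, optionally preceded by a PID
# (strace adds a PID prefix when run with -f).
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?([a-z_][a-z0-9_]*)\(")


def count_syscalls(lines: Iterable[str]) -> Counter:
    """Count how often each syscall name appears in strace output lines."""
    counts: Counter = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match:  # skips signal lines ("--- SIG... ---") and "+++ exited +++"
            counts[match.group(1)] += 1
    return counts


def missing_syscalls(counts: Counter, allowed: Set[str]) -> List[str]:
    """Return traced syscall names that are not in the allowed set."""
    return sorted(name for name in counts if name not in allowed)


# Hand-written sample lines, for illustration only.
SAMPLE_LINES = [
    'execve("/usr/bin/python", ["python", "test_numpy.py"], 0x7ffd0000) = 0',
    "brk(NULL) = 0x55d000000000",
    'openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3',
    "shmget(IPC_PRIVATE, 4096, 0600) = 1",
    "brk(0x55d000021000) = 0x55d000021000",
    "+++ exited with 0 +++",
]

counts = count_syscalls(SAMPLE_LINES)
print(counts.most_common())
print(missing_syscalls(counts, allowed={"execve", "brk", "openat"}))
```

For a real run, the lines would come from `open("strace_output.txt")` and `allowed` from the names already present in the sandbox configuration.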
That's bad news, but I noticed that some of the syscalls you added already exist in allowed_syscalls, such as SYS_BRK and SYS_OPENAT. Maybe the command strace -o strace_output.txt -e trace=all python test_numpy.py could be optimized.
Maybe you can refer to https://github.com/langgenius/dify-sandbox/blob/main/cmd/test/fuzz_nodejs_amd64/main.go. You can set a range of syscalls from 0 to 400 on line 57 and see whether errors are raised. If not, all the necessary syscalls are permitted; then you can reduce the range to 0~200 or 200~400 and continue this process until you find the syscall that is needed.
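One way to automate this narrowing-down is sketched below. It is a sketch under assumptions, not the project's tooling: `runs_ok` is a hypothetical callback you would implement yourself (for example, by running the test script with only the given syscall numbers allowed and checking the exit status), and the numbers in the demo are arbitrary placeholders, not numpy's real requirements. Instead of plain halving, which breaks down when more than one missing syscall is required, it repeatedly tries to drop chunks of the allowed set and keeps a removal only if the run still succeeds.

```python
from typing import Callable, List, Set


def minimize_allowed(
    candidates: List[int], runs_ok: Callable[[Set[int]], bool]
) -> List[int]:
    """Shrink `candidates` to a smaller list for which runs_ok still passes."""
    needed = list(candidates)
    if not runs_ok(set(needed)):
        raise ValueError("the full candidate range already fails; widen it first")

    chunk = max(len(needed) // 2, 1)
    while True:
        i = 0
        while i < len(needed):
            trial = needed[:i] + needed[i + chunk:]
            if runs_ok(set(trial)):
                needed = trial  # this chunk was unnecessary; retry at same index
            else:
                i += chunk  # something in this chunk is required; keep it
        if chunk == 1:
            break
        chunk = max(chunk // 2, 1)
    return needed


# Stand-in for a real sandbox run: pretend these (arbitrary) numbers are required.
PRETEND_REQUIRED = {9, 56, 202, 435}


def fake_runs_ok(allowed: Set[int]) -> bool:
    return PRETEND_REQUIRED <= allowed


print(minimize_allowed(list(range(500)), fake_runs_ok))
```

With a real `runs_ok`, each call costs one sandbox run, so starting from a range that is known to work keeps the search short.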
Yes, this is what I have done: I already filtered out the ones you had added. I will try following your test case. Thanks.
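Good to start. I modified your test.py, without luck:
- I added the allowed syscalls at the beginning of the test code:
os.environ["ALLOWED_SYSCALLS"] = ",".join([str(i) for i in range(303)])
302 is the biggest syscall number.
- At the end of the test code, I added import numpy as np, and it failed with "Bad system call".
Actually, even without the numpy import, it still fails with a bad system call. I found that this is caused by the base64 import, so I commented it out, and then it succeeded.
Not sure whether it is caused by something else.
My testing PC is an alicloud ECS instance:
- Linux dev-ecs 5.4.0-58-generic #64-Ubuntu SMP Wed Dec 9 08:16:25 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
- 8 cores, 32G memory
- Python is 3.10, under a conda env
- Go: go version go1.21.6 linux/amd64 // I can build and run the main entrance
test.py: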
import ctypes
import json
import os
import sys
import traceback

os.environ["ALLOWED_SYSCALLS"] = ",".join([str(i) for i in range(303)])  # added by me

# setup sys.excepthook
def excepthook(type, value, tb):
    sys.stderr.write("".join(traceback.format_exception(type, value, tb)))
    sys.stderr.flush()
    sys.exit(-1)

sys.excepthook = excepthook

lib = ctypes.CDLL("/var/sandbox/sandbox-python/python.so")
lib.DifySeccomp.argtypes = [ctypes.c_uint32, ctypes.c_uint32, ctypes.c_bool]
lib.DifySeccomp.restype = None

import json
import os
import sys
import traceback

os.chdir("/var/sandbox/sandbox-python")
lib.DifySeccomp(65537, 1001, 1)

# declare main function here
def main() -> dict:
    return {"message": [1, 2, 3]}

# from base64 import b64decode
from json import dumps, loads

# execute main function, and return the result
# inputs is a dict, and it
# inputs = b64decode("e30=").decode("utf-8")
output = main()

# convert output to json and print
output = dumps(output, indent=4)
result = f"""<<RESULT>>
{output}
<<RESULT>>"""
print(result)
print(os.environ["ALLOWED_SYSCALLS"])

import numpy as np
print(np.version.full_version)
You can try @Yeuoly
The above test.py result:
After debugging some cases, I found that these MAY BE the bugs:
- InitSeccomp is called through the Python prescript.py code. As a result, in the InitSeccomp func, the logic of allowed_syscall := os.Getenv("ALLOWED_SYSCALLS") does not take effect and allowed_syscall is empty. The following logic is bypassed:
- Even after changing the hard-coded ALLOW_SYSCALLS to the full list of syscalls, I still run into the "bad system call" error:
var ALLOW_SYSCALLS = []int{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302}
Below is the log I printed in the InitSeccomp func:
{"code":0,"message":"success","data":{"error":"OpenBLAS WARNING - could not determine the L2 cache size on this system, assuming 256k\nerror: operation not permitted\nexit_string: signal: bad system call\n","stdout":"2024/07/15 23:55:51 add_seccomp.go:46: [WARN]## allowed syscalls: [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302]\n"}}
This logic is for debugging only and will never be used in a production environment. As for base64 encoding, it should work: I have set up CI tests for it and all checks passed. Maybe some syscalls are missing?
Syscall number 302 is not the highest. There are nearly 400 syscall numbers, but in Go they are only defined up to 302.
You are right: after extending the range to 500, it finally works. So the next step, figuring out which syscall is needed, is the easy part. Thanks.
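To illustrate why a range that stops at 302 can fall short, here is a small sketch listing a few x86_64 syscalls whose numbers are above 302 (taken from the Linux x86_64 syscall table). The selection is only an example, and the helper name `not_covered` is made up; which of these, if any, was the actual culprit here is not established in this thread.

```python
from typing import Dict, List

# A few x86_64 syscall numbers above 302 (illustrative selection only).
HIGH_SYSCALLS: Dict[str, int] = {
    "getrandom": 318,
    "membarrier": 324,
    "statx": 332,
    "rseq": 334,
    "clone3": 435,
}


def not_covered(upper: int) -> List[str]:
    """Names from HIGH_SYSCALLS that range(upper) would leave out."""
    return sorted(name for name, number in HIGH_SYSCALLS.items() if number >= upper)


print(not_covered(303))  # range(303) stops at 302
print(not_covered(500))
```

Any syscall in the first list would be rejected by an allow-list built from `range(303)`, while `range(500)` covers all of them.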
/close
BTW, are you interested in contributing this to the main branch?
Sure, will give you the PR later.
I used the amd64 Dockerfile to build the Docker image, ran the main binary after installing numpy, and then ran code that imports numpy, which produced the following error: