Closed ctappy closed 9 months ago
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.405368 fuse.DEBUG Inode.LookUp 3 dataset/1.0 [dataset/1.0/raw_data]
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.407494 fuse.DEBUG Ref 4 dataset/1.0/raw_data [680]
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.407508 fuse.DEBUG <-- LookUpInode 3 raw_data <nil>
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.407530 fuse.DEBUG Op 0x0040c92e connection.go:517] -> OK (inode 4)
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.407604 fuse.DEBUG Op 0x0040c930 connection.go:428] <- OpenDir (inode 4, PID 1811785)
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.407649 fuse.DEBUG Op 0x0040c930 connection.go:517] -> OK ()
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.407705 fuse.DEBUG Op 0x0040c932 connection.go:428] <- ReadDir (inode 4, PID 1811785)
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.407798 fuse.DEBUG ReadDir 4 dataset/1.0/raw_data [0]
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.407806 fuse.DEBUG Ref 4 dataset/1.0/raw_data [681]
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.407812 fuse.DEBUG <-- ReadDir 4 dataset/1.0/raw_data [raw_data 0]
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.407817 fuse.DEBUG Ref 3 dataset/1.0 [576]
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.407821 fuse.DEBUG <-- ReadDir 4 dataset/1.0/raw_data [1.0 1]
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.441992 fuse.DEBUG Ref 21638815 dataset/1.0/raw_data/2017 [27]
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.442011 fuse.DEBUG <-- ReadDir 4 dataset/1.0/raw_data [2017 2]
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.442017 fuse.DEBUG Ref 21639821 dataset/1.0/raw_data/2018 [61]
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.442023 fuse.DEBUG <-- ReadDir 4 dataset/1.0/raw_data [2018 3]
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.442030 fuse.DEBUG Ref 21639822 dataset/1.0/raw_data/2019 [24]
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.442036 fuse.DEBUG <-- ReadDir 4 dataset/1.0/raw_data [2019 4]
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.442041 fuse.DEBUG Ref 21639823 dataset/1.0/raw_data/2020 [24]
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.442046 fuse.DEBUG <-- ReadDir 4 dataset/1.0/raw_data [2020 5]
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.442051 fuse.DEBUG Ref 21639824 dataset/1.0/raw_data/2021 [24]
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.442054 fuse.DEBUG <-- ReadDir 4 dataset/1.0/raw_data [2021 6]
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.442059 fuse.DEBUG Ref 21639825 dataset/1.0/raw_data/2022 [24]
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.442063 fuse.DEBUG <-- ReadDir 4 dataset/1.0/raw_data [2022 7]
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.442067 fuse.DEBUG Ref 21590565 dataset/1.0/raw_data/2023 [30]
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.442071 fuse.DEBUG <-- ReadDir 4 dataset/1.0/raw_data [2023 8]
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.443934 fuse.DEBUG ReadDir 21638815 dataset/1.0/raw_data/2017 [0]
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.443945 fuse.DEBUG Ref 21638815 dataset/1.0/raw_data/2017 [28]
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.443951 fuse.DEBUG <-- ReadDir 21638815 dataset/1.0/raw_data/2017 [2017 0]
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.443959 fuse.DEBUG Ref 4 dataset/1.0/raw_data [682]
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.443965 fuse.DEBUG <-- ReadDir 21638815 dataset/1.0/raw_data/2017 [raw_data 1]
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.476737 fuse.DEBUG Ref 21638816 dataset/1.0/raw_data/2017/2017-11 [34]
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.476754 fuse.DEBUG <-- ReadDir 21638815 dataset/1.0/raw_data/2017 [2017-11 2]
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.476761 fuse.DEBUG Ref 21641874 dataset/1.0/raw_data/2017/2017-12 [19]
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.476767 fuse.DEBUG <-- ReadDir 21638815 dataset/1.0/raw_data/2017 [2017-12 3]
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.476784 fuse.DEBUG Op 0x0040c93a connection.go:517] -> OK ()
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.476872 fuse.DEBUG Op 0x0040c93c connection.go:428] <- ReadDir (inode 21638815, PID 1811785)
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.476964 fuse.DEBUG ReadDir 21638815 dataset/1.0/raw_data/2017 [4]
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.476980 fuse.DEBUG Op 0x0040c93c connection.go:517] -> OK ()
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.477029 fuse.DEBUG Op 0x0040c93e connection.go:428] <- ReleaseDirHandle (PID 0)
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.477048 fuse.DEBUG ReleaseDirHandle dataset/1.0/raw_data/2017
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.477065 fuse.DEBUG Op 0x0040c93e connection.go:517] -> OK ()
Jan 15 11:55:56 nodekw7 geesefs[1093863]: 2024/01/15 11:55:56.477099 fuse.DEBUG Op 0x025cf746 connection.go:428] <- ReadSymlink (inode 3158, PID 1811785)
Jan 15 11:55:56 nodekw7 geesefs[1093863]: 2024/01/15 11:55:56.477145 fuse.DEBUG Op 0x025cf746 connection.go:517] -> OK ()
Jan 15 11:55:56 nodekw7 geesefs[1093863]: 2024/01/15 11:55:56.477218 fuse.DEBUG Op 0x025cf748 connection.go:428] <- ReadSymlink (inode 3158, PID 1811785)
Jan 15 11:55:56 nodekw7 geesefs[1093863]: 2024/01/15 11:55:56.477288 fuse.DEBUG Op 0x025cf748 connection.go:517] -> OK ()
Jan 15 11:55:56 nodekw7 geesefs[1093863]: 2024/01/15 11:55:56.477354 fuse.DEBUG Op 0x025cf74a connection.go:428] <- ReadSymlink (inode 3158, PID 1811785)
Jan 15 11:55:56 nodekw7 geesefs[1093863]: 2024/01/15 11:55:56.477379 fuse.DEBUG Op 0x025cf74a connection.go:517] -> OK ()
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.477423 fuse.DEBUG Op 0x0040c940 connection.go:428] <- OpenDir (inode 21638816, PID 1811785)
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.477445 fuse.DEBUG Op 0x0040c940 connection.go:517] -> OK ()
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.477493 fuse.DEBUG Op 0x0040c942 connection.go:428] <- ReadDir (inode 21638816, PID 1811785)
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.477544 fuse.DEBUG ReadDir 21638816 dataset/1.0/raw_data/2017/2017-11 [0]
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.477559 fuse.DEBUG Ref 21638816 dataset/1.0/raw_data/2017/2017-11 [35]
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.477566 fuse.DEBUG <-- ReadDir 21638816 dataset/1.0/raw_data/2017/2017-11 [2017-11 0]
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.477571 fuse.DEBUG Ref 21638815 dataset/1.0/raw_data/2017 [29]
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.477576 fuse.DEBUG <-- ReadDir 21638816 dataset/1.0/raw_data/2017/2017-11 [2017 1]
Jan 15 11:55:56 nodekw7 geesefs[1093817]: 2024/01/15 11:55:56.511710 fuse.DEBUG Ref 21638817 dataset/1.0/raw_data/2017/2017-11/2017-11-01 [5]
Jan 15 11:55:57 nodekw7 geesefs[1093817]: 2024/01/15 11:55:57.754193 fuse.DEBUG Ref 21649909 dataset/1.0/raw_data/2017/2017-11/2017-11-15/sample_data_#DT-F@_2017-11-15_180000-181000.txt.gz [3]
Jan 15 11:55:57 nodekw7 geesefs[1093817]: 2024/01/15 11:55:57.754198 fuse.DEBUG <-- ReadDir 21641887 dataset/1.0/raw_data/2017/2017-11/2017-11-15 [sample_data_#DT-F@_2017-11-15_180000-181000.txt.gz 240]
Jan 15 11:55:57 nodekw7 geesefs[1093817]: 2024/01/15 11:55:57.754203 fuse.DEBUG Ref 21649910 dataset/1.0/raw_data/2017/2017-11/2017-11-15/sample_data_#DT-F@_2017-11-15_181000-182000.txt.gz [3]
Jan 15 11:55:57 nodekw7 geesefs[1093817]: 2024/01/15 11:55:57.754207 fuse.DEBUG <-- ReadDir 21641887 dataset/1.0/raw_data/2017/2017-11/2017-11-15 [sample_data_#DT-F@_2017-11-15_181000-182000.txt.gz 241]
Jan 15 11:55:57 nodekw7 geesefs[1093817]: 2024/01/15 11:55:57.754220 fuse.DEBUG Op 0x0040cdd8 connection.go:517] -> OK ()
Jan 15 11:55:57 nodekw7 geesefs[1093817]: 2024/01/15 11:55:57.754282 fuse.DEBUG Op 0x0040cdda connection.go:428] <- ReadDir (inode 21641887, PID 1811785)
Jan 15 11:55:57 nodekw7 geesefs[1093817]: 2024/01/15 11:55:57.754317 fuse.DEBUG ReadDir 21641887 dataset/1.0/raw_data/2017/2017-11/2017-11-15 [242]
Jan 15 11:55:57 nodekw7 geesefs[1093817]: 2024/01/15 11:55:57.754324 fuse.DEBUG Ref 21649911 dataset/1.0/raw_data/2017/2017-11/2017-11-15/sample_data_#DT-F@_2017-11-15_182000-183000.txt.gz [3]
Jan 15 11:55:57 nodekw7 geesefs[1093817]: 2024/01/15 11:55:57.754329 fuse.DEBUG <-- ReadDir 21641887 dataset/1.0/raw_data/2017/2017-11/2017-11-15 [sample_data_#DT-F@_2017-11-15_182000-183000.txt.gz 242]
Jan 15 11:55:57 nodekw7 geesefs[1093817]: 2024/01/15 11:55:57.754335 fuse.DEBUG Ref 21649912 dataset/1.0/raw_data/2017/2017-11/2017-11-15/sample_data_#DT-F@_2017-11-15_183000-184000.txt.gz [3]
Jan 15 11:55:57 nodekw7 geesefs[1093817]: 2024/01/15 11:55:57.754339 fuse.DEBUG <-- ReadDir 21641887 dataset/1.0/raw_data/2017/2017-11/2017-11-15 [sample_data_#DT-F@_2017-11-15_183000-184000.txt.gz 243]
Jan 15 11:55:57 nodekw7 geesefs[1093817]: 2024/01/15 11:55:57.754345 fuse.DEBUG Ref 21649913 dataset/1.0/raw_data/2017/2017-11/2017-11-15/sample_data_#DT-F@_2017-11-15_184000-185000.txt.gz [3]
Jan 15 11:55:57 nodekw7 geesefs[1093817]: 2024/01/15 11:55:57.754350 fuse.DEBUG <-- ReadDir 21641887 dataset/1.0/raw_data/2017/2017-11/2017-11-15 [sample_data_#DT-F@_2017-11-15_184000-185000.txt.gz 244]
Jan 15 11:55:57 nodekw7 geesefs[1093817]: 2024/01/15 11:55:57.754356 fuse.DEBUG Ref 21649914 dataset/1.0/raw_data/2017/2017-11/2017-11-15/sample_data_#DT-F@_2017-11-15_185000-190000.txt.gz [3]
Jan 15 11:55:57 nodekw7 geesefs[1093817]: 2024/01/15 11:55:57.754360 fuse.DEBUG <-- ReadDir 21641887 dataset/1.0/raw_data/2017/2017-11/2017-11-15 [sample_data_#DT-F@_2017-11-15_185000-190000.txt.gz 245]
Jan 15 11:55:57 nodekw7 geesefs[1093817]: 2024/01/15 11:55:57.754367 fuse.DEBUG Ref 21649915 dataset/1.0/raw_data/2017/2017-11/2017-11-15/sample_data_#DT-F@_2017-11-15_190000-191000.txt.gz [3]
Jan 15 11:55:57 nodekw7 geesefs[1093817]: 2024/01/15 11:55:57.754371 fuse.DEBUG <-- ReadDir 21641887 dataset/1.0/raw_data/2017/2017-11/2017-11-15 [sample_data_#DT-F@_2017-11-15_190000-191000.txt.gz 246]
Jan 15 11:55:57 nodekw7 geesefs[1093817]: 2024/01/15 11:55:57.754377 fuse.DEBUG Ref 21649916 dataset/1.0/raw_data/2017/2017-11/2017-11-15/sample_data_#DT-F@_2017-11-15_191000-192000.txt.gz [3]
Jan 15 11:55:57 nodekw7 geesefs[1093817]: 2024/01/15 11:55:57.754382 fuse.DEBUG <-- ReadDir 21641887 dataset/1.0/raw_data/2017/2017-11/2017-11-15 [sample_data_#DT-F@_2017-11-15_191000-192000.txt.gz 247]
Jan 15 11:55:57 nodekw7 geesefs[1093817]: 2024/01/15 11:55:57.754386 fuse.DEBUG Ref 21649917 dataset/1.0/raw_data/2017/2017-11/2017-11-15/sample_data_#DT-F@_2017-11-15_192000-193000.txt.gz [3]
Jan 15 11:56:15 nodekw7 systemd[1]: geesefs-node_2dusr_2dd1_2d00000005.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Jan 15 11:56:15 nodekw7 systemd[1]: var-lib-kubelet-plugins-kubernetes.io-csi-ru.yandex.s3.csi-7dc3bb9adbb8835421cb7f43f7d7d6b10d0a838aa50b616e14ccc1d2f6108f21-globalmount.mount: Deactivated successfully.
Jan 15 11:56:15 nodekw7 systemd[1]: geesefs-node_2dusr_2dd1_2d00000005.service: Failed with result 'exit-code'.
the above journalctl message are snippets over the last few seconds before the crash, separated by spaces
Hi, maybe this was the OOM killer? Check dmesg please. Panics should generate a backtrace...
nothing in dmesg, I have a lot of memory in the server it is running on. I have tried recreating the issue but I was not successful. I'll close for now
this happened again. I believe the issue is related to the nic and not geeseFS
Jan 30 15:40:07 dapu1kw2 geesefs[3181398]: 2024/01/30 15:40:07.787447 main.INFO I/O: 2.00 read/s, 86.67 % hits, 0.10 write/s; metadata: 29.13 read/s, 0.17 write/s, 7.20 noop/s, 6457 alive, 0.00 evict/s; 0.07 flush/s Jan 30 15:40:07 dapu1kw2 geesefs[3181412]: 2024/01/30 15:40:07.978089 main.INFO I/O: 0.00 read/s, 0.00 % hits, 0.00 write/s; metadata: 0.00 read/s, 0.00 write/s, 0.00 noop/s, 1 alive, 0.00 evict/s; 0.00 flush/s
Jan 30 15:40:18 dapu1kw2 geesefs[3181398]: panic: Unknown inode: 1554 Jan 30 15:40:18 dapu1kw2 geesefs[3181398]: goroutine 78 [running]: Jan 30 15:40:18 dapu1kw2 geesefs[3181398]: github.com/yandex-cloud/geesefs/internal.(Goofys).getInodeOrDie(0xc00015a6c0, 0x0?) Jan 30 15:40:18 dapu1kw2 geesefs[3181398]: /home/runner/work/geesefs/geesefs/internal/goofys.go:437 +0x105 Jan 30 15:40:18 dapu1kw2 geesefs[3181398]: github.com/yandex-cloud/geesefs/internal.(Goofys).Flusher(0xc00015a6c0) Jan 30 15:40:18 dapu1kw2 geesefs[3181398]: /home/runner/work/geesefs/geesefs/internal/goofys.go:638 +0x16a Jan 30 15:40:18 dapu1kw2 geesefs[3181398]: created by github.com/yandex-cloud/geesefs/internal.newGoofys Jan 30 15:40:18 dapu1kw2 geesefs[3181398]: /home/runner/work/geesefs/geesefs/internal/goofys.go:368 +0xa4a
hey @vitalif I can almost always recreate this issue, one in four attempts. I create about 1500 files and then remove them all with rm -rf path/to/dir
let me know if you need debug on or anything related, thanks so much! I may put some time into this in a few weeks
Hi, can you please try to build a version from master and retest with it?
I should be able to get a some cycles in tomorrow on this, thanks!
@vitalif looks really good, thanks!!
While converting csv files into parquet files using scala/spark, I started to delete all of the files instead of stopping the process and deleting the files. geesefs in k8s crashed without a stackdump/error message in journalctl
The files were being created on one geesefs spark worker, and on another server with geesefs i deleted them