question(filesystem): TFS status

rinor commented 1 year ago

I'm about to start testing nanos filesystem/storage stability. Mostly interested in data storage r/w correctness/corruption and system memory requirements/relationships between r/w data size/frequency on "root" partition/drive/volume and also on additional "mounted" volumes/drives.

I.e:

what happens when the "root" volume is near 100% usage and smth (logs) fails to be written due to insufficient free space. Does smth panic (don't care much about the program itself since it can handle failures depending on the use case, mostly interested about the kernel and tfs itself)
what happens when the same happens on a non root mounted volume. Does anything get corrupted

I'll report the findings once tested, but I am curious if there are any known "issues" you are aware of in general or unhandled edge cases.

francescolavra commented 1 year ago

When a volume (root or otherwise) is near 100% usage and a write fails due to insufficient free space, there shouldn't be any panic or filesystem corruption: the kernel just returns the ENOSPC error number to indicate lack of disk space. I'm not aware of any known issues, the TFS log code reserves upfront the disk space it may need. Of course, if you find any bugs or unhandled edge cases, you are more than welcome to report them!

rinor commented 1 year ago

Having some data corruption issues while performing initial tests. Will start with the first one:

Scenario
1. add/mount and external volume to test and write data to (i.e: not using root image for test files atm)
2. write fixed MiB size (power of 2 for now, 1MiB, 2MiB, 4MiB, 8MiB,..) files with the same random generated data until you get a failure (ENOSPC)
3. verify checksum (sha1) of the created files - expect to match the original data
4. upon restart verify checksum (sha1) of the files created ad (2) - expect to match the original data
5. delete test files and repeat from (2)

Software versions

```sh $ qemu-system-x86_64 --version QEMU emulator version 7.2.0 (Debian 1:7.2+dfsg-5~bpo11+1) $ qemu-img --version qemu-img version 7.2.0 (Debian 1:7.2+dfsg-5~bpo11+1) $ ops version Ops version: master-eeefe46 Nanos version: master-1510adc $ go version go version go1.20.3 linux/amd64 $ lsb_release -a No LSB modules are available. Distributor ID: Debian Description: Debian GNU/Linux 11 (bullseye) Release: 11 Codename: bullseye $ cat /proc/version Linux version 5.10.0-22-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian 5.10.178-3 (2023-04-22) ```

Code used

- binary named `stressdisk` ```go package main import ( "crypto/rand" "crypto/sha1" "encoding/hex" "flag" "fmt" "log" "os" "path/filepath" "syscall" "time" ) const ( ZERO_DATA_DEFAULT = false // -z CLEAN_ONLY_DEFAULT = false // -c WAIT_EXIT_DEFAULT = 60 // -w seconds EXTRA_FILES_DEFAULT = 0 // -f number WAIT_ACTION_DEFAULT = 1 // -t seconds TEST_FILE_SIZE_DEFAULT = 4 // -s MiB TEST_DIR_DEFAULT = "tfs_root" // -d dir TEST_FILE_PREFIX = "TST_" ) const ( KiB = 1_024 * 1 // bytes MiB = 1_024 * KiB // bytes GiB = 1_024 * MiB // bytes ) var ( // flags testDir = flag.String("d", TEST_DIR_DEFAULT, "Directory to use for test files") fileSize = flag.Int("s", TEST_FILE_SIZE_DEFAULT, "Size (MiB) of data to write on each test file") extraFiles = flag.Int("f", EXTRA_FILES_DEFAULT, "Number of additional files to attempt writting to (over)fill the drive") waitSec = flag.Int("t", WAIT_ACTION_DEFAULT, "Time in seconds to wait after each file r/w action") // possibly to prevent written data corruption on qemu waitExit = flag.Int("w", WAIT_EXIT_DEFAULT, "Time in seconds to wait before terminating the program") cleanOnly = flag.Bool("c", CLEAN_ONLY_DEFAULT, "Clean test files from the storage and terminate the program") zeroData = flag.Bool("z", ZERO_DATA_DEFAULT, "Zero bytes on the data files (no random data generated)") ) func main() { flag.Parse() if *fileSize <= 0 || *waitSec < 0 || *waitExit < 0 || *testDir == "" { log.Fatal("flag(s) provided but wrong value") } // from flags TEST_DIR := *testDir TEST_FILE_SIZE := *fileSize * MiB WAIT_SECONDS := time.Duration(*waitSec) * time.Second EXTRA_FILES := *extraFiles CLEAN_ONLY := *cleanOnly WAIT_EXIT := *waitExit ZERO_DATA := *zeroData // disk stats ds := NewStatFS(TEST_DIR) // 1. clean-up test dir - calculate/check sha1sum and print existing data before delete { STEP := "1" log.Println() log.Printf("STEP(%s): Checking for existing data on (%s)", STEP, TEST_DIR) log.Println() // fetch disk stats if err := ds.Fetch(); err != nil { log.Fatalf("STEP(%s): %s", STEP, err) } else { ds.Print() log.Println() } files, err := filepath.Glob(filepath.Join(TEST_DIR, TEST_FILE_PREFIX+"*")) if err != nil { log.Fatalf("STEP(%s): %s", STEP, err) } for _, fn := range files { if data, err := os.ReadFile(fn); err != nil { log.Printf("STEP(%s): %s", STEP, err) } else { dataSHA := sha1.Sum(data) dataSHAHex := hex.EncodeToString(dataSHA[:]) log.Printf("STEP(%s): %s - %x (%d B) - %s\n", STEP, fn, dataSHA, len(data), func(status bool) string { if status { return "OK" } return "CORRUPTED" }(fn[len(fn)-len(dataSHAHex):] == dataSHAHex)) } // unlink only no truncate here - Syscall(SYS_UNLINKAT,...) if err := os.Remove(fn); err != nil { log.Printf("STEP(%s): %s", STEP, err) } // wait if WAIT_SECONDS.Seconds() > 0 { time.Sleep(WAIT_SECONDS) } } if CLEAN_ONLY { log.Println() log.Printf("DONE - terminating in %d seconds", WAIT_EXIT) time.Sleep(time.Duration(WAIT_EXIT) * time.Second) os.Exit(0) } } // 2. write some files with the same data (random or zeroed) { STEP := "2" log.Println() log.Printf("STEP(%s): Trying to write (%d bytes => %d blocks)/file on (%s)", STEP, TEST_FILE_SIZE, ds.BlockFiles(TEST_FILE_SIZE), TEST_DIR) log.Println() data := make([]byte, TEST_FILE_SIZE) if !ZERO_DATA { if _, err := rand.Read(data); err != nil { log.Fatalf("STEP(%s): %s", STEP, err) } } dataSHA := sha1.Sum(data) // fetch/update disk stats - since we may have deleted files on STEP(1) if err := ds.Fetch(); err != nil { log.Fatalf("STEP(%s): %s", STEP, err) } else { ds.Print() log.Println() } maxFiles := ds.MaxFiles(TEST_FILE_SIZE) log.Printf("STEP(%s): Approx (%d) file(s) expected to succeed, (%d) additional file(s) may be attempted", STEP, maxFiles, EXTRA_FILES) for i := 0; i < maxFiles+EXTRA_FILES; i++ { fn := filepath.Join(TEST_DIR, fmt.Sprintf("%s%04d_%x", TEST_FILE_PREFIX, i+1, dataSHA)) f, err := os.OpenFile(fn, os.O_TRUNC|os.O_CREATE|os.O_WRONLY|syscall.O_DIRECT, 0666) // O_DIRECT may not have any effect !!! if err != nil { log.Fatalf("STEP(%s): %s", STEP, err) } n, err := f.Write(data) if errc := f.Close(); errc != nil { log.Printf("STEP(%s): %s", STEP, errc) } if err != nil { log.Printf("STEP(%s): %s", STEP, err) if err := os.Remove(fn); err != nil { log.Printf("STEP(%s): %s", STEP, err) } } else { log.Printf("STEP(%s): %s - %x (%d B) - %s\n", STEP, fn, sha1.Sum(data), n, func(status bool) string { if status { return "OK" } return "OK, SHOULD HAVE FAILED" }(i < maxFiles)) } // wait if WAIT_SECONDS.Seconds() > 0 { time.Sleep(WAIT_SECONDS) } if err != nil { break } } } // 3. read files content and calculate/check sha1sum { STEP := "3" log.Println() log.Printf("STEP(%s): Read files and calculate checksum on (%s)", STEP, TEST_DIR) log.Println() // fetch/update disk stats - since we may have added files on STEP(2) if err := ds.Fetch(); err != nil { log.Fatalf("STEP(%s): %s", STEP, err) } else { ds.Print() log.Println() } files, err := filepath.Glob(filepath.Join(TEST_DIR, TEST_FILE_PREFIX+"*")) if err != nil { log.Fatalf("STEP(%s): %s", STEP, err) } for _, fn := range files { if data, err := os.ReadFile(fn); err != nil { log.Printf("STEP(%s): %s", STEP, err) } else { sha1Sum := sha1.Sum(data) sha1SumHex := hex.EncodeToString(sha1Sum[:]) log.Printf("STEP(%s): %s - %x (%d B) - %s\n", STEP, fn, sha1Sum, len(data), func(status bool) string { if status { return "OK" } return "CORRUPTED" }(fn[len(fn)-len(sha1SumHex):] == sha1SumHex)) } // wait if WAIT_SECONDS.Seconds() > 0 { time.Sleep(WAIT_SECONDS) } } } // 4. Wait before exiting (possibly to prevent written data corruption on qemu) !!! { log.Println() log.Printf("DONE - terminating in %d seconds", WAIT_EXIT) time.Sleep(time.Duration(WAIT_EXIT) * time.Second) log.Println("DONE - terminating") } } /* disk stats helpers */ // StatFS ... type StatFS struct { path string stat syscall.Statfs_t } // NewStatFS ... func NewStatFS(path string) *StatFS { var stat syscall.Statfs_t return &StatFS{path: path, stat: stat} } // Fetch the updated stats func (s *StatFS) Fetch() error { return syscall.Statfs(s.path, &s.stat) } // MaxFiles that can be written based on filesize and free blocks func (s *StatFS) MaxFiles(size int) int { return int(s.stat.Bfree / (uint64(size) / uint64(s.stat.Bsize))) } // BlockFiles calculates the amount of blocks needes for a file based on filesize and blocksize func (s *StatFS) BlockFiles(size int) int { return size / int(s.stat.Bsize) } // Print ... func (s *StatFS) Print() { log.Printf("\tBlock Size : %dB => %d blocks/MiB", s.stat.Bsize, MiB/s.stat.Bsize) log.Printf("\tBlocks Total: %d Blocks", s.stat.Blocks) log.Printf("\tBlocks Free : %d Blocks", s.stat.Bfree) log.Printf("\tBlocks Avail: %d Blocks", s.stat.Bavail) log.Printf("\tFiles : %d", s.stat.Files) log.Printf("\tFfree : %d", s.stat.Ffree) log.Printf("\tFsid : %v", s.stat.Fsid.X__val) log.Printf("\tNamelen : %d", s.stat.Namelen) log.Printf("\tFrsize : %d", s.stat.Frsize) log.Printf("\tFlags : %d", s.stat.Flags) log.Printf("\tSpare : %v", s.stat.Spare) } ```

OPS config

```sh $ ops volume create stressdisk -s 32M ``` ```json { "BaseVolumeSz": "32m", "Args": [ "-w=10", "-f=1", "-s=4", "-t=0", "-d=/tfs_mnt" ], "Mounts": { "stressdisk": "tfs_mnt" }, "RunConfig": { "Memory": "2G" } } ``` ```sh $ ops run -c config.json stressdisk -f -v --show-debug ```

Test Result (4MiB) corrupted

```sh $ ops run -c config.json stressdisk -f -v --show-debug Manifest: &{root:map[arguments:[stressdisk -w 10 -f 1 -s 4 -t 0 -d /tfs_mnt] children:map[etc:map[passwd:/tmp/stressdisk_temp21312391/passwd resolv.conf:/tmp/stressdisk_temp21312391/resolv.conf ssl:map[certs:map[ca-certificates.crt:/home/rinor/.ops/common/ca-certificates.crt]]] lib:map[x86_64-linux-gnu:map[libnss_dns.so.2:/home/rinor/.ops/common/libnss_dns.so.2]] proc:map[sys:map[kernel:map[hostname:/tmp/stressdisk_temp21312391/hostname]]] stressdisk:stressdisk tfs_mnt:map[]] environment:map[IMAGE_NAME:stressdisk NANOS_VERSION:0.1.44 OPS_VERSION:master-eeefe46 PWD:/ USER:root] mounts:map[stressdisk:tfs_mnt] program:/stressdisk] boot:map[children:map[kernel:/usr/local/src/nanos/output/platform/pc/bin/kernel.img]] targetRoot: klibHostDir:/usr/local/src/nanos/output/klib/bin} booting /home/rinor/.ops/images/stressdisk ... qemu-system-x86_64 -machine q35 -device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x3 -device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x3.0x1 -device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x3.0x2 -device virtio-scsi-pci,bus=pci.2,addr=0x0,id=scsi0 -device scsi-hd,bus=scsi0.0,drive=hd0 -vga none -smp 1 -device isa-debug-exit -m 2G -device virtio-rng-pci -device scsi-hd,bus=scsi0.0,drive=hd1 -machine accel=kvm:tcg -cpu host -no-reboot -cpu max -drive file=/home/rinor/.ops/images/stressdisk,format=raw,if=none,id=hd0 -drive file=/home/rinor/.ops/volumes/stressdisk:dec2875e-6df6-21c8-60b0-cb8552ce758f.raw,format=raw,if=none,id=hd1 -device virtio-net,bus=pci.3,addr=0x0,netdev=n0,mac=f6:e3:67:49:92:4f -netdev user,id=n0 -display none -serial stdio en1: assigned 10.0.2.15 2023/05/03 16:00:03 2023/05/03 16:00:03 STEP(1): Checking for existing data on (/tfs_mnt) 2023/05/03 16:00:03 2023/05/03 16:00:03 Block Size : 512B => 2048 blocks/MiB 2023/05/03 16:00:03 Blocks Total: 65536 Blocks 2023/05/03 16:00:03 Blocks Free : 64511 Blocks 2023/05/03 16:00:03 Blocks Avail: 64511 Blocks 2023/05/03 16:00:03 Files : 0 2023/05/03 16:00:03 Ffree : 0 2023/05/03 16:00:03 Fsid : [5632 -32576] 2023/05/03 16:00:03 Namelen : 255 2023/05/03 16:00:03 Frsize : 512 2023/05/03 16:00:03 Flags : 0 2023/05/03 16:00:03 Spare : [0 0 0 0] 2023/05/03 16:00:03 2023/05/03 16:00:03 2023/05/03 16:00:03 STEP(2): Trying to write (4194304 bytes => 8192 blocks)/file on (/tfs_mnt) 2023/05/03 16:00:03 2023/05/03 16:00:03 Block Size : 512B => 2048 blocks/MiB 2023/05/03 16:00:03 Blocks Total: 65536 Blocks 2023/05/03 16:00:03 Blocks Free : 64511 Blocks 2023/05/03 16:00:03 Blocks Avail: 64511 Blocks 2023/05/03 16:00:03 Files : 0 2023/05/03 16:00:03 Ffree : 0 2023/05/03 16:00:03 Fsid : [5632 -32576] 2023/05/03 16:00:03 Namelen : 255 2023/05/03 16:00:03 Frsize : 512 2023/05/03 16:00:03 Flags : 0 2023/05/03 16:00:03 Spare : [0 0 0 0] 2023/05/03 16:00:03 2023/05/03 16:00:03 STEP(2): Approx (7) file(s) expected to succeed, (1) additional file(s) may be attempted 2023/05/03 16:00:03 STEP(2): /tfs_mnt/TST_0001_7606419ce969ca8a6c29457002bfbc4981737f1a - 7606419ce969ca8a6c29457002bfbc4981737f1a (4194304 B) - OK 2023/05/03 16:00:03 STEP(2): /tfs_mnt/TST_0002_7606419ce969ca8a6c29457002bfbc4981737f1a - 7606419ce969ca8a6c29457002bfbc4981737f1a (4194304 B) - OK 2023/05/03 16:00:03 STEP(2): /tfs_mnt/TST_0003_7606419ce969ca8a6c29457002bfbc4981737f1a - 7606419ce969ca8a6c29457002bfbc4981737f1a (4194304 B) - OK 2023/05/03 16:00:03 STEP(2): /tfs_mnt/TST_0004_7606419ce969ca8a6c29457002bfbc4981737f1a - 7606419ce969ca8a6c29457002bfbc4981737f1a (4194304 B) - OK 2023/05/03 16:00:03 STEP(2): /tfs_mnt/TST_0005_7606419ce969ca8a6c29457002bfbc4981737f1a - 7606419ce969ca8a6c29457002bfbc4981737f1a (4194304 B) - OK 2023/05/03 16:00:03 STEP(2): /tfs_mnt/TST_0006_7606419ce969ca8a6c29457002bfbc4981737f1a - 7606419ce969ca8a6c29457002bfbc4981737f1a (4194304 B) - OK 2023/05/03 16:00:03 STEP(2): /tfs_mnt/TST_0007_7606419ce969ca8a6c29457002bfbc4981737f1a - 7606419ce969ca8a6c29457002bfbc4981737f1a (4194304 B) - OK 2023/05/03 16:00:03 STEP(2): /tfs_mnt/TST_0008_7606419ce969ca8a6c29457002bfbc4981737f1a - 7606419ce969ca8a6c29457002bfbc4981737f1a (4194304 B) - OK, SHOULD HAVE FAILED 2023/05/03 16:00:03 2023/05/03 16:00:03 STEP(3): Read files and calculate checksum on (/tfs_mnt) 2023/05/03 16:00:03 2023/05/03 16:00:03 Block Size : 512B => 2048 blocks/MiB 2023/05/03 16:00:03 Blocks Total: 65536 Blocks 2023/05/03 16:00:03 Blocks Free : 1023 Blocks 2023/05/03 16:00:03 Blocks Avail: 1023 Blocks 2023/05/03 16:00:03 Files : 0 2023/05/03 16:00:03 Ffree : 0 2023/05/03 16:00:03 Fsid : [5632 -32576] 2023/05/03 16:00:03 Namelen : 255 2023/05/03 16:00:03 Frsize : 512 2023/05/03 16:00:03 Flags : 0 2023/05/03 16:00:03 Spare : [0 0 0 0] 2023/05/03 16:00:03 2023/05/03 16:00:03 STEP(3): /tfs_mnt/TST_0001_7606419ce969ca8a6c29457002bfbc4981737f1a - 7606419ce969ca8a6c29457002bfbc4981737f1a (4194304 B) - OK 2023/05/03 16:00:03 STEP(3): /tfs_mnt/TST_0002_7606419ce969ca8a6c29457002bfbc4981737f1a - 7606419ce969ca8a6c29457002bfbc4981737f1a (4194304 B) - OK 2023/05/03 16:00:03 STEP(3): /tfs_mnt/TST_0003_7606419ce969ca8a6c29457002bfbc4981737f1a - 7606419ce969ca8a6c29457002bfbc4981737f1a (4194304 B) - OK 2023/05/03 16:00:03 STEP(3): /tfs_mnt/TST_0004_7606419ce969ca8a6c29457002bfbc4981737f1a - 7606419ce969ca8a6c29457002bfbc4981737f1a (4194304 B) - OK 2023/05/03 16:00:03 STEP(3): /tfs_mnt/TST_0005_7606419ce969ca8a6c29457002bfbc4981737f1a - 7606419ce969ca8a6c29457002bfbc4981737f1a (4194304 B) - OK 2023/05/03 16:00:03 STEP(3): /tfs_mnt/TST_0006_7606419ce969ca8a6c29457002bfbc4981737f1a - 7606419ce969ca8a6c29457002bfbc4981737f1a (4194304 B) - OK 2023/05/03 16:00:03 STEP(3): /tfs_mnt/TST_0007_7606419ce969ca8a6c29457002bfbc4981737f1a - 7606419ce969ca8a6c29457002bfbc4981737f1a (4194304 B) - OK 2023/05/03 16:00:03 STEP(3): /tfs_mnt/TST_0008_7606419ce969ca8a6c29457002bfbc4981737f1a - 7606419ce969ca8a6c29457002bfbc4981737f1a (4194304 B) - OK 2023/05/03 16:00:03 2023/05/03 16:00:03 DONE - terminating in 10 seconds en1: assigned FE80::F4E3:67FF:FE49:924F 2023/05/03 16:00:13 DONE - terminating exit status 1 $ $ ### We run it again to verify the existing data $ $ ops run -c config.json stressdisk -f booting /home/rinor/.ops/images/stressdisk ... en1: assigned 10.0.2.15 2023/05/03 16:00:17 2023/05/03 16:00:17 STEP(1): Checking for existing data on (/tfs_mnt) 2023/05/03 16:00:17 2023/05/03 16:00:17 Block Size : 512B => 2048 blocks/MiB 2023/05/03 16:00:17 Blocks Total: 65536 Blocks 2023/05/03 16:00:17 Blocks Free : 3071 Blocks 2023/05/03 16:00:17 Blocks Avail: 3071 Blocks 2023/05/03 16:00:17 Files : 0 2023/05/03 16:00:17 Ffree : 0 2023/05/03 16:00:17 Fsid : [5632 -32576] 2023/05/03 16:00:17 Namelen : 255 2023/05/03 16:00:17 Frsize : 512 2023/05/03 16:00:17 Flags : 0 2023/05/03 16:00:17 Spare : [0 0 0 0] 2023/05/03 16:00:17 2023/05/03 16:00:17 STEP(1): /tfs_mnt/TST_0001_7606419ce969ca8a6c29457002bfbc4981737f1a - 6348514e8890b3e88bd58d3d9bd91c76d4275b4a (4194304 B) - CORRUPTED 2023/05/03 16:00:17 STEP(1): /tfs_mnt/TST_0002_7606419ce969ca8a6c29457002bfbc4981737f1a - 7606419ce969ca8a6c29457002bfbc4981737f1a (4194304 B) - OK 2023/05/03 16:00:17 STEP(1): /tfs_mnt/TST_0003_7606419ce969ca8a6c29457002bfbc4981737f1a - 7606419ce969ca8a6c29457002bfbc4981737f1a (4194304 B) - OK 2023/05/03 16:00:17 STEP(1): /tfs_mnt/TST_0004_7606419ce969ca8a6c29457002bfbc4981737f1a - 7606419ce969ca8a6c29457002bfbc4981737f1a (4194304 B) - OK 2023/05/03 16:00:17 STEP(1): /tfs_mnt/TST_0005_7606419ce969ca8a6c29457002bfbc4981737f1a - 7606419ce969ca8a6c29457002bfbc4981737f1a (4194304 B) - OK 2023/05/03 16:00:17 STEP(1): /tfs_mnt/TST_0006_7606419ce969ca8a6c29457002bfbc4981737f1a - 7606419ce969ca8a6c29457002bfbc4981737f1a (4194304 B) - OK 2023/05/03 16:00:17 STEP(1): /tfs_mnt/TST_0007_7606419ce969ca8a6c29457002bfbc4981737f1a - 7606419ce969ca8a6c29457002bfbc4981737f1a (4194304 B) - OK 2023/05/03 16:00:17 STEP(1): /tfs_mnt/TST_0008_7606419ce969ca8a6c29457002bfbc4981737f1a - 7606419ce969ca8a6c29457002bfbc4981737f1a (4194304 B) - OK 2023/05/03 16:00:17 .......... .......... ```

In here I'm posting the test result for 4MiB case, since all other cases (2, 4, 8, 16, ...)MiB behave the same.

With 1MiB files there is no issue at all
with values that are != power of 2 (i.e: 3,5,6,7,9,10,...)MiB the corruption of data happens in some cases even without forcing the test app to fill the storage.
it looks like some alignment issue(s), but need help to verify that it's not coming from the test environment (qemu)

francescolavra commented 1 year ago

The test app code and the test steps you described seem correct to me, so there may indeed be some corruption issues. We'll take a look.

rinor commented 1 year ago

Not sure if it's related, but when building/running nanos with TLOG_DEBUG_DUMP defined there is an assertion failing (the same fails when running images)

assertion b->start + offset <= b->length failed at /usr/local/src/nanos/src/runtime/buffer.h:31  in buffer_ref(); halt

TLOG_DEBUG_DUMP defined

```diff diff --git a/src/tfs/tlog.c b/src/tfs/tlog.c index 0857505c..3b031e35 100644 --- a/src/tfs/tlog.c +++ b/src/tfs/tlog.c @@ -5,7 +5,7 @@ #endif //#define TLOG_DEBUG -//#define TLOG_DEBUG_DUMP +#define TLOG_DEBUG_DUMP #ifdef TLOG_DEBUG #ifdef KERNEL #define tlog_debug(x, ...) do {tprintf(sym(tlog), 0, x, ##__VA_ARGS__);} while(0) ``` ```sh make[2]: Leaving directory '/usr/local/src/nanos/test/runtime' MKFS image staging contains: 00000000: 4e 56 4d 54 46 53 04 01 25 b5 83 ad 6a ba 0b f5 00000010: ac ef cb ea c7 bb cf f8 00 05 88 00 88 00 00 00 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 000000a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 000000b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 000000c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 000000d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 000000e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 000000f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 000000a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 000000b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 000000c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 000000d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 000000e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 000000f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 assertion b->start + offset <= b->length failed at /usr/local/src/nanos/src/runtime/buffer.h:31 in buffer_ref(); halt make[1]: *** [Makefile:300: image] Error 2 make[1]: Leaving directory '/usr/local/src/nanos/platform/pc' make: *** [Makefile:60: image] Error 2 ```

francescolavra commented 1 year ago

I reproduced the issue and found its cause. Will be fixed in https://github.com/nanovms/nanos/pull/1862. As for the assert failure when TLOG_DEBUG_DUMP is defined, it's unrelated to this issue, and is due to the debug code that has not been updated when we changed the implementation of the TFS log.

rinor commented 1 year ago

ok, tested that pr and so far all good. All scenarios that previously failed, now behave correctly.

rinor commented 1 year ago

The next observation was about abnormal memory usage (possible memory leak) under some heavy r/w file ops. Will test https://github.com/nanovms/nanos/pull/1861 since it looks like it's related.

rinor commented 1 year ago

How prevent/protect data corruption and kernel crash in the following test case:
- total vm memory: 512MiB
- operational filesize: 128MiB
- action: write as fast as you can without explicitely waiting for the data to be flushed on disk

Is there a way for the kernel (+fs) to handle this safer/better?

OPS config

```sh $ ops volume create stressdisk -s 512M ``` ```json { "BaseVolumeSz": "64m", "Args": [ "-w=10", "-f=1", "-s=128", "-t=0", "-c=false", "-d=/tfs_mnt" ], "Mounts": { "stressdisk": "tfs_mnt" }, "RunConfig": { "Memory": "512m" } } ``` ```sh $ ops run -c config.json stressdisk -f ```

Test Result corrupted/pagecache crash

``` running local instance booting /home/rinor/.ops/images/stressdisk ... en1: assigned 10.0.2.15 2023/07/14 11:46:56 /proc/meminfo MemTotal: 519248 kB MemFree: 466712 kB MemAvailable: 468216 kB Cached: 1504 kB 2023/07/14 11:46:56 WAITING - 3 seconds ... en1: assigned FE80::E040:32FF:FEA8:A423 2023/07/14 11:46:59 WAITING - done 2023/07/14 11:46:59 2023/07/14 11:46:59 STEP(1): Checking for existing data on (/tfs_mnt) 2023/07/14 11:46:59 2023/07/14 11:46:59 2023/07/14 11:46:59 2023/07/14 11:46:59 STEP(2): Trying to write (134217728 bytes => 262144 blocks)/file on (/tfs_mnt) 2023/07/14 11:46:59 2023/07/14 11:47:00 2023/07/14 11:47:00 STEP(2): Approx (3) file(s) expected to succeed, (1) additional file(s) may be attempted 2023/07/14 11:47:00 STEP(2): /tfs_mnt/TST_0001_749486e837b356d3f5af2610404e537a55285454 - 749486e837b356d3f5af2610404e537a55285454 (134217728 B) - OK 2023/07/14 11:47:00 /proc/meminfo MemTotal: 519248 kB MemFree: 196468 kB MemAvailable: 329044 kB Cached: 132576 kB 2023/07/14 11:47:00 STEP(2): /tfs_mnt/TST_0002_749486e837b356d3f5af2610404e537a55285454 - 749486e837b356d3f5af2610404e537a55285454 (134217728 B) - OK 2023/07/14 11:47:00 /proc/meminfo MemTotal: 519248 kB MemFree: 59984 kB MemAvailable: 322128 kB Cached: 262144 kB objcache_addpage error: unable to allocate page 2023/07/14 11:47:00 STEP(2): write /tfs_mnt/TST_0003_749486e837b356d3f5af2610404e537a55285454: input/output error 2023/07/14 11:47:00 /proc/meminfo MemTotal: 519248 kB MemFree: 59980 kB MemAvailable: 322124 kB Cached: 262144 kB 2023/07/14 11:47:00 2023/07/14 11:47:00 STEP(3): Read files and calculate checksum on (/tfs_mnt) 2023/07/14 11:47:00 2023/07/14 11:47:00 new_zeroed_pages error: cannot get physical page; OOM pending_fault_complete: page fill failed with (result:out of memory) 2023/07/14 11:47:00 STEP(3): read /tfs_mnt/TST_0001_749486e837b356d3f5af2610404e537a55285454: bad address 2023/07/14 11:47:00 /proc/meminfo MemTotal: 519248 kB MemFree: 4752 kB MemAvailable: 266896 kB Cached: 262144 kB new_zeroed_pages error: cannot get physical page; OOM pending_fault_complete: page fill failed with (result:out of memory) new_zeroed_pages error: cannot get physical page; OOM pending_fault_complete: page fill failed with (result:out of memory) signal 11 (no core generated: limit 0) objcache_addpage error: unable to allocate page frame trace: ffffc00000a37890: ffffffff8004ffd8 (pagecache_sync_volume + 00000000000002b8/00000000000002c2) ffffc00000a37910: ffffffff80057d35 (storage_sync + 0000000000000095/00000000000000ef) ffffc00000a37950: ffffffff80040a03 (kernel_shutdown + 0000000000000163/000000000000041c) ffffc00000a379b0: ffffffff800414e6 ffffc00000a37cb0: ffffffff800c107b (after_dump_halt + 000000000000008b/000000000000014f) ffffc00000a37d30: ffffffff800a1a9a (coredump + 00000000000002ca/0000000000000c18) ffffc00000a37df0: ffffffff800c124f (default_signal_action + 000000000000008f/00000000000002b9) ffffc00000a37e20: ffffffff800c3af2 (dispatch_signals + 0000000000000332/000000000000035d) ffffc00000a37f10: ffffffff800df5d7 (thread_return + 0000000000000117/00000000000004e7) ffffc00000a37f40: ffffffff80053f83 (runloop_internal + 00000000000004b3/0000000000000b2b) ffffc00000a37fc0: ffffffff8004155f (context_switch_finish + 000000000000006f/00000000000001b4) loaded klibs: assertion c != INVALID_ADDRESS failed at /usr/local/src/nanos/src/kernel/pagecache.c:234 (IP 0xffffffff800494d0) in enqueue_page_completion_statelocked(); halt ### Ctrl+ ^Cqemu-system-x86_64:signal: killed $ ``` ``` $ ### We run it again to verify the written data it at least intact $ ### in some cases the data are lost completely $ ### in other cases the data are corrupted ``` ``` $ running local instance booting /home/rinor/.ops/images/stressdisk ... en1: assigned 10.0.2.15 2023/07/14 11:52:02 /proc/meminfo MemTotal: 519248 kB MemFree: 466712 kB MemAvailable: 468216 kB Cached: 1504 kB 2023/07/14 11:52:02 WAITING - 3 seconds ... en1: assigned FE80::A495:BDFF:FEF1:4852 2023/07/14 11:52:05 WAITING - done 2023/07/14 11:52:05 2023/07/14 11:52:05 STEP(1): Checking for existing data on (/tfs_mnt) 2023/07/14 11:52:05 2023/07/14 11:52:05 #### the data are lost completely in this case 2023/07/14 11:52:05 2023/07/14 11:52:05 STEP(2): Trying to write (134217728 bytes => 262144 blocks)/file on (/tfs_mnt) 2023/07/14 11:52:05 2023/07/14 11:52:05 2023/07/14 11:52:05 STEP(2): Approx (3) file(s) expected to succeed, (1) additional file(s) may be attempted 2023/07/14 11:52:06 STEP(2): /tfs_mnt/TST_0001_27f1f69aba5e2ca48baa5e7bc574a6def26c5dea - 27f1f69aba5e2ca48baa5e7bc574a6def26c5dea (134217728 B) - OK 2023/07/14 11:52:06 /proc/meminfo MemTotal: 519248 kB MemFree: 196452 kB MemAvailable: 329028 kB Cached: 132576 kB 2023/07/14 11:52:06 STEP(2): /tfs_mnt/TST_0002_27f1f69aba5e2ca48baa5e7bc574a6def26c5dea - 27f1f69aba5e2ca48baa5e7bc574a6def26c5dea (134217728 B) - OK 2023/07/14 11:52:06 /proc/meminfo MemTotal: 519248 kB MemFree: 59968 kB MemAvailable: 322112 kB Cached: 262144 kB objcache_addpage error: unable to allocate page 2023/07/14 11:52:06 STEP(2): write /tfs_mnt/TST_0003_27f1f69aba5e2ca48baa5e7bc574a6def26c5dea: input/output error 2023/07/14 11:52:06 /proc/meminfo MemTotal: 519248 kB MemFree: 59964 kB MemAvailable: 322108 kB Cached: 262144 kB 2023/07/14 11:52:06 2023/07/14 11:52:06 STEP(3): Read files and calculate checksum on (/tfs_mnt) 2023/07/14 11:52:06 2023/07/14 11:52:06 new_zeroed_pages error: cannot get physical page; OOM pending_fault_complete: page fill failed with (result:out of memory) 2023/07/14 11:52:06 STEP(3): read /tfs_mnt/TST_0001_27f1f69aba5e2ca48baa5e7bc574a6def26c5dea: bad address 2023/07/14 11:52:06 /proc/meminfo MemTotal: 519248 kB MemFree: 4888 kB MemAvailable: 267032 kB Cached: 262144 kB 2023/07/14 11:52:06 STEP(3): /tfs_mnt/TST_0002_27f1f69aba5e2ca48baa5e7bc574a6def26c5dea - 27f1f69aba5e2ca48baa5e7bc574a6def26c5dea (134217728 B) - OK 2023/07/14 11:52:06 /proc/meminfo MemTotal: 519248 kB MemFree: 4608 kB MemAvailable: 266752 kB Cached: 262144 kB 2023/07/14 11:52:06 2023/07/14 11:52:06 DONE - terminating in 10 seconds objcache_addpage error: unable to allocate page frame trace: ffffc00000a37e80: ffffffff8004aa9a (pagecache_scan_timer + 00000000000002ca/00000000000002cf) ffffc00000a37ef0: ffffffff8008114a (timer_service + 00000000000000ea/000000000000036b) ffffc00000a37f40: ffffffff80053c34 (runloop_internal + 0000000000000164/0000000000000b2b) ffffc00000a37fc0: ffffffff8004155f (context_switch_finish + 000000000000006f/00000000000001b4) loaded klibs: assertion c != INVALID_ADDRESS failed at /usr/local/src/nanos/src/kernel/pagecache.c:234 (IP 0xffffffff800494d0) in enqueue_page_completion_statelocked(); halt ^Cqemu-system-x86_64: terminating on signal 2 signal: killed $ ``` ``` $ ### run it again to check what is the state of the writted data $ ### in this case we will find the data corrupted ``` ``` $ running local instance booting /home/rinor/.ops/images/stressdisk ... en1: assigned 10.0.2.15 2023/07/14 11:55:54 /proc/meminfo MemTotal: 519248 kB MemFree: 466712 kB MemAvailable: 468216 kB Cached: 1504 kB 2023/07/14 11:55:54 WAITING - 3 seconds ... en1: assigned FE80::98ED:D2FF:FE5B:F7EC 2023/07/14 11:55:57 WAITING - done 2023/07/14 11:55:57 2023/07/14 11:55:57 STEP(1): Checking for existing data on (/tfs_mnt) 2023/07/14 11:55:57 2023/07/14 11:55:57 2023/07/14 11:55:57 STEP(1): /tfs_mnt/TST_0001_27f1f69aba5e2ca48baa5e7bc574a6def26c5dea - ba713b819c1202dcb0d178df9d2b3222ba1bba44 (134217728 B) - CORRUPTED 2023/07/14 11:55:57 /proc/meminfo MemTotal: 519248 kB MemFree: 194312 kB MemAvailable: 195816 kB Cached: 1504 kB 2023/07/14 11:55:57 STEP(1): /tfs_mnt/TST_0002_27f1f69aba5e2ca48baa5e7bc574a6def26c5dea - ba713b819c1202dcb0d178df9d2b3222ba1bba44 (134217728 B) - CORRUPTED 2023/07/14 11:55:57 /proc/meminfo MemTotal: 519248 kB MemFree: 194000 kB MemAvailable: 195504 kB Cached: 1504 kB 2023/07/14 11:55:57 2023/07/14 11:55:57 STEP(2): Trying to write (134217728 bytes => 262144 blocks)/file on (/tfs_mnt) 2023/07/14 11:55:57 2023/07/14 11:55:58 2023/07/14 11:55:58 STEP(2): Approx (3) file(s) expected to succeed, (1) additional file(s) may be attempted 2023/07/14 11:55:58 STEP(2): /tfs_mnt/TST_0001_227da3450d06c86fe283ba97b893c43c1866250a - 227da3450d06c86fe283ba97b893c43c1866250a (134217728 B) - OK 2023/07/14 11:55:58 /proc/meminfo MemTotal: 519248 kB MemFree: 193844 kB MemAvailable: 326420 kB Cached: 132576 kB 2023/07/14 11:55:58 STEP(2): /tfs_mnt/TST_0002_227da3450d06c86fe283ba97b893c43c1866250a - 227da3450d06c86fe283ba97b893c43c1866250a (134217728 B) - OK 2023/07/14 11:55:58 /proc/meminfo MemTotal: 519248 kB MemFree: 57356 kB MemAvailable: 319500 kB Cached: 262144 kB objcache_addpage error: unable to allocate page 2023/07/14 11:55:58 STEP(2): write /tfs_mnt/TST_0003_227da3450d06c86fe283ba97b893c43c1866250a: input/output error 2023/07/14 11:55:59 /proc/meminfo MemTotal: 519248 kB MemFree: 57356 kB MemAvailable: 319500 kB Cached: 262144 kB 2023/07/14 11:55:59 2023/07/14 11:55:59 STEP(3): Read files and calculate checksum on (/tfs_mnt) 2023/07/14 11:55:59 2023/07/14 11:55:59 new_zeroed_pages error: cannot get physical page; OOM pending_fault_complete: page fill failed with (result:out of memory) unexpected fault address 0xc00b0ce000 fatal error: fault [signal SIGBUS: bus error code=0x2 addr=0xc00b0ce000 pc=0x460ef9] .... ```

francescolavra commented 1 year ago

WIth #1904, the above data corruption and kernel crash issues are fixed, and the application can write as fast as it can without waiting for data to be flushed to disk.

rinor commented 1 year ago

WIth #1904, the above data corruption and kernel crash issues are fixed, and the application can write as fast as it can without waiting for data to be flushed to disk.

Testing that pr and now it's much better. However while testing, randomly (have yet to identify a pattern and exclude test env fault) got some failures. Just posting so we have them here.

case 1

```sh running local instance booting /home/rinor/.ops/images/stressdisk ... en1: assigned 10.0.2.15 2023/07/28 08:54:11 /proc/meminfo MemTotal: 519240 kB MemFree: 466704 kB MemAvailable: 468208 kB Cached: 1504 kB 2023/07/28 08:54:11 WAITING - 3 seconds ... en1: assigned FE80::86F:BFF:FE87:178A 2023/07/28 08:54:14 WAITING - done 2023/07/28 08:54:14 2023/07/28 08:54:14 STEP(1): Checking for existing data on (/tfs_mnt) 2023/07/28 08:54:14 2023/07/28 08:54:14 2023/07/28 08:54:14 2023/07/28 08:54:14 STEP(2): Trying to write (134217728 bytes => 262144 blocks)/file on (/tfs_mnt) 2023/07/28 08:54:14 2023/07/28 08:54:15 2023/07/28 08:54:15 STEP(2): Approx (31) file(s) expected to succeed, (1) additional file(s) may be attempted 2023/07/28 08:54:15 STEP(2): /tfs_mnt/TST_0001_ed91f57dcd5518e386868e693e421917e5c0c6a8 - ed91f57dcd5518e386868e693e421917e5c0c6a8 (134217728 B) - OK 2023/07/28 08:54:15 /proc/meminfo MemTotal: 519240 kB MemFree: 196460 kB MemAvailable: 329036 kB Cached: 132576 kB 2023/07/28 08:54:15 STEP(2): /tfs_mnt/TST_0002_ed91f57dcd5518e386868e693e421917e5c0c6a8 - ed91f57dcd5518e386868e693e421917e5c0c6a8 (134217728 B) - OK 2023/07/28 08:54:15 /proc/meminfo MemTotal: 519240 kB MemFree: 59976 kB MemAvailable: 322120 kB Cached: 262144 kB 2023/07/28 08:54:21 STEP(2): /tfs_mnt/TST_0003_ed91f57dcd5518e386868e693e421917e5c0c6a8 - ed91f57dcd5518e386868e693e421917e5c0c6a8 (134217728 B) - OK 2023/07/28 08:54:21 /proc/meminfo MemTotal: 519240 kB MemFree: 65812 kB MemAvailable: 225972 kB Cached: 160160 kB 2023/07/28 08:54:21 STEP(2): /tfs_mnt/TST_0004_ed91f57dcd5518e386868e693e421917e5c0c6a8 - ed91f57dcd5518e386868e693e421917e5c0c6a8 (134217728 B) - OK 2023/07/28 08:54:21 /proc/meminfo MemTotal: 519240 kB MemFree: 66364 kB MemAvailable: 309488 kB Cached: 243124 kB 2023/07/28 08:54:22 STEP(2): /tfs_mnt/TST_0005_ed91f57dcd5518e386868e693e421917e5c0c6a8 - ed91f57dcd5518e386868e693e421917e5c0c6a8 (134217728 B) - OK 2023/07/28 08:54:22 /proc/meminfo MemTotal: 519240 kB MemFree: 65852 kB MemAvailable: 268964 kB Cached: 203112 kB 2023/07/28 08:54:22 STEP(2): /tfs_mnt/TST_0006_ed91f57dcd5518e386868e693e421917e5c0c6a8 - ed91f57dcd5518e386868e693e421917e5c0c6a8 (134217728 B) - OK 2023/07/28 08:54:22 /proc/meminfo MemTotal: 519240 kB MemFree: 55492 kB MemAvailable: 317636 kB Cached: 262144 kB 2023/07/28 08:54:31 STEP(2): /tfs_mnt/TST_0007_ed91f57dcd5518e386868e693e421917e5c0c6a8 - ed91f57dcd5518e386868e693e421917e5c0c6a8 (134217728 B) - OK 2023/07/28 08:54:31 /proc/meminfo MemTotal: 519240 kB MemFree: 65736 kB MemAvailable: 227812 kB Cached: 162076 kB 2023/07/28 08:54:31 STEP(2): /tfs_mnt/TST_0008_ed91f57dcd5518e386868e693e421917e5c0c6a8 - ed91f57dcd5518e386868e693e421917e5c0c6a8 (134217728 B) - OK 2023/07/28 08:54:31 /proc/meminfo MemTotal: 519240 kB MemFree: 65728 kB MemAvailable: 311676 kB Cached: 245948 kB 2023/07/28 08:54:32 STEP(2): /tfs_mnt/TST_0009_ed91f57dcd5518e386868e693e421917e5c0c6a8 - ed91f57dcd5518e386868e693e421917e5c0c6a8 (134217728 B) - OK 2023/07/28 08:54:32 /proc/meminfo MemTotal: 519240 kB MemFree: 55488 kB MemAvailable: 317632 kB Cached: 262144 kB 2023/07/28 08:54:39 STEP(2): /tfs_mnt/TST_0010_ed91f57dcd5518e386868e693e421917e5c0c6a8 - ed91f57dcd5518e386868e693e421917e5c0c6a8 (134217728 B) - OK 2023/07/28 08:54:39 /proc/meminfo MemTotal: 519240 kB MemFree: 65720 kB MemAvailable: 219080 kB Cached: 153360 kB 2023/07/28 08:54:40 STEP(2): /tfs_mnt/TST_0011_ed91f57dcd5518e386868e693e421917e5c0c6a8 - ed91f57dcd5518e386868e693e421917e5c0c6a8 (134217728 B) - OK 2023/07/28 08:54:40 /proc/meminfo MemTotal: 519240 kB MemFree: 53308 kB MemAvailable: 315452 kB Cached: 262144 kB 2023/07/28 08:54:45 STEP(2): /tfs_mnt/TST_0012_ed91f57dcd5518e386868e693e421917e5c0c6a8 - ed91f57dcd5518e386868e693e421917e5c0c6a8 (134217728 B) - OK 2023/07/28 08:54:45 /proc/meminfo MemTotal: 519240 kB MemFree: 65720 kB MemAvailable: 217556 kB Cached: 151836 kB 2023/07/28 08:54:46 STEP(2): /tfs_mnt/TST_0013_ed91f57dcd5518e386868e693e421917e5c0c6a8 - ed91f57dcd5518e386868e693e421917e5c0c6a8 (134217728 B) - OK 2023/07/28 08:54:46 /proc/meminfo MemTotal: 519240 kB MemFree: 55336 kB MemAvailable: 317480 kB Cached: 262144 kB no fault handler cpu 0000000000000000, state kernel, vector 000000000000000e lastvector: 000000000000000e (Page fault) frame: ffffc00000a30000 type: kernel active_cpu: 0000000000000000 stack top: ffffc00000a37ff0 error code: 0000000000000000 address: 0000000000000220 rax: 0000000000000220 rbx: ffff80c0000021c0 rcx: 0000000000000010 rdx: 0000000000000000 rsi: ffffc00000a37ec0 rdi: ffff80c0000021c0 rbp: ffffc00000a37ea8 rsp: ffffc00000a37e88 r8: 0000000000000008 r9: ffffc000007fffe0 r10: 0000000000000008 r11: 0000000000000010 r12: ffff80c0000021c0 r13: 0000000000000000 r14: ffffc00000a37ec0 r15: ffffc00000b00000 rip: ffffffff80082de0 (iterate + 0000000000000050/00000000000000b8) rflags: 0000000000010002 ss: 0000000000000008 cs: 0000000000000010 ds: 0000000000000000 es: 0000000000000000 fsbase: 0000000000000000 gsbase: 0000000000000000 frame trace: ffffc00000a37eb0: ffffffff80083a41 (destruct_tuple + 0000000000000051/00000000000000af) ffffc00000a37f20: ffffffff8003fa79 (mm_service_sync + 0000000000000019/0000000000000099) ffffc00000a37f40: ffffffff8005490a (runloop_internal + 000000000000021a/0000000000000b2d) ffffc00000a37fc0: ffffffff800416df (context_switch_finish + 000000000000006f/00000000000001b4) loaded klibs: stack trace: ffffc00000a37e88: ffff80c0000021c0 ffffc00000a37e90: 0000000000000001 ffffc00000a37e98: 00000000000097b2 ffffc00000a37ea0: 0000000000000092 ffffc00000a37ea8: ffffc00000a37f18 ffffc00000a37eb0: ffffffff80083a41 (destruct_tuple + 0000000000000051/00000000000000af) ffffc00000a37eb8: ffffc00000661540 ffffc00000a37ec0: ffffffff80083aa0 (destruct_tuple_each + 0000000000000000/0000000000000075) ffffc00000a37ec8: ffffffff801583f2 ffffc00000a37ed0: 0000000000000000 ffffc00000a37ed8: 0000000000000030 ffffc00000a37ee0: ffff80c0000021c0 ffffc00000a37ee8: ffffc00000a37f01 ffffc00000a37ef0: ffffffff8006de76 (mcache_dealloc + 0000000000000036/000000000000031d) ffffc00000a37ef8: ffffc00000200000 ffffc00000a37f00: ca629939779ae13a ffffc00000a37f08: ffffc0000067e100 ffffc00000a37f10: 0000000000000001 ffffc00000a37f18: ffffc00000a37f38 ffffc00000a37f20: ffffffff8003fa79 (mm_service_sync + 0000000000000019/0000000000000099) ffffc00000a37f28: 00000000000097b2 ffffc00000a37f30: ffffc0000067e100 ffffc00000a37f38: ffffc00000a37fb8 ffffc00000a37f40: ffffffff8005490a (runloop_internal + 000000000000021a/0000000000000b2d) ffffc00000a37f48: 0000000000000000 ffffc00000a37f50: 0000000000000000 ffffc00000a37f58: 0000000000000000 ffffc00000a37f60: ffffc00000099000 ffffc00000a37f68: 0000000000000000 ffffc00000a37f70: ffffc0000067e100 ffffc00000a37f78: ffff80c0000021c0 ffffc00000a37f80: ca629939779ae13a exit status 253 ```

francescolavra commented 1 year ago

@rinor I added a commit in #1904 that fixes the latest issue you found. Thanks

rinor commented 1 year ago

@rinor I added a commit in #1904 that fixes the latest issue you found. Thanks

Much better now. Questions:

what would be the cause of pagecache_commit_dirty_ranges error: sgbuf alloc fail? It happens when writing data and sometimes it causes page faults

pagecache_commit_dirty_ranges error: sgbuf alloc fail

```sh 2023/07/29 17:09:31 STEP(2): /tfs_mnt/TST_0005_d36b969fa1d5110aa518943f82ba6090b3eedbff - d36b969fa1d5110aa518943f82ba6090b3eedbff (536870912 B) - OK 2023/07/29 17:09:31 /proc/meminfo MemTotal: 2092104 kB MemFree: 65772 kB MemAvailable: 1428736 kB Cached: 1362964 kB pagecache_commit_dirty_ranges error: sgbuf alloc fail pagecache_commit_dirty_ranges error: sgbuf alloc fail no fault handler cpu 0000000000000000, state kernel, vector 000000000000000e lastvector: 000000000000000e (Page fault) frame: ffffc00000a30000 type: kernel active_cpu: 0000000000000000 stack top: ffffc00000a37ff0 error code: 0000000000000000 address: 00000000000001b7 rax: 00000000000001b7 rbx: ffff80c000001c00 rcx: 0000000000000010 rdx: 0000000000000000 rsi: ffffc00000a37ec0 rdi: ffff80c000001c00 rbp: ffffc00000a37ea8 rsp: ffffc00000a37e88 r8: 0000000000000008 r9: ffffc000007fffe0 r10: 0000000000000008 r11: 0000000000000010 r12: ffff80c000001c00 r13: 0000000000000000 r14: ffffc00000a37ec0 r15: ffffc00000b00000 rip: ffffffff80082d40 (iterate + 0000000000000050/00000000000000b8) rflags: 0000000000010006 ss: 0000000000000008 cs: 0000000000000010 ds: 0000000000000000 es: 0000000000000000 fsbase: 0000000000000000 gsbase: 0000000000000000 frame trace: ffffc00000a37eb0: ffffffff800839a1 (destruct_tuple + 0000000000000051/00000000000000af) ffffc00000a37f20: ffffffff8003fa79 (mm_service_sync + 0000000000000019/0000000000000099) ffffc00000a37f40: ffffffff8005486a (runloop_internal + 000000000000021a/0000000000000b2d) ffffc00000a37fc0: ffffffff800416df (context_switch_finish + 000000000000006f/00000000000001b4) loaded klibs: stack trace: ffffc00000a37e88: ffff80c000001c00 ffffc00000a37e90: 0000000000000001 ffffc00000a37e98: 00000000000011f4 ffffc00000a37ea0: 0000000000000092 ffffc00000a37ea8: ffffc00000a37f18 ffffc00000a37eb0: ffffffff800839a1 (destruct_tuple + 0000000000000051/00000000000000af) ffffc00000a37eb8: ffffc0000067b280 ffffc00000a37ec0: ffffffff80083a00 (destruct_tuple_each + 0000000000000000/0000000000000075) ffffc00000a37ec8: ffffffff801583f2 ffffc00000a37ed0: 0000000000000000 ffffc00000a37ed8: 0000000000000030 ffffc00000a37ee0: ffff80c000001c00 ffffc00000a37ee8: 0000000000001101 ffffc00000a37ef0: ffffffff8006ddd6 (mcache_dealloc + 0000000000000036/000000000000031d) ffffc00000a37ef8: ffffc00000200000 ffffc00000a37f00: 6462fee5b6af2690 ffffc00000a37f08: ffffc000006607c0 ffffc00000a37f10: 0000000000000001 ffffc00000a37f18: ffffc00000a37f38 ffffc00000a37f20: ffffffff8003fa79 (mm_service_sync + 0000000000000019/0000000000000099) ffffc00000a37f28: 00000000000011f4 ffffc00000a37f30: ffffc000006607c0 ffffc00000a37f38: ffffc00000a37fb8 ffffc00000a37f40: ffffffff8005486a (runloop_internal + 000000000000021a/0000000000000b2d) ffffc00000a37f48: 0000000000000000 ffffc00000a37f50: 0000000000000000 ffffc00000a37f58: 0000000000000000 ffffc00000a37f60: ffffc00000099000 ffffc00000a37f68: 0000000000000000 ffffc00000a37f70: ffffc000006607c0 ffffc00000a37f78: ffff80c000001c00 ffffc00000a37f80: 6462fee5b6af2690 exit status 253 ```

Less of an issue since it happens on reading, but what do you make out of this?

assertion sgb != INVALID_ADDRESS failed

```sh 2023/07/30 12:47:00 STEP(3): /tfs_mnt/TST_0006_87ebde21362f0be0868ecd8052289112161a6b88 - 87ebde21362f0be0868ecd8052289112161a6b88 (268435456 B) - OK 2023/07/30 12:47:00 /proc/meminfo MemTotal: 1043528 kB MemFree: 66516 kB MemAvailable: 167888 kB Cached: 101372 kB 2023/07/30 12:47:01 STEP(3): /tfs_mnt/TST_0007_87ebde21362f0be0868ecd8052289112161a6b88 - 87ebde21362f0be0868ecd8052289112161a6b88 (268435456 B) - OK 2023/07/30 12:47:01 /proc/meminfo MemTotal: 1043528 kB MemFree: 66516 kB MemAvailable: 229756 kB Cached: 163240 kB 2023/07/30 12:47:02 STEP(3): /tfs_mnt/TST_0008_87ebde21362f0be0868ecd8052289112161a6b88 - 87ebde21362f0be0868ecd8052289112161a6b88 (268435456 B) - OK 2023/07/30 12:47:02 /proc/meminfo MemTotal: 1043528 kB MemFree: 66516 kB MemAvailable: 178056 kB Cached: 111540 kB frame trace: ffffc00000b9fd50: ffffffff8004f000 (pagecache_read_sg + 0000000000000290/00000000000008d1) ffffc00000b9ff20: ffffffff800ce53e (file_read + 000000000000023e/0000000000000322) ffffc00000b9ff80: ffffffff800d54ed (read + 000000000000016d/0000000000000247) ffffc00000b9ffb0: ffffffff800dd39c (syscall_handler + 00000000000002dc/0000000000000603) loaded klibs: assertion sgb != INVALID_ADDRESS failed at /usr/local/src/nanos/src/kernel/pagecache.c:1340 (IP 0xffffffff80049a61) in pagecache_read_pp_handler(); halt ```

francescolavra commented 1 year ago

@rinor both errors you reported are due to a failed allocation of buffers that the kernel uses in its internal scatter-gather lists, which can happen when the system is under memory pressure. I modified #1904 again so that these allocation failures are less likely, and if they happen the kernel can recover from them.

rinor commented 1 year ago

I modified #1904 again so that these allocation failures are less likely, and if they happen the kernel can recover from them.

So far all looks good and much more reliable, can't crash/corrupt anything anymore. Thank you.

nanovms / nanos

question(filesystem): TFS status #1845