lukechampine / user

A CLI renter for Sia
MIT License

Error on resuming upload #7

Closed: grigzy28 closed this issue 5 years ago.

grigzy28 commented 5 years ago
root@sia-test:~# user upload -m 10 fedora30.tar.xz.50hosts
fedora30.tar.xz.50hosts   100%   44.17 MB   2.02 MB/s
root@sia-test:~# cp DOGP.zip DOGP.zip.50hosts
root@sia-test:~# user upload -m 10 DOGP.zip.50hosts
DOGP.zip.50hosts   38%   217.04 MB   749.8 KB/s
Upload failed: could not upload to some hosts:
76f9101f: read tcp 192.168.1.4:59218->136.61.3.89:9982: i/o timeout
root@sia-test:~# user upload -m 10 DOGP.zip.50hosts
DOGP.zip.50hosts   19%   217.04 MB   0 B/s
Upload failed: file is not writeable

I am going to assume that the file that is not writable is the DOGP.zip.50hosts.usa file. Is that correct?

Could the usa file have remained locked after the error occurred?

As you can see, the first upload (fedora30.tar.xz.50hosts, 44 MB) completed successfully. The second file is 217 MB; it failed with the i/o timeout error, and the attempt to resume it then failed with "file is not writeable". The second upload was started right after the first had finished.

grigzy28 commented 5 years ago

I deleted the .usa file, so it couldn't still be locked, and the upload then succeeded on the third try without error. The second try got an i/o timeout at 77%, and trying to resume it produced the same "file is not writeable" error.

lukechampine commented 5 years ago

Looks like the problem lies here: https://github.com/lukechampine/user/blob/f78fc35243c91e88a6bc99a54850be9fbc69b8a7/meta.go#L157

When the file is opened for resuming, the wrong permissions are used -- it has O_APPEND, but it needs O_WRONLY as well.

I should probably add an fs.OpenWriteable method (or some better name), since it's always bugged me that you need to use the fully generic OpenFile method just to reopen a file with write permissions.

lukechampine commented 5 years ago

Should be fixed by https://github.com/lukechampine/user/commit/14f5388acf543776d4f5cb77cf4434ef446b323d. Can you build from master, and close this issue if the problem is resolved?

grigzy28 commented 5 years ago

Downloaded and recompiled, and verified that it was the 14f5388 build.

root@sia-test:~# user upload -m 10 DOGP.zip.test50 
DOGP.zip.test50   66%   125.83 MB   406.92 MB/s
panic: runtime error: invalid memory address or nil pointer dereference
    panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x5fe64b]

goroutine 1 [running]:
lukechampine.com/us/renter.(*SectorBuilder).Append(0x0, 0xc030c00000, 0x400000, 0x400000, 0x2589f369a29a1673, 0x3800bfa62ea77a67, 0xf98d775f3b12fba3, 0xad0e52a2f70dbca7)
    /root/go/src/lukechampine.com/us/renter/upload.go:50 +0x4b
lukechampine.com/us/renter/renterutil.(*PseudoFS).fillSectors(0xc00006e1e0, 0xc00006e4e0, 0xc0001cc920, 0x415bff)
    /root/go/src/lukechampine.com/us/renter/renterutil/fileops.go:242 +0x6a1
lukechampine.com/us/renter/renterutil.(*PseudoFS).flushSectors(0xc00006e1e0, 0x8a2a80, 0xc00006e230)
    /root/go/src/lukechampine.com/us/renter/renterutil/fileops.go:262 +0x178
lukechampine.com/us/renter/renterutil.(*PseudoFS).Close(0xc00006e1e0, 0x0, 0x0)
    /root/go/src/lukechampine.com/us/renter/renterutil/filesystem.go:279 +0x91
panic(0x8051e0, 0xc08840)
    /usr/local/go/src/runtime/panic.go:522 +0x1b5
lukechampine.com/us/renter.(*SectorBuilder).Append(0x0, 0xc024c00000, 0x400000, 0x400000, 0x2589f369a29a1673, 0x3800bfa62ea77a67, 0xf98d775f3b12fba3, 0xad0e52a2f70dbca7)
    /root/go/src/lukechampine.com/us/renter/upload.go:50 +0x4b
lukechampine.com/us/renter/renterutil.(*PseudoFS).fillSectors(0xc00006e1e0, 0xc00006e4e0, 0xc0001ccf00, 0x0)
    /root/go/src/lukechampine.com/us/renter/renterutil/fileops.go:242 +0x6a1
lukechampine.com/us/renter/renterutil.(*PseudoFS).flushSectors(0xc00006e1e0, 0xc00006e4e0, 0x400000)
    /root/go/src/lukechampine.com/us/renter/renterutil/fileops.go:262 +0x178
lukechampine.com/us/renter/renterutil.(*PseudoFS).fileWriteAt(0xc00006e1e0, 0xc00006e4e0, 0xc019008000, 0x2800000, 0x2800000, 0x5000000, 0xc00006a1c0, 0x0, 0xc0001cd0b0)
    /root/go/src/lukechampine.com/us/renter/renterutil/fileops.go:500 +0x356
lukechampine.com/us/renter/renterutil.(*PseudoFS).fileWrite(0xc00006e1e0, 0xc00006e4e0, 0xc019008000, 0x2800000, 0x2800000, 0xc000000300, 0xc0001cd108, 0x43621c)
    /root/go/src/lukechampine.com/us/renter/renterutil/fileops.go:339 +0x60
lukechampine.com/us/renter/renterutil.PseudoFile.Write(0xc000094260, 0xf, 0x0, 0x401, 0xc00006e1e0, 0xc019008000, 0x2800000, 0x2800000, 0x0, 0x0, ...)
    /root/go/src/lukechampine.com/us/renter/renterutil/filesystem.go:399 +0x23e
main.(*trackWriter).Write(0xc00006e660, 0xc019008000, 0x2800000, 0x2800000, 0x2800000, 0x0, 0x0)
    /root/go/src/lukechampine.com/user/progress.go:33 +0xab
io.copyBuffer(0x928220, 0xc00006e660, 0x928660, 0xc000096090, 0xc019008000, 0x2800000, 0x2800000, 0x0, 0x0, 0xc000032000)
    /usr/local/go/src/io/io.go:404 +0x1fb
io.CopyBuffer(...)
    /usr/local/go/src/io/io.go:375
main.trackUpload(0xc0001d02d0, 0xc000096090, 0x0, 0x2800000)
    /root/go/src/lukechampine.com/user/progress.go:97 +0x43c
main.resumeuploadmetafile(0xc000096090, 0xc00008e180, 0x24, 0xc000094260, 0x13, 0x0, 0x0)
    /root/go/src/lukechampine.com/user/meta.go:166 +0x413
main.main()
    /root/go/src/lukechampine.com/user/main.go:403 +0xe64
root@sia-test:~# 

Got this after trying to resume an upload at 66%. Could this error be caused by the following?

root@sia-test:~# user upload -m 10 DOGP.zip.test50 
DOGP.zip.test50   66%   125.83 MB   2.89 MB/s
DOGP.zip.test50   66%   125.83 MB   1.57 MB/s
Upload failed: could not upload to some hosts:
f037506e: contract has insufficient collateral to support modification
root@sia-test:~# user contracts disable f037506e
Disabled contract by removing symlink /root/.config/user/contracts-enabled/f037506e-8fd7e6a3.contract

In the middle of an upload, I had disabled a contract because (I guess) it lacked funds.

lukechampine commented 5 years ago

OK. It seems the code assumes it still has the contract you disabled. I think the right thing to do is to immediately return a "no contract for host" error. You would then need to either migrate the file (to replace the missing host with a new one) or delete the metafile and start over with one fewer host.

I pushed a fix for this in https://github.com/lukechampine/us/commit/bd4f47301cab40d625d68862711fb716b1055df7.

grigzy28 commented 5 years ago

Okay, I created a 1.2 GB file to make sure the upload would hit an error (an i/o timeout), and it did. I resumed the upload at 33%, and it continued to 41%, where it hit another error (out of funds). So the initial bug is fixed: the .usa file can now be reopened to resume an upload. The second bug is also fixed: when resuming a partial upload whose contract is missing, it now reports that the contract is missing instead of crashing.