ipfs / kubo

An IPFS implementation in Go
https://docs.ipfs.tech/how-to/command-line-quick-start/

Error: archive/tar: write too long #4667

Open ivan386 opened 6 years ago

ivan386 commented 6 years ago

Version information:

go-ipfs version: 0.4.13-
Repo version: 6
System version: amd64/windows
Golang version: go1.9.2

Type: Bug

Description:

File: /ipfs/QmWfLgAopxBBpnPsMeQWQhcVfHHEo7natWgtcX5yzw7My3

When I get this file from the browser (http://127.0.0.1/ipfs/QmWfLgAopxBBpnPsMeQWQhcVfHHEo7natWgtcX5yzw7My3), the response is 200 OK.

The result is a valid tar archive: test tar.tar.gz (gzipped for upload to GitHub). The blocks of this file: blocks.tar.gz

When I get it from the console:

>ipfs get QmWfLgAopxBBpnPsMeQWQhcVfHHEo7natWgtcX5yzw7My3
Saving file(s) to QmWfLgAopxBBpnPsMeQWQhcVfHHEo7natWgtcX5yzw7My3
 2.28 KB / 2.28 KB [==============================================] 100.00% 0s
Error: archive/tar: write too long

Why does "ipfs get" try to read tar?

Additional information:

>ipfs dag get QmWfLgAopxBBpnPsMeQWQhcVfHHEo7natWgtcX5yzw7My3
{"data":"CAIYgBAggAgggAQggAQ=","links":[{"Cid":{"/":"QmbTswUf339CVJsqB6rSeNum7hrnUsNqFbgkth8g3u2chX"},"Name":"","Size":1165},{"Cid":{"/":"zb2rhX9DvypVUCrp1SyoB6JpvdNEGcujJm7RZ6btt1zJBvnKq"},"Name":"","Size":512},{"Cid":{"/":"zb2rhX9DvypVUCrp1SyoB6JpvdNEGcujJm7RZ6btt1zJBvnKq"},"Name":"","Size":512}]}

Decoded data field:

"data": {
  "Type": 2,             // it's a file
  "filesize": 2048,      // sum of blocksizes
  "blocksizes": [
    1024,                // file header size + file data size + file padding size
    512,                 // padding block size
    512                  // padding block size
  ]
}
>ipfs dag get QmbTswUf339CVJsqB6rSeNum7hrnUsNqFbgkth8g3u2chX
{"data":"CAISgAR0ZXN0IHRhci50eHQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAMDEwMDc3NwAwMDAwMDAwADAwMDAwMDAAMDAwMDAwMDAwNDQAMTMyMzY0MDYzMTAAMDExNTM3ACAwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAHVzdGFyADAwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADAwMDAwMDAAMDAwMDAwMAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABiACCAkINwD","links":[{"Cid":{"/":"zb2rhfY128Mz2bimXoxi7aZuc5NPQMWaLPJYVSTLoYWs7rsdz"},"Name":"","Size":36},{"Cid":{"/":"zb2rhX9DvypVUCrp1SyoB6JpvdNEGcujJm7RZ6btt1zJBvnKq"},"Name":"","Size":512}]}

Decoded data field:

"data": {
  "Type": 2,                 // it's a file
  "Data": "test tar.txt...", // 512-byte tar file header
  "filesize": 1024,          // file header size + file data size + file padding size
  "blocksizes": [
    36,                      // file data size
    476                      // file padding size (this is the hack)
  ]
}

zb2rhfY128Mz2bimXoxi7aZuc5NPQMWaLPJYVSTLoYWs7rsdz (36) - file data raw block

>ipfs block get zb2rhfY128Mz2bimXoxi7aZuc5NPQMWaLPJYVSTLoYWs7rsdz
This is "test tar.txt" file content.

zb2rhX9DvypVUCrp1SyoB6JpvdNEGcujJm7RZ6btt1zJBvnKq (512) - padding raw block

> ipfs block get zb2rhX9DvypVUCrp1SyoB6JpvdNEGcujJm7RZ6btt1zJBvnKq
whyrusleeping commented 6 years ago

You see the tar message because we use the tar format to stream the files back to the client. Something in your setup is breaking that somehow...

kevina commented 6 years ago

I can reproduce this on my machine:

$ ipfs get QmWfLgAopxBBpnPsMeQWQhcVfHHEo7natWgtcX5yzw7My3
00:13:28.038 ERROR  cmds/http: archive/tar: write too long responseemitter.go:140
Saving file(s) to QmWfLgAopxBBpnPsMeQWQhcVfHHEo7natWgtcX5yzw7My3
 2.28 KB / 2.28 KB [=================================================================] 100.00% 0s
$ ipfs version --all
go-ipfs version: 0.4.14-dev-b37f856
Repo version: 6
System version: amd64/linux
Golang version: go1.9.2

ivan386 commented 6 years ago

The problem is in unixfs/io/pbdagreader.go.

CtxReadFull and WriteTo don't use filesize and blocksizes to limit the data read from blocks.

Stebalien commented 6 years ago

@ivan386 Ah. You mean that Size() and the actual size differ?

kevina commented 6 years ago

I will look into this.

ivan386 commented 6 years ago

@Stebalien Yes

https://github.com/ipfs/go-ipfs/blob/3106135dd5d081e1c0a49e22a435463f32412eb7/unixfs/io/pbdagreader.go#L169

https://github.com/ipfs/go-ipfs/blob/3106135dd5d081e1c0a49e22a435463f32412eb7/unixfs/io/pbdagreader.go#L171

https://github.com/ipfs/go-ipfs/blob/3106135dd5d081e1c0a49e22a435463f32412eb7/unixfs/io/pbdagreader.go#L204

https://github.com/ipfs/go-ipfs/blob/3106135dd5d081e1c0a49e22a435463f32412eb7/unixfs/io/pbdagreader.go#L206

n can be greater (or less) than dr.pbdata.Blocksizes[i].

dr.offset can be greater than dr.pbdata.GetFilesize().

ivan386 commented 6 years ago

I think the data from dr.buf needs to be cut, or extended by appending zeros, to pd.pbdata.Blocksizes[i] or dr.pbdata.GetFilesize() (see https://github.com/ipfs/go-ipfs/blob/3106135dd5d081e1c0a49e22a435463f32412eb7/unixfs/io/pbdagreader.go#L60).

Stebalien commented 6 years ago

Looks related to https://github.com/ipfs/go-ipfs/issues/4540.

kevina commented 6 years ago

@ivan386 how did you create the dag that corresponds to the hash QmWfLgAopxBBpnPsMeQWQhcVfHHEo7natWgtcX5yzw7My3?

kevina commented 6 years ago

Additional data

$ curl http://gateway.ipfs.io/ipfs/QmWfLgAopxBBpnPsMeQWQhcVfHHEo7natWgtcX5yzw7My3 -o gateway
$ ipfs cat QmWfLgAopxBBpnPsMeQWQhcVfHHEo7natWgtcX5yzw7My3 > cat
$ ls -l
total 8
-rw-r--r-- 1 kevina kevina 2084 Feb  8 14:59 cat
-rw-r--r-- 1 kevina kevina 2048 Feb  8 14:58 gateway
$ cmp cat gateway
cmp: EOF on gateway

cmp compares binary files; here it is telling us that cat and gateway are identical up to the length of the smaller file.

ivan386 commented 6 years ago

@kevina https://github.com/ipfs/go-ipfs/blob/990e4df32e12586f24c187eea47d737dad35ce9d/tar/format.go#L39 I rewrote ImportTar and tried to use a 512-byte block to fill both the 476-byte and the 512-byte paddings. The single-file tar from the gateway is valid: all the padding is at the end, and the gateway simply cuts off the tail. But ipfs get gives an error. I later tested it on a multi-file tar (QmSX9BdDaBygKvqZ7Tdo9x7iKYZ2YgRdwSEnAiLSYhYD33), and the gateway returns an invalid tar file without any error.

kevina commented 6 years ago

@ivan386 could you give me the exact steps you used to create the DAGs corresponding to QmWfLgAopxBBpnPsMeQWQhcVfHHEo7natWgtcX5yzw7My3 and QmSX9BdDaBygKvqZ7Tdo9x7iKYZ2YgRdwSEnAiLSYhYD33? If you used the command line, please give the commands; if you wrote Go code, please provide the code that creates those DAGs. Thanks.

ivan386 commented 6 years ago

@kevina https://github.com/ivan386/go-ipfs/tree/new-tar-import

Test it on a tar archive whose file sizes are smaller than the IPFS block size.

ipfs tar add "tar test.tar"

kevina commented 6 years ago

@ivan386 #4680 should fix this. ipfs get, ipfs cat, and fetching via the gateway now return a tar of the correct size, without errors.

jboero commented 5 years ago

This is a general problem, not just with ipfs. I'm seeing cases with special filesystems that don't accurately report file sizes in their attributes. A simple example is /proc/$PID/attr/current, which holds the SELinux context of a process. I know it's not something that should be tar'ed regularly, but it's being done. The file size is listed as zero, but reading the file returns content:

$ ll /proc/8832/attr/current
-rw-rw-rw-. 1 jboero jboero 0 Dec 17 16:51 /proc/8832/attr/current
$ cat /proc/8832/attr/current
unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

In my experience, any archive built with Go fails on such files. Archives built with tar and libz work fine. It's a bit of a discrepancy.