varun2784 / weed-fs

Automatically exported from code.google.com/p/weed-fs

Could not download uploaded files #26

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
After uploading a million files, I could not download them; I got the following
messages on the servers:

File Entry Not Found! Needle 1246720000 Memory 69510...
....
File Entry Not Found! Needle 1895825706 Memory 76422...
....
File Entry Not Found! Needle 1929380015 Memory 44888...
....

I used the attached file to upload

Original issue reported on code.google.com by hieu.hcmus@gmail.com on 4 Jul 2013 at 2:22

Attachments:

GoogleCodeExporter commented 8 years ago
Can you do a "ls -al" for the volume server's directory, where the *.dat and 
*.idx files are stored?

And I suppose the disk has enough space left, right?

Original comment by chris...@gmail.com on 4 Jul 2013 at 3:55

GoogleCodeExporter commented 8 years ago
Yes, there is 4.4 TB of free space.
Please see the uploaded screenshot.

Original comment by hieu.hcmus@gmail.com on 4 Jul 2013 at 4:05

Attachments:

GoogleCodeExporter commented 8 years ago
Looks like some .dat file is exceeding the size limit, 32*1024*1024*1024 = 
34359738368 bytes.

I will need to add one additional check at the volume server level to prevent 
this.
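For context, that limit follows from how the index addresses data: the .idx file stores each needle's offset as a uint32 count of 8-byte padding units (as visible in the write() code later in this thread). A minimal sketch of the arithmetic, assuming that layout:

```go
package main

import "fmt"

// NeedlePaddingSize is the 8-byte alignment used by weed-fs; the .idx file
// stores each needle's offset as a uint32 count of these 8-byte units.
const NeedlePaddingSize = 8

// maxVolumeSize returns the largest byte offset addressable by a uint32
// offset expressed in NeedlePaddingSize units.
func maxVolumeSize() uint64 {
	return (1 << 32) * NeedlePaddingSize // 2^32 * 8 bytes
}

func main() {
	fmt.Println(maxVolumeSize()) // 34359738368, i.e. 32*1024*1024*1024
}
```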

Original comment by chris...@gmail.com on 4 Jul 2013 at 5:01

GoogleCodeExporter commented 8 years ago
Checked in a fix just now. I have not tried your test suite; please run it to
confirm.

Original comment by chris...@gmail.com on 4 Jul 2013 at 5:16

GoogleCodeExporter commented 8 years ago
I just tested and the errors still occur.

There is no message indicating that the .dat file exceeded the size limit.

Original comment by hieu.hcmus@gmail.com on 4 Jul 2013 at 6:44

GoogleCodeExporter commented 8 years ago
I just checked, and only two volumes, 18 and 21, can be downloaded.

All the other volumes cannot be downloaded because the wrong header value is
read.

Original comment by hieu.hcmus@gmail.com on 4 Jul 2013 at 8:28

GoogleCodeExporter commented 8 years ago
An error occurs when converting between uint32 and uint64:

[2013/07/04 16:17:46.242178] [TRAC] (storage.(*Volume).write:156) Append offset 
uint32: %!(EXTRA uint32=1565170750)
[2013/07/04 16:17:46.242178] [TRAC] (storage.(*Volume).write:157) Append offset 
uint64: %!(EXTRA int64=12521366002)
[2013/07/04 16:17:46.242178] [TRAC] (storage.(*Needle).Append:71) Appended 
header: %!(EXTRA []uint8=[101 136 31 154 0 0 0 0 0 80 66 228 0 0 122 221])
[2013/07/04 16:17:46.242178] [TRAC] (storage.(*Volume).write:166) Write n.Size: 
31453, Needle id: 5260004, Needle cookie%!(EXTRA uint32=1703419802)
[2013/07/04 16:17:46.242178] [TRAC] (main.PostHandler:222) Uploaded file size: 
%!(EXTRA uint32=31391)
[2013/07/04 16:17:46.242178] [TRAC] (main.PostHandler:226) Upload completed
[2013/07/04 16:17:46.242178] [TRAC] (main.GetOrHeadHandler:114) Download: 
/13,5042e465881f9a
[2013/07/04 16:17:46.242178] [TRAC] (storage.(*Volume).read:197) Volume Id: 13
[2013/07/04 16:17:46.242178] [TRAC] (storage.(*Volume).read:198) Append offset 
uint32: %!(EXTRA uint32=1565170750)
[2013/07/04 16:17:46.242178] [TRAC] (storage.(*Volume).read:199) Read offset 
uint64: %!(EXTRA int64=12521366000)
[2013/07/04 16:17:46.242178] [TRAC] (storage.(*Needle).Read:139) Read header: 
%!(EXTRA []uint8=[0 0 101 136 31 154 0 0 0 0 0 80 66 228 0 0]) 

Write: The value in uint32 = 1565170750, uint64 = 12521366002
Read: The value in uint32 = 1565170750, uint64 = 12521366000
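The two-byte gap falls out of the uint32 round trip: storing an unaligned offset as a count of 8-byte units silently drops the remainder. A minimal sketch, assuming the 8-byte padding unit used by the volume code:

```go
package main

import "fmt"

const NeedlePaddingSize = 8

// roundTrip stores a byte offset the way the .idx entry does (a uint32 count
// of NeedlePaddingSize units) and returns what a reader reconstructs.
func roundTrip(writeOffset int64) (stored uint32, readOffset int64) {
	stored = uint32(writeOffset / NeedlePaddingSize) // fractional unit silently dropped
	readOffset = int64(stored) * NeedlePaddingSize
	return
}

func main() {
	stored, readOffset := roundTrip(12521366002) // the misaligned offset from the trace
	fmt.Println(stored, readOffset)              // 1565170750 12521366000: two bytes short
}
```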

Original comment by hieu.hcmus@gmail.com on 4 Jul 2013 at 9:22

GoogleCodeExporter commented 8 years ago
There is an error when computing the padding size, because 12521366002 % 8 != 0.

Original comment by hieu.hcmus@gmail.com on 4 Jul 2013 at 9:31

GoogleCodeExporter commented 8 years ago
We should check the padding value when writing files. I added the code below
and it works fine:

func (v *Volume) write(n *Needle) (size uint32, err error) {
    if v.readOnly {
        err = fmt.Errorf("%s is read-only", v.dataFile)
        return
    }
    v.accessLock.Lock()
    defer v.accessLock.Unlock()
    var offset int64
    if offset, err = v.dataFile.Seek(0, 2); err != nil {
        return
    }

    // check padding: if a previous partial write left the file unaligned,
    // seek forward to the next NeedlePaddingSize boundary
    if offset%NeedlePaddingSize != 0 {
        offset = offset + (NeedlePaddingSize - offset%NeedlePaddingSize)
        if offset, err = v.dataFile.Seek(offset, 0); err != nil {
            return
        }
    }
    // end

    if size, err = n.Append(v.dataFile, v.Version()); err != nil {
        // roll back a partial append so the next write starts aligned
        if e := v.dataFile.Truncate(offset); e != nil {
            err = fmt.Errorf("%s\ncannot truncate %s: %s", err, v.dataFile, e)
        }
        return
    }
    nv, ok := v.nm.Get(n.Id)
    if !ok || int64(nv.Offset)*NeedlePaddingSize < offset {
        logger.LoggerVolume.Trace("Write n.Size: %d, Needle id: %d, Needle cookie: %d", n.Size, n.Id, n.Cookie)
        _, err = v.nm.Put(n.Id, uint32(offset/NeedlePaddingSize), n.Size)
    }
    return
}

Original comment by hieu.hcmus@gmail.com on 4 Jul 2013 at 9:49

GoogleCodeExporter commented 8 years ago
Can you please attach the whole volume.go file that was used to generate the
logs in comment #7?

Your fix seems to avoid the problem with 7/8 probability, because a random
offset has a 1/8 chance of already being aligned and passing your check.
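The 1/8 figure is just alignment probability: of any 8 consecutive byte offsets, exactly one is a multiple of the 8-byte padding unit and would sail through the padding check unchanged. A quick sketch:

```go
package main

import "fmt"

const NeedlePaddingSize = 8

// alignedIn counts how many of the n consecutive byte offsets starting at
// start are already NeedlePaddingSize-aligned.
func alignedIn(start, n int64) int {
	count := 0
	for off := start; off < start+n; off++ {
		if off%NeedlePaddingSize == 0 {
			count++
		}
	}
	return count
}

func main() {
	fmt.Println(alignedIn(12521366000, 8)) // 1: only one offset in 8 passes the check
}
```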

Original comment by chris...@gmail.com on 5 Jul 2013 at 7:03

GoogleCodeExporter commented 8 years ago
Please find the attached volume.go file

Original comment by hieu.hcmus@gmail.com on 5 Jul 2013 at 7:11

Attachments:

GoogleCodeExporter commented 8 years ago
Thanks! Was the error output in comment #7 generated after my fix?

My fix applies at write time, so if you continue to read or write existing
volumes, you will still see errors.

To use my fix, you would need to clean everything and restart your test from an 
empty system.

Original comment by chris...@gmail.com on 5 Jul 2013 at 7:22

GoogleCodeExporter commented 8 years ago
Hi Chris,

I tested yesterday and the files were not written to full volumes, so I don't
think your fix addresses this error.

Original comment by hieu.hcmus@gmail.com on 5 Jul 2013 at 7:30

GoogleCodeExporter commented 8 years ago
I re-thought about your fix. It can ensure the current file is written more or
less correctly, but it will likely overwrite other existing files.

So we need to ensure that when the size limit is exceeded, we fail the write
attempt and ask the user to get another file id from the master.
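A minimal sketch of such a volume-level guard (names and structure are illustrative, not the actual weed-fs code): refuse the append when it would push the volume past its size limit, so the client must request a new fid from the master.

```go
package main

import (
	"errors"
	"fmt"
)

// volumeSizeLimit mirrors the 32 GiB limit mentioned earlier in the thread.
const volumeSizeLimit = int64(32) * 1024 * 1024 * 1024

var errVolumeFull = errors.New("volume size limit exceeded; request a new fid from the master")

// checkWrite is hypothetical: it fails the write attempt when appending a
// needle of needleSize bytes would push the volume past its size limit.
func checkWrite(currentSize, needleSize int64) error {
	if currentSize+needleSize > volumeSizeLimit {
		return errVolumeFull
	}
	return nil
}

func main() {
	fmt.Println(checkWrite(volumeSizeLimit-100, 50) == nil)  // true: still fits
	fmt.Println(checkWrite(volumeSizeLimit-100, 200) == nil) // false: must fail
}
```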

Original comment by chris...@gmail.com on 5 Jul 2013 at 7:32

GoogleCodeExporter commented 8 years ago
[deleted comment]
GoogleCodeExporter commented 8 years ago
There is no message "Volume Size Limit %d Exceeded! Current size is %d" in the
log file.

Original comment by hieu.hcmus@gmail.com on 5 Jul 2013 at 7:38

GoogleCodeExporter commented 8 years ago
Hi Chris,
Can you please explain:
"But it will likely overwrite other existing files."

If something goes wrong when writing or computing the padding value of a file
(an I/O interrupt, etc.), every later file will be stored at the wrong offset.

I added this code to make sure that a problem with one file will not affect
later files.

Original comment by hieu.hcmus@gmail.com on 5 Jul 2013 at 7:50

GoogleCodeExporter commented 8 years ago
I think your guess is right that my fix seems unrelated to the issue (but it
should be OK to leave the fix in).

We need to find out why the offset can differ from what we expect, and by how
much. There are several possibilities:
1. there was an error when writing a previous file.
2. the offset returned from v.dataFile.Seek(0, 2) is off by a few bytes.
3. the offset returned from v.dataFile.Seek(0, 2) is wrong at random.

If it is case 3, we will overwrite existing files.

Can you help to identify which case is causing your problem?

Original comment by chris...@gmail.com on 5 Jul 2013 at 8:06

GoogleCodeExporter commented 8 years ago
Hi, Hieu,

Your fix should be good. The actual disk write is currently done in several
write() calls. If one of them fails, the offset becomes incorrect, making all
the following files wrong.

It would be helpful to find out what really went wrong in the first place, but
your fix should be a very good way to prevent all subsequent file read/write
errors.

Original comment by chris...@gmail.com on 5 Jul 2013 at 11:01

GoogleCodeExporter commented 8 years ago
Checked in the fix to HEAD. Thanks!

If possible, please let me know what error caused the padding alignment
problem.

Original comment by chris...@gmail.com on 5 Jul 2013 at 11:07