varun2784 / weed-fs

Automatically exported from code.google.com/p/weed-fs

Bug - empty collections #54

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Assigning with /dir/assign creates an empty collection on only one volume
server.

Steps to reproduce:
- start 2 volume servers
- assign a key with /dir/assign (replication=000)
- upload something
- shut down the volume server with assigned volumes.
- call /dir/assign again: the master keeps choosing the previously assigned
volumes on the dead node, no matter what.
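
The assign request in the steps above could be built roughly as follows. This is only a sketch: the master address `localhost:9333` is an assumption (the report does not give it), the `collection` query parameter name is assumed, and `build_assign_url` is a hypothetical helper, not part of weed-fs.

```python
from urllib.parse import urlencode


def build_assign_url(master="localhost:9333", replication="000", collection=""):
    """Build a /dir/assign URL. The master address is an assumed default."""
    params = {"replication": replication}
    # The "collection" parameter name is an assumption; it is omitted when
    # empty, which matches the "empty or not specified" case in the report.
    if collection:
        params["collection"] = collection
    return "http://%s/dir/assign?%s" % (master, urlencode(params))


if __name__ == "__main__":
    print(build_assign_url())
```

Repeating the request before and after shutting down one volume server, and comparing the responses, would show whether the master still hands out volumes on the dead node.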

I would like the old behavior back when the collection is empty or not specified.

Original issue reported on code.google.com by claudiu....@gmail.com on 23 Nov 2013 at 8:48

GoogleCodeExporter commented 8 years ago
The bug description differs from what I see locally.

1) weed master
2) weed volume -port=8081 -dir=/tmp/1
3) weed volume -port=8082 -dir=/tmp/2

/tmp/1 and /tmp/2 start as empty folders. After the first few /dir/assign calls:

chris@clu-dt2:~$ ls -al /tmp/1
total 44
drwxrwxr-x  2 chris chris  4096 Nov 23 13:12 .
drwxrwxrwt 30 root  root  20480 Nov 23 13:12 ..
-rw-r--r--  1 chris chris     8 Nov 23 13:12 2.dat
-rw-r--r--  1 chris chris     0 Nov 23 13:12 2.idx
-rw-r--r--  1 chris chris     8 Nov 23 13:12 3.dat
-rw-r--r--  1 chris chris     0 Nov 23 13:12 3.idx
-rw-r--r--  1 chris chris     8 Nov 23 13:12 4.dat
-rw-r--r--  1 chris chris     0 Nov 23 13:12 4.idx
-rw-r--r--  1 chris chris     8 Nov 23 13:12 6.dat
-rw-r--r--  1 chris chris     0 Nov 23 13:12 6.idx
chris@clu-dt2:~$ ls -al /tmp/2
total 40
drwxrwxr-x  2 chris chris  4096 Nov 23 13:12 .
drwxrwxrwt 30 root  root  20480 Nov 23 13:15 ..
-rw-r--r--  1 chris chris     8 Nov 23 13:12 1.dat
-rw-r--r--  1 chris chris     0 Nov 23 13:12 1.idx
-rw-r--r--  1 chris chris     8 Nov 23 13:12 5.dat
-rw-r--r--  1 chris chris     0 Nov 23 13:12 5.idx
-rw-r--r--  1 chris chris     8 Nov 23 13:12 7.dat
-rw-r--r--  1 chris chris     0 Nov 23 13:12 7.idx

And if I shut down either volume server now, /dir/assign assigns to the
other volume server.

Maybe you did not wait until the master concluded that the shut-down volume
server was off and marked it offline?
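
One way to rule this out is to poll the master's topology until the stopped node is gone before calling /dir/assign again. A minimal sketch, assuming the topology has the DataCenters / Racks / DataNodes / Url layout of the status JSON posted in this thread, and assuming a node the master has marked offline no longer appears in that list. `fetch_status` is a hypothetical callable supplied by the caller that returns the parsed status; how it is fetched is left open.

```python
import time


def listed_urls(status):
    """Collect the Url of every data node in a parsed status dict."""
    return {
        node["Url"]
        for dc in status["Topology"]["DataCenters"]
        for rack in dc["Racks"]
        for node in rack["DataNodes"]
    }


def wait_until_offline(fetch_status, node_url, timeout=60.0, interval=1.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll until node_url is no longer listed; return False on timeout."""
    deadline = clock() + timeout
    while True:
        if node_url not in listed_urls(fetch_status()):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

If this returns False even with a generous timeout, the master has not noticed the shutdown, which is a different problem from assigning too early.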

Original comment by chris...@gmail.com on 23 Nov 2013 at 9:18

GoogleCodeExporter commented 8 years ago
{
  "Topology": {
    "DataCenters": [
      {
        "Free": 11,
        "Max": 16,
        "Racks": [
          {
            "DataNodes": [
              {
                "Free": 8,
                "Max": 8,
                "PublicUrl": "10.16.200.14:9341",
                "Url": "10.16.200.14:9341",
                "Volumes": 0
              },
              {
                "Free": 3,
                "Max": 8,
                "PublicUrl": "10.16.200.13:9341",
                "Url": "10.16.200.13:9341",
                "Volumes": 5
              }
            ],
            "Free": 11,
            "Max": 16
          }
        ]
      }
    ],
    "Free": 11,
    "Max": 16,
    "layouts": [
      {
        "collection": "",
        "replication": "000",
        "writables": [
          3,
          6
        ]
      },
      {
        "collection": "",
        "replication": "001",
        "writables": null
      }
    ]
  },
  "Version": "0.45"
}
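
The status output above can also be read programmatically. A minimal sketch: `summarize_nodes` and `writable_volumes` are hypothetical helper names, and `SAMPLE` is a trimmed copy of the JSON above. In this sample each node's Free equals its Max minus its Volumes.

```python
import json

# Trimmed copy of the status output above (per-rack and per-DC totals dropped).
SAMPLE = json.loads("""
{
  "Topology": {
    "DataCenters": [{"Racks": [{"DataNodes": [
      {"Free": 8, "Max": 8, "Url": "10.16.200.14:9341", "Volumes": 0},
      {"Free": 3, "Max": 8, "Url": "10.16.200.13:9341", "Volumes": 5}
    ]}]}],
    "layouts": [
      {"collection": "", "replication": "000", "writables": [3, 6]},
      {"collection": "", "replication": "001", "writables": null}
    ]
  },
  "Version": "0.45"
}
""")


def summarize_nodes(status):
    """Return one (url, volumes, free, max) tuple per data node."""
    rows = []
    for dc in status["Topology"]["DataCenters"]:
        for rack in dc["Racks"]:
            for node in rack["DataNodes"]:
                rows.append((node["Url"], node["Volumes"], node["Free"], node["Max"]))
    return rows


def writable_volumes(status, replication, collection=""):
    """Return the writable volume ids for a layout, or [] if none are listed."""
    for layout in status["Topology"]["layouts"]:
        if layout["replication"] == replication and layout["collection"] == collection:
            return layout["writables"] or []  # "writables" may be null
    return []


if __name__ == "__main__":
    for row in summarize_nodes(SAMPLE):
        print(row)
    print(writable_volumes(SAMPLE, "000"))
```

Read this way, the output shows 10.16.200.13:9341 still listed with 5 volumes while 10.16.200.14:9341 holds none, and volumes 3 and 6 are still writable for replication 000.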

There are no errors in the logs (debug=true).

Original comment by claudiu....@gmail.com on 25 Nov 2013 at 4:58

GoogleCodeExporter commented 8 years ago
So the server 10.16.200.13:9341 is shut down, but the master did not
notice it?

Original comment by chris...@gmail.com on 25 Nov 2013 at 7:53

GoogleCodeExporter commented 8 years ago
Yes, even after a couple of days.

Original comment by claudiu....@gmail.com on 25 Nov 2013 at 8:12

GoogleCodeExporter commented 8 years ago
Hmm, I am out of ideas now... Does restarting the master help? Is it
consistently reproducible?

Original comment by chris...@gmail.com on 25 Nov 2013 at 9:46

GoogleCodeExporter commented 8 years ago
Restarting the master did not help. However, after wiping all data on all
volumes and restarting everything, it seems to work properly. No idea what
causes this behavior. I will consider this issue closed for now.

Original comment by claudiu....@gmail.com on 25 Nov 2013 at 11:49

GoogleCodeExporter commented 8 years ago
If this happens again, hopefully there will be a setup I can reproduce locally.
Please keep an eye on this.

Original comment by chris...@gmail.com on 25 Nov 2013 at 6:30