qubole / rubix

Cache File System optimized for columnar formats and object stores
Apache License 2.0

Unnecessary empty files are getting created on nodes that are not responsible for those files #107

Closed abhishekdas99 closed 6 years ago

abhishekdas99 commented 6 years ago

Let's say we have 3 nodes in the cluster and 12 files (File1 to File12). Let's assume the consistent hashing algorithm puts File1, File4, ..., File10 on Node 1; File2, File5, ..., File11 on Node 2; and File3, File6, ..., File12 on Node 3.

So on Node 1 we should see only File1, File4, ..., File10 and the corresponding mdfiles.

But we are seeing that File2, File3, etc. (which are not supposed to be on Node 1) are also getting created there, with 0 bytes.

The same thing is happening for the mdfiles as well.
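To make the expected behaviour concrete, here is a minimal sketch of single-owner placement with a hash ring. It is not Rubix's implementation: the `HashRing` class, the `owner` and `should_create_local_cache_file` helpers, the MD5-based hash, and the virtual-node count are all assumptions chosen for illustration. The only point is that each file resolves to exactly one node, so only that node should ever create the cache file and its mdfile.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable 64-bit hash (hypothetical choice; any stable hash would do).
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Illustrative consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=64):
        # Each node is placed on the ring at `vnodes` pseudo-random points.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    def owner(self, path: str) -> str:
        # First ring point clockwise from the file's hash, wrapping around.
        idx = bisect.bisect_right(self._keys, _hash(path)) % len(self._keys)
        return self._ring[idx][1]


def should_create_local_cache_file(ring: HashRing, local_node: str, path: str) -> bool:
    # Hypothetical guard: only the owning node creates the file and mdfile.
    return ring.owner(path) == local_node


nodes = ["node1", "node2", "node3"]
ring = HashRing(nodes)
files = [f"File{i}" for i in range(1, 13)]
placement = {f: ring.owner(f) for f in files}

for f in files:
    print(f, "->", placement[f])
```

With a guard like this applied before any local file is opened for writing, a node that is not the owner would never leave a 0-byte placeholder behind. The actual split between nodes depends on the hash and will not match the File1/File4/... pattern used in the description above.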

From the master log, we know that, of the 2 nodes, 10.0.0.240 is the one assigned the store/000000 file:

2018-03-22T20:56:26.229Z        INFO    hive-hive-357   com.qubole.rubix.core.CachingFileSystem BlockLocation s3a://bigdata-sets/perf/data/tpcds/parquet/scale_1000/store/000000 0 138942 10.0.0.240 totalHosts: 2

The file and its corresponding mdfile are present on the 10.0.0.240 node:

[ec2-user@ip-10-0-0-240 ~]$ ls -lrt /media/ephemeral0/fcache/bigdata-sets/perf/data/tpcds/parquet/scale_1000/store/000000*
-rw-r--r-- 1 yarn yarn      1 Mar 22 20:56 /media/ephemeral0/fcache/bigdata-sets/perf/data/tpcds/parquet/scale_1000/store/000000_mdfile
-rw-rw-rw- 1 yarn yarn 138942 Mar 22 20:56 /media/ephemeral0/fcache/bigdata-sets/perf/data/tpcds/parquet/scale_1000/store/000000
[ec2-user@ip-10-0-0-240 ~]$

But if we look at the other node (10.0.0.11), we also see the same file and mdfile, although their size is zero:


[ec2-user@ip-10-0-0-11 ~]$ ls -lrt /media/ephemeral0/fcache/bigdata-sets/perf/data/tpcds/parquet/scale_1000/store/000000*
-rw-rw-rw- 1 root root 0 Mar 22 20:56 /media/ephemeral0/fcache/bigdata-sets/perf/data/tpcds/parquet/scale_1000/store/000000
-rw-r--r-- 1 yarn yarn 0 Mar 22 20:56 /media/ephemeral0/fcache/bigdata-sets/perf/data/tpcds/parquet/scale_1000/store/000000_mdfile
[ec2-user@ip-10-0-0-11 ~]$

abhishekdas99 commented 6 years ago

After the fix:

[ec2-user@ip-10-0-0-251 ~]$ find /media/ephemeral0/fcache/bigdata-sets/perf/data/tpcds/parquet/scale_1000/store_sales/ -type f -size 0b -print
[ec2-user@ip-10-0-0-251 ~]$   
[ec2-user@ip-10-0-0-251 ~]$ ls -lrt /media/ephemeral0/fcache/bigdata-sets/perf/data/tpcds/parquet/scale_1000/store_sales/00000*
-rw-r--r-- 1 yarn yarn        7 Mar 22 23:56 /media/ephemeral0/fcache/bigdata-sets/perf/data/tpcds/parquet/scale_1000/store_sales/000003_mdfile
-rw-rw-rw- 1 yarn yarn 57186437 Mar 22 23:56 /media/ephemeral0/fcache/bigdata-sets/perf/data/tpcds/parquet/scale_1000/store_sales/000003
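
The same zero-byte check could also be scripted if it needs to run across several cache directories or nodes. The following is a rough sketch only: `find_empty_files` is a hypothetical helper, not part of Rubix, and it is meant to mirror what `find ... -type f -size 0b -print` reports.

```python
import os


def find_empty_files(root):
    """Return sorted paths of regular, zero-byte files under `root`."""
    empty = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks so that only regular files count, like `-type f`.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            if os.path.getsize(full) == 0:
                empty.append(full)
    return sorted(empty)


# Example call, using the cache root that appears in the listings above:
# print(find_empty_files("/media/ephemeral0/fcache"))
```

An empty result corresponds to the empty `find` output shown after the fix.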