Closed abhishekdas99 closed 6 years ago
After the fix:
[ec2-user@ip-10-0-0-251 ~]$ find /media/ephemeral0/fcache/bigdata-sets/perf/data/tpcds/parquet/scale_1000/store_sales/ -type f -size 0b -print
[ec2-user@ip-10-0-0-251 ~]$
[ec2-user@ip-10-0-0-251 ~]$ ls -lrt /media/ephemeral0/fcache/bigdata-sets/perf/data/tpcds/parquet/scale_1000/store_sales/00000*
-rw-r--r-- 1 yarn yarn 7 Mar 22 23:56 /media/ephemeral0/fcache/bigdata-sets/perf/data/tpcds/parquet/scale_1000/store_sales/000003_mdfile
-rw-rw-rw- 1 yarn yarn 57186437 Mar 22 23:56 /media/ephemeral0/fcache/bigdata-sets/perf/data/tpcds/parquet/scale_1000/store_sales/000003
Let's say we have 3 nodes in the cluster and 12 files (File1 to File12). Let's assume the consistent hashing algorithm puts File1, File4, ... File10 on Node 1, File2, File5, ... File11 on Node 2, and File3, File6, ... File12 on Node 3.
So we should only see File1, File4, ... File10 and the corresponding mdfiles on Node 1.
But we are seeing File2, File3, etc. (files that are not supposed to be on Node 1) getting created there with 0 bytes.
The same thing is happening for the mdfiles as well.
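To make the expected layout concrete, here is a minimal sketch of the three-node example above. It is not the project's actual hashing code: `expected_owner` is a hypothetical stand-in that uses a simple modulo to reproduce the File1/File4/.../File10 pattern from the example, and the file sizes in the listing are made up for illustration.

```python
def expected_owner(file_index, num_nodes):
    """Return the 1-based node that should cache File<file_index>.

    Assumption: a plain modulo stands in for the real consistent hashing
    algorithm, only to reproduce the layout described in the example.
    """
    return (file_index - 1) % num_nodes + 1


def misplaced_files(node, listing, num_nodes):
    """Return the indices of files present on `node` that belong to another node.

    `listing` maps a file index to its size in bytes on that node.
    """
    return sorted(i for i in listing if expected_owner(i, num_nodes) != node)


# Hypothetical listing of Node 1 in the buggy state: File2 and File3
# exist with 0 bytes even though they are assigned to Node 2 and Node 3.
node1_listing = {1: 1024, 2: 0, 3: 0, 4: 1024}

print(misplaced_files(1, node1_listing, 3))  # prints [2, 3]
```

With the fix in place, the same check on any node should return an empty list, which matches the empty `find ... -size 0b` result shown earlier.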
From the master log, we know that, of the 2 nodes, 10.0.0.240 is assigned the store/000000 file.
The file and the corresponding md file are present on the 10.0.0.240 node.
But if we look at the other node, we also see the same file there, although its size is zero.