Updating with answers known so far:
- Loading the exact same dataset on the same machine (sensorium-11) was
successful (months ago). What could have changed?
==> You weren't in a hurry then, so it wouldn't have felt like as much of a
problem. It decided to wait. :-) (No way to really know the answer.)
- Would loading the data from several machines help? (Currently splitting the
input file and sending it to several nodes)
==> Yes, because the materialization happens on the sender side. If you split
the data 10 ways instead of 1, the materialized files will be 1/10 the size on
each node.
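To illustrate the point above, here is a minimal sketch (not AsterixDB code; file names and chunk count are made up) of splitting one input file into 10 line-based chunks so each node only materializes its own share:

```python
# Hypothetical illustration: split a single input file into N chunks on
# line boundaries, so each of N load nodes handles ~1/N of the data and
# the per-node materialized files shrink accordingly.
def split_file(lines, n_chunks):
    """Return n_chunks lists of lines, sizes differing by at most one."""
    chunks = [[] for _ in range(n_chunks)]
    for i, line in enumerate(lines):
        chunks[i % n_chunks].append(line)
    return chunks

# Stand-in for the real dataset: 1000 records.
records = [f"record-{i}" for i in range(1000)]
parts = split_file(records, 10)
print([len(p) for p in parts])  # each node gets 100 records, not 1000
```

Each chunk would then be shipped to a different node before starting the load.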
- Why are WAF files generated on pre-sorted data (from a single source) rather
than the data simply being loaded?
==> This is (as Yingyi mentioned) to prevent deadlocks that could otherwise
occur due to flow control during the hash-partitioned merge. Materializing
provides a "spring buffer" that lets the sender and receivers pace themselves
as needed.
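The "spring buffer" idea above can be sketched as follows. This is not AsterixDB code; the class and file handling are invented for illustration. The point is that the sender appends to disk instead of blocking on a slow receiver, which removes the back-pressure cycle that could otherwise deadlock the merge:

```python
# Minimal sketch of a "spring buffer": the sender materializes its
# partitioned output to a temp file, so the receiver can drain it at its
# own pace and the sender never blocks waiting on the receiver.
import tempfile


class SpringBuffer:
    """Decouples sender and receiver by materializing records to disk."""

    def __init__(self):
        self._file = tempfile.TemporaryFile(mode="w+")

    def send(self, record: str) -> None:
        # Sender side: append to disk and return immediately.
        self._file.write(record + "\n")

    def drain(self):
        # Receiver side: read the materialized records whenever ready.
        self._file.seek(0)
        for line in self._file:
            yield line.rstrip("\n")


buf = SpringBuffer()
for i in range(5):
    buf.send(f"record-{i}")
print(list(buf.drain()))  # all five records, in send order
```

The trade-off, as this thread shows, is disk usage: the materialized files exist until they are cleaned up.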
- Why is AsterixDB holding on to these files until you stop the instance, and
only then releasing the disk space?
==> As Young-Seok said, this shouldn't be happening - apparently the failure
and the incomplete cleanup after it are what's causing this problem.
Original comment by dtab...@gmail.com on 9 Jun 2015 at 9:45
Original issue reported on code.google.com by ker...@gmail.com on 9 Jun 2015 at 3:15