prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0

Presto server crashing causes master node to restart #9404

Status: closed (geetikagupta16 closed this issue 4 years ago)

geetikagupta16 commented 6 years ago

I am working on a 3-node Presto cluster and trying to run TPC-H queries on 5 GB of data stored in Hive as ORC. Whenever I execute a query, it starts running, but after a few seconds the server crashes and my master node restarts. I could not figure out the problem from the logs, as there are no ERROR entries on my master or slave nodes.

So what could be the reason for the presto-server crash?

SangeetaGulia commented 6 years ago

I am also facing the same issue with 100 GB of data. I have a cluster of 3 nodes with 45 GB of RAM each. Can anyone please look at this issue? I am using Presto 0.187.

I am trying to load the data into an ORC table as below: create table part with (format='orc') as select * from tpch.sf100.part;

geetikagupta16 commented 6 years ago

@kokosing @electrum Can you please help us resolve this issue?

kokosing commented 6 years ago

Maybe the Presto process was killed by the operating system. Try running dmesg to see whether any process was killed.
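
A minimal sketch of that check, in case it helps: it filters kernel log text for lines that usually accompany the Linux OOM killer. The function names `find_oom_lines` and `read_dmesg` are hypothetical, and the phrases in the pattern are only the wording commonly seen in kernel logs; the exact text varies between kernel versions, so an empty result does not prove that nothing was killed.

```python
import re
import subprocess
from typing import List

# Phrases the Linux OOM killer commonly writes to the kernel log.
# Assumption: wording differs between kernel versions, so this is a heuristic.
OOM_PATTERN = re.compile(
    r"invoked oom-killer|out of memory|killed process", re.IGNORECASE
)


def find_oom_lines(kernel_log: str) -> List[str]:
    """Return the kernel log lines that look like OOM-killer activity."""
    return [line for line in kernel_log.splitlines() if OOM_PATTERN.search(line)]


def read_dmesg() -> str:
    """Run dmesg and return its output (may require root on some systems)."""
    result = subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling `find_oom_lines(read_dmesg())` on the affected node and printing the result would show any matching lines, including the name of the killed process if the kernel logged it.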

SangeetaGulia commented 6 years ago
[    4.181028] xor: using function: prefetch64-sse (13740.000 MB/sec)
[    4.182272] async_tx: api initialized (async)
[    4.211202] Btrfs loaded, crc32c=crc32c-intel
[    4.937520] EXT4-fs (md2): INFO: recovery required on readonly filesystem
[    4.937574] EXT4-fs (md2): write access will be enabled during recovery
[    4.972629] random: crng init done
[    6.212026] EXT4-fs (md2): orphan cleanup on readonly fs
[    6.212150] EXT4-fs (md2): 4 orphan inodes deleted
[    6.212199] EXT4-fs (md2): recovery complete
[    6.447291] EXT4-fs (md2): mounted filesystem with ordered data mode. Opts: (null)
[    7.821225] systemd[1]: systemd 229 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ -LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN)
[    7.821470] systemd[1]: Detected architecture x86-64.
[    7.848819] systemd[1]: Set hostname to <hadoop-master>.
[    8.463412] systemd[1]: Set up automount Arbitrary Executable File Formats File System Automount Point.
[    8.463736] systemd[1]: Reached target Remote File Systems (Pre).
[    8.464013] systemd[1]: Listening on /dev/initctl Compatibility Named Pipe.
[    8.464296] systemd[1]: Listening on LVM2 poll daemon socket.
[    8.464575] systemd[1]: Listening on Syslog Socket.
[    8.464855] systemd[1]: Listening on udev Control Socket.
[    8.465146] systemd[1]: Listening on Journal Socket (/dev/log).
[    8.659611] EXT4-fs (md2): re-mounted. Opts: (null)
[   11.543185] systemd-journald[390]: Received request to flush runtime journal from PID 1
[   11.818686] i5500_temp 0000:00:14.3: Sensor seems to be disabled
[   11.818789] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
[   11.820506] ACPI Warning: SystemIO range 0x0000000000000828-0x000000000000082F conflicts with OpRegion 0x0000000000000800-0x000000000000084F (\PMRG) (20160930/utaddress-247)
[   11.820510] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
[   11.820530] lpc_ich: Resource conflict(s) found affecting gpio_ich
[   11.821507] EDAC MC: Ver: 3.0.0
[   11.824985] EDAC MC0: Giving out device to module i7core_edac.c controller i7 core #0: DEV 0000:ff:03.0 (INTERRUPT)
[   11.825092] EDAC PCI0: Giving out device to module i7core_edac controller EDAC PCI controller: DEV 0000:ff:03.0 (POLLED)
[   11.825097] EDAC i7core: Driver loaded, 1 memory controller(s) found.
[   11.938762] gpio_ich: GPIO from 451 to 511 on gpio_ich
[   12.014543] [drm] VGACON disable radeon kernel modesetting.
[   12.014565] [drm:radeon_init [radeon]] *ERROR* No UMS support in radeon module!
[   12.179373] snd_hda_intel 0000:02:00.1: Handle vga_switcheroo audio client
[   12.187829] kvm: VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL does not work properly. Using workaround
[   12.191403] input: HDA ATI HDMI HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:03.0/0000:02:00.1/sound/card0/input3
[   12.930748] EXT4-fs (md1): mounting ext3 file system using the ext4 subsystem
[   12.957590] Adding 25149436k swap on /dev/md0.  Priority:-1 extents:1 across:25149436k FS
[   13.018808] md: md2: resync done.
[   13.150394] EXT4-fs (md1): recovery complete
[   13.161079] EXT4-fs (md1): mounted filesystem with ordered data mode. Opts: (null)
[   13.230325] r8169 0000:06:00.0 eth0: link down
[   13.230333] r8169 0000:06:00.0 eth0: link down
[   13.230385] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[   15.830322] r8169 0000:06:00.0 eth0: link up
[   15.830334] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready

@kokosing Above is the output of dmesg. Also, the master node reboots and the Presto master is killed only when I fire a query with a heavy load (100 GB); on 50 GB it ran successfully.
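
One thing worth noting about the dmesg output above: its timestamps run from roughly 4 to 16 seconds after boot, so it appears to cover only the boot that followed the restart. The kernel ring buffer does not survive a reboot, so anything logged just before the crash would not show up there. The sketch below is a hedged illustration, not a confirmed diagnosis: `uptime_span` and `previous_boot_kernel_log` are hypothetical helper names, and the second one assumes systemd-journald is configured with persistent storage, otherwise `journalctl` has no earlier boots to show.

```python
import re
import subprocess
from typing import Optional, Tuple

# Matches the "[    4.181028]" seconds-since-boot prefix that dmesg prints.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def uptime_span(kernel_log: str) -> Optional[Tuple[float, float]]:
    """Return (earliest, latest) seconds-since-boot found in the log, or None."""
    stamps = []
    for line in kernel_log.splitlines():
        match = TIMESTAMP.match(line)
        if match:
            stamps.append(float(match.group(1)))
    if not stamps:
        return None
    return (min(stamps), max(stamps))


def previous_boot_kernel_log() -> str:
    """Return kernel messages from the boot before the current one.

    Assumption: the journal is persistent. Without that, journalctl
    cannot reach the previous boot and exits with an error.
    """
    result = subprocess.run(
        ["journalctl", "-k", "-b", "-1", "--no-pager"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

If `uptime_span` on a pasted log returns only a few seconds of uptime, the log is from a fresh boot, and the previous boot's kernel messages are the place to look for signs of what happened before the restart.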

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had any activity in the last 2 years. If you feel that this issue is important, just comment and the stale tag will be removed; otherwise it will be closed in 7 days. This is an attempt to ensure that our open issues remain valuable and relevant so that we can keep track of what needs to be done and prioritize the right things.