hail-is / hail

Cloud-native genomic dataframes and batch computing
https://hail.is
MIT License

Java core dump error #4418

Closed: gtiao closed this issue 5 years ago

gtiao commented 6 years ago

To report a bug, fill in the information below. For support and feature requests, please use the discussion forum: http://discuss.hail.is/


Hail version: 0.2

What you did:

import logging   # assumed: added explicitly in case the star import below does not provide it
import hail as hl  # assumed: gnomad_hail may already re-export hl; added for completeness

from gnomad_hail import *

logging.basicConfig(format="%(levelname)s (%(name)s %(lineno)s): %(message)s")
logger = logging.getLogger("variant_histograms")
logger.setLevel(logging.INFO)

def release_ht_path(data_type: str, nested = True, with_subsets = True, temp = False):
    tag = 'nested_release' if nested else 'flat_release'
    tag = tag + '.with_subsets' if with_subsets else tag + '.no_subsets'
    tag = tag + '.temp' if temp else tag
    return f'gs://gnomad/release/2.1/ht/gnomad.{data_type}.{tag}.ht'
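For reference, the flag-to-tag logic in release_ht_path composes paths like the one below. This sketch reproduces the helper exactly as defined above so it runs standalone:

```python
def release_ht_path(data_type, nested=True, with_subsets=True, temp=False):
    # Same logic as the helper above, copied so the example is self-contained.
    tag = 'nested_release' if nested else 'flat_release'
    tag = tag + '.with_subsets' if with_subsets else tag + '.no_subsets'
    tag = tag + '.temp' if temp else tag
    return f'gs://gnomad/release/2.1/ht/gnomad.{data_type}.{tag}.ht'

# The nested=False call used in main() below resolves to the flat release table:
print(release_ht_path('exomes', nested=False))
# gs://gnomad/release/2.1/ht/gnomad.exomes.flat_release.with_subsets.ht
```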

def main(args):
    hl.init(log='/variant_histograms.log')
    data_type = 'genomes' if args.genomes else 'exomes'

    metrics = ['FS', 'InbreedingCoeff', 'MQ', 'MQRankSum', 'QD', 'ReadPosRankSum', 'SOR', 'BaseQRankSum',
               'ClippingRankSum', 'DP', 'VQSLOD', 'rf_tp_probability', 'pab_max']

    ht = hl.read_table(release_ht_path(data_type, nested=False))
    # NOTE: histogram aggregations are done on the entire callset (not just PASS variants), on raw data

    # Compute median and MAD on variant metrics
    medmad_dict = {}
    for metric in metrics:
        medmad_dict[metric] = hl.struct(
            median=hl.median(hl.agg.collect(ht[metric])),
            mad=4 * 1.48268 * hl.median(
                hl.abs(hl.agg.collect(ht[metric]) - hl.median(hl.agg.collect(ht[metric])))
            ),
        )
    medmad = ht.aggregate(hl.struct(**medmad_dict))
    print(medmad)
    print(hl.eval_expr(hl.json(medmad)))
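The aggregation computes, per metric, the median and 4 × 1.48268 × MAD (1.48268 is the usual consistency constant that makes the median absolute deviation estimate the standard deviation under normality; the extra factor of 4 presumably defines wide cutoff bounds). A minimal pure-Python sketch of the same statistic, outside Hail's expression language:

```python
from statistics import median

def med_and_scaled_mad(values, k=4 * 1.48268):
    """Return (median, k * MAD) for an iterable of numbers.

    1.48268 makes the MAD a consistent estimator of the standard
    deviation for normal data; the factor of 4 mirrors the Hail
    aggregation above (assumed to be for outlier-bound purposes).
    """
    vals = list(values)
    med = median(vals)
    mad = median(abs(v - med) for v in vals)
    return med, k * mad

m, smad = med_and_scaled_mad([8, 9, 10, 11, 12])
# median = 10; raw MAD = 1, so the scaled value is 4 * 1.48268
```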

What went wrong (all error messages here, including the full java stack trace):

[Stage 0:==================================================>(9853 + 93) / 10000]#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fbeaec3ca22, pid=6662, tid=0x00007fbe3dd81700
#
# JRE version: OpenJDK Runtime Environment (8.0_181-b13) (build 1.8.0_181-8u181-b13-1~deb9u1-b13)
# Java VM: OpenJDK 64-Bit Server VM (25.181-b13 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# J 14270 C1 is.hail.annotations.Region.storeInt(JI)V (6 bytes) @ 0x00007fbeaec3ca22 [0x00007fbeaec3c980+0xa2]
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /tmp/828e66d5a71741d7ab2c8d6580997da3/hs_err_pid6662.log
Compiled method (c1)   88328 14270       3       is.hail.annotations.Region::storeInt (6 bytes)
 total in heap  [0x00007fbeaec3c810,0x00007fbeaec3cbc0] = 944
 relocation     [0x00007fbeaec3c938,0x00007fbeaec3c968] = 48
 main code      [0x00007fbeaec3c980,0x00007fbeaec3caa0] = 288
 stub code      [0x00007fbeaec3caa0,0x00007fbeaec3cb30] = 144
 oops           [0x00007fbeaec3cb30,0x00007fbeaec3cb38] = 8
 metadata       [0x00007fbeaec3cb38,0x00007fbeaec3cb48] = 16
 scopes data    [0x00007fbeaec3cb48,0x00007fbeaec3cb78] = 48
 scopes pcs     [0x00007fbeaec3cb78,0x00007fbeaec3cbb8] = 64
 dependencies   [0x00007fbeaec3cbb8,0x00007fbeaec3cbc0] = 8
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#
FATAL: caught signal 6 SIGABRT
/tmp/libhail7224206977949339430.so(+0x1788c)[0x7fbdea5db88c]
/lib/x86_64-linux-gnu/libc.so.6(+0x33060)[0x7fbec2eae060]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcf)[0x7fbec2eadfff]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x16a)[0x7fbec2eaf42a]
/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so(+0x8c0259)[0x7fbec27f0259]
/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so(+0xa744f8)[0x7fbec29a44f8]
/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so(JVM_handle_linux_signal+0x265)[0x7fbec27f9e45]
/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so(+0x8bd4c8)[0x7fbec27ed4c8]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x110c0)[0x7fbec38580c0]
[0x7fbeaec3ca22]
ERROR: (gcloud.dataproc.jobs.submit.pyspark) Job [828e66d5a71741d7ab2c8d6580997da3] entered state [ERROR] while waiting for [DONE].
Traceback (most recent call last):
  File "pyhail.py", line 132, in <module>
    main(args, pass_through_args)
  File "pyhail.py", line 113, in main
    subprocess.check_output(job)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 573, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '['gcloud', 'dataproc', 'jobs', 'submit', 'pyspark', '/Users/gtiao/gnomad_qc/hail/variant_qc/make_var_annot_hists.py', '--cluster', 'gt3', '--files=gs://hail-common/builds/devel/jars/hail-devel-cadc5eefca6e-Spark-2.2.0.jar', '--py-files=gs://hail-common/builds/devel/python/hail-devel-cadc5eefca6e.zip,/var/folders/rn/t2xcx1ps4h96txll46qkkfsj2q8bnl/T/pyscripts_K7Vs59.zip', '--driver-log-levels', 'root=FATAL,is.hail=INFO', '--properties=spark.executor.extraClassPath=./hail-devel-cadc5eefca6e-Spark-2.2.0.jar,spark.driver.extraClassPath=./hail-devel-cadc5eefca6e-Spark-2.2.0.jar,spark.files=./hail-devel-cadc5eefca6e-Spark-2.2.0.jar,spark.submit.pyFiles=./gs://hail-common/builds/devel/python/hail-devel-cadc5eefca6e.zip', '--', '--overwrite', '--exomes', '--slack_channel', '@grace']' returned non-zero exit status 1
tpoterba commented 6 years ago

@chrisvittal randomly assigned you via the scorecard.

chrisvittal commented 5 years ago

@gtiao Have you run into issues like this recently?

gtiao commented 5 years ago

I haven't run very many Hail pipelines since ASHG, so there hasn't been much opportunity to see this bug. Sorry I can't help more!

chrisvittal commented 5 years ago

I'm closing this, as the issue has probably been resolved; at the very least, the underlying architecture has changed enough that I would expect any new segfault to be a separate issue.