iqbal-lab-org / gramtools

Genome inference from a population reference graph
MIT License

std::bad_alloc error on human PRG #85

Closed iqbal-lab closed 6 years ago

iqbal-lab commented 6 years ago

Building PRG for 1000 Genomes dataset (all variants)

2018-01-26 13:41:56,262 gramtools    INFO     Start process: build
2018-01-26 13:41:56,262 gramtools    DEBUG    Checking project file structure
2018-01-26 13:41:56,263 gramtools    DEBUG    Executing command:

perl /Net/fs1/home/zam/dev/git/gramtools_virtualenv/lib/python3.5/site-packages/gramtools/utils/vcf_to_linear_prg.pl --outfile ./gram-all/prg --vcf /data2/users/zam/analyses/2018/0122_test_gramtools_on_human/human_1000g_ALL.vcf --ref /data2/users/zam/analyses/2018/0122_test_gramtools_on_human/Homo_sapiens.GRCh37.60.dna.WHOLE_GENOME.fa

2018-01-26 14:53:50,304 gramtools    INFO     stdout:

2018-01-26 14:53:50,996 gramtools    DEBUG    Finished executing command: 4314.734 seconds
2018-01-26 14:53:50,997 gramtools    DEBUG    Executing command:

/Net/fs1/home/zam/dev/git/gramtools_virtualenv/lib/python3.5/site-packages/gramtools/bin/gram build --gram ./gram-all --kmer-size 15 --max-read-size 150 --debug

2018-01-26 14:53:50,997 gramtools    DEBUG    Using current working directory:
/data2/users/zam/analyses/2018/0122_test_gramtools_on_human
2018-01-26 14:53:51,009 gramtools    INFO     stdout:

2018-01-27 09:41:15,126 gramtools    INFO     Process termination message:
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc

2018-01-27 09:41:15,127 gramtools    INFO     Process termination code: -6
2018-01-27 09:41:15,127 gramtools    ERROR    Error code != 0
2018-01-27 09:41:15,127 gramtools    DEBUG    Computing sha256 hash of project paths
2018-01-27 09:50:57,388 gramtools    DEBUG    Saving command report:
./gram-all/build_report.json
2018-01-27 09:50:57,401 gramtools    INFO     End process: build
Finished printing linear PRG. Final number in alphabet is  158548838

Executing build command
Generating integer encoded PRG
Number of charecters in integer encoded linear PRG: 3455590684
Generating FM-Index
Generating PRG masks
Generating kmer index

iqbal-lab commented 6 years ago

Here's the build report:

{
    "start_time": "1516974116",
    "end_time": "1517046657",
    "total_runtime": 72541,
    "prg_build_report": {
        "command": "perl /Net/fs1/home/zam/dev/git/gramtools_virtualenv/lib/python3.5/site-packages/gramtools/utils/vcf_to_linear_prg.pl --outfile ./gram-all/prg --vcf /data2/users/zam/analyses/2018/0122_test_gramtools_on_human/human_1000g_ALL.vcf --ref /data2/users/zam/analyses/2018/0122_test_gramtools_on_human/Homo_sapiens.GRCh37.60.dna.WHOLE_GENOME.fa",
        "return_value_is_0": true,
        "stdout": [
            "Finished printing linear PRG. Final number in alphabet is  158548838"
        ]
    },
    "gramtools_cpp_build": {
        "command": "/Net/fs1/home/zam/dev/git/gramtools_virtualenv/lib/python3.5/site-packages/gramtools/bin/gram build --gram ./gram-all --kmer-size 15 --max-read-size 150 --debug",
        "return_value_is_0": false,
        "stdout": [
            "Executing build command",
            "Generating integer encoded PRG",
            "Number of charecters in integer encoded linear PRG: 3455590684",
            "Generating FM-Index",
            "Generating PRG masks",
            "Generating kmer index"
        ]
    },
    "current_working_directory": "/data2/users/zam/analyses/2018/0122_test_gramtools_on_human",
    "paths": {
        "encoded_prg": "./gram-all/encoded_prg",
        "kmer_index": "./gram-all/kmers/kmer_index_15",
        "reference": "/data2/users/zam/analyses/2018/0122_test_gramtools_on_human/Homo_sapiens.GRCh37.60.dna.WHOLE_GENOME.fa",
        "perl_generated_fa": "./gram-all/perl_generated_fa",
        "project": "./gram-all",
        "vcf": "/data2/users/zam/analyses/2018/0122_test_gramtools_on_human/human_1000g_ALL.vcf",
        "perl_generated_vcf": "./gram-all/perl_generated_vcf",
        "fm_index": "./gram-all/fm_index",
        "variant_site_mask": "./gram-all/variant_site_mask",
        "allele_mask": "./gram-all/allele_mask",
        "build_report": "./gram-all/build_report.json",
        "prg": "./gram-all/prg"
    },
    "path_hashes": {
        "prg": "469bdd5ca3a78f30113956045d6fdd081aed09769b69603e5e1e2c55fca7348d",
        "fm_index": "ffd2c9c7c4dd5ac8401fc12fefb8addee553e17d22d22712b38bb2e8c6883bde",
        "variant_site_mask": "d10899d3f84b2cd6ec9ce65c8ada59fffbb2da320fb96954bf99f26a82706034",
        "encoded_prg": "579a3cbd44f5c7bbce8ceeb0806707a7db03962322ccbd94c1d272ed8a090ae4",
        "allele_mask": "f9b4482efc31bf9979d835a4ed120fbb62893d4ccf40f9c8b294a396d42e548c",
        "reference": "37ec37a464033d63f3332b4f2ecc615055e0cf8b463e99c88ecfbfda3befe521",
        "perl_generated_fa": "1c49ba82697767847a3bbb19aec247f4d1ae77ebdeb9676437ab0f7984f54b0e",
        "vcf": "311ad432b2da5bdd719b536313fa1fc48833c8adf3efcc6a7ee8ed3fe019973b",
        "perl_generated_vcf": "13b4634203628ca979e9f2a4028fefbcfe93961e2a3014c81102a79d3c1d5a82"
    },
    "version_report": {
        "version_number": "0.5.0",
        "last_git_commit_hash": "7a53428da84d096805ee51679b60d376cc588cb2",
        "current_git_branch": "master",
        "truncated_git_commits": [
            "7a53428 - Robyn Ffrancon, 4 minutes ago : disable unit tests which depend on unstable ordered data structure",
            "31d8c13 - Robyn Ffrancon, 20 minutes ago : remove biopython dependancy",
            "be2838b - Robyn Ffrancon, 3 hours ago : updated install instructions given new python3 functionality",
            "addd790 - Robyn Ffrancon, 4 hours ago : added instructions for installing without root",
            "26094ea - Robyn Ffrancon, 21 hours ago : fix: gram executable called from gramtools python module with correct enviroment variables for finding libraries"
        ]
    }
}
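
A report like this can be checked mechanically. The following is a minimal sketch, assuming only the fields visible in the report above (`start_time`, `end_time`, and a `return_value_is_0` flag per stage); the function name `summarise_build_report` is hypothetical and not part of gramtools.

```python
def summarise_build_report(report):
    """Return the wall-clock duration and the stages whose command failed."""
    # start_time and end_time are stored as strings of Unix seconds.
    duration = int(report["end_time"]) - int(report["start_time"])
    # A stage counts as failed only if it explicitly reports a non-zero exit.
    failed = [
        name
        for name, stage in report.items()
        if isinstance(stage, dict) and stage.get("return_value_is_0") is False
    ]
    return {"duration_seconds": duration, "failed_stages": failed}


# Values copied from the report above; other fields are omitted for brevity.
example = {
    "start_time": "1516974116",
    "end_time": "1517046657",
    "total_runtime": 72541,
    "prg_build_report": {"return_value_is_0": True},
    "gramtools_cpp_build": {"return_value_is_0": False},
}

summary = summarise_build_report(example)
print(summary)
```

To run it on the real file, load `./gram-all/build_report.json` with `json.load` and pass the resulting dictionary in place of `example`.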

iqbal-lab commented 6 years ago

For the record, it worked with the reduced PRG (all variants above 5% frequency).

ffranr commented 6 years ago

Just to be clear, I recall you saying that it ran on a machine with 1 TB of RAM. Did the program have access to all of that RAM? Was it competing with other programs?

iqbal-lab commented 6 years ago

It was competing, but over 300 GB of RAM was free. Rachel was using 200-400 GB of RAM and Jerome was using 200 GB.

iqbal-lab commented 6 years ago

Also, dmesg did not show any messages from the OS about the oom-killer or similar.
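
The missing oom-killer messages fit the log: termination code -6 is SIGABRT, which is what an uncaught C++ exception produces, whereas the oom-killer ends a process with SIGKILL. Two possible reasons for an allocation to be refused while RAM is still free are a per-process limit (such as `ulimit -v`) and the kernel's overcommit policy. Neither is established in this thread. A minimal sketch for inspecting both on a Linux machine, using only the Python standard library (the helper names are hypothetical):

```python
import os
import resource


def format_limit(value):
    """Render an rlimit value, which is RLIM_INFINITY when no cap is set."""
    return "unlimited" if value == resource.RLIM_INFINITY else f"{value} bytes"


def describe_limits():
    """Collect the soft and hard limits that can cap a process's allocations."""
    limits = {}
    for name in ("RLIMIT_AS", "RLIMIT_DATA"):
        soft, hard = resource.getrlimit(getattr(resource, name))
        limits[name] = (format_limit(soft), format_limit(hard))
    return limits


def read_overcommit_mode(path="/proc/sys/vm/overcommit_memory"):
    """Return the Linux overcommit mode (0, 1 or 2), or None if unavailable."""
    if not os.path.exists(path):
        return None
    with open(path) as handle:
        return int(handle.read().strip())


limits = describe_limits()
overcommit_mode = read_overcommit_mode()
print(limits)
print("overcommit mode:", overcommit_mode)
```

The limits reported are those of the Python process itself, so this should be run from the same shell and user that launches the build.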

ffranr commented 6 years ago

The way forward with this issue is to monitor and profile memory usage. A std::bad_alloc error normally occurs when an allocation request cannot be satisfied, typically because the program asks for more memory than the system is able to give it.
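
One low-effort way to get a first number is to record the peak resident set size of the child process once it exits. This is a rough sketch using the Python standard library, not the profiling approach actually used for gramtools; the function name `run_and_report_peak_rss` is hypothetical, and the small demo command stands in for the real `gram build` invocation.

```python
import resource
import subprocess
import sys


def run_and_report_peak_rss(command):
    """Run a command and return (exit code, peak RSS of waited-for children).

    ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    RUSAGE_CHILDREN covers every child already waited for, so the value is
    the largest peak among them, not necessarily that of this command alone.
    """
    completed = subprocess.run(command)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return completed.returncode, peak


# Demo: a child process that allocates roughly 50 MB and exits.
returncode, peak = run_and_report_peak_rss(
    [sys.executable, "-c", "x = bytearray(50_000_000)"]
)
print("exit code:", returncode, "peak RSS:", peak)
```

This only gives a single peak value after the process has ended. For a timeline of memory use during the kmer index stage, sampling the running process at intervals or using a dedicated heap profiler would be more informative.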

iqbal-lab commented 6 years ago

Yep. Let's try again at EBI, on a non-shared machine, as a start.

ffranr commented 6 years ago

This error occurred because the process could not allocate additional memory. It occurred whilst generating the kmer index with a kmer size of 15. Since this issue was last updated, the default kmer size has been reduced to 5. Other memory optimizations have also been implemented.
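
To give a sense of scale for the change in kmer size, the sketch below counts the distinct kmers possible over a four-letter nucleotide alphabet. This is only an upper bound on plain A/C/G/T kmers. How gramtools stores each index entry, and how many kmers actually occur in a given PRG, is not described in this thread, so the figures are not a memory estimate.

```python
def max_distinct_kmers(k, alphabet_size=4):
    """Upper bound on distinct kmers of length k over the given alphabet."""
    return alphabet_size ** k


for k in (5, 15):
    print(f"k={k}: at most {max_distinct_kmers(k):,} distinct kmers")

# How many times larger the k=15 kmer space is than the k=5 one.
ratio = max_distinct_kmers(15) // max_distinct_kmers(5)
print(f"ratio: {ratio:,}")
```

Each extra base multiplies the number of possible kmers by four, so going from 5 to 15 multiplies it by 4^10.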

I'm closing this issue for now because I don't believe this occurred as a result of a code bug.