Closed paul-sud closed 5 years ago
This will be a RAM problem. A human BS index for GEM is about 32GB so you will need at least that (realistically you should have at least 48GB). With GEM we traded index size for speed: the larger index is what gives the speed advantage for GEM.
Simon
On Wed, 4 Sep 2019, 19:25 Paul Sud, notifications@github.com wrote:
Hello,
I am trying to create a gemBS index for GRCh38, specifically this fasta: https://www.encodeproject.org/files/GRCh38_no_alt_analysis_set_GCA_000001405.15/ . However, it always fails at exactly 50%. I didn't have this issue indexing the test data from http://statgen.cnag.cat/GEMBS/UserGuide/_build/html/example.html . The configuration I used here is very similar. Unfortunately the traceback isn't very informative, the relevant logs are shown below.
I should add, I don't believe it to be a resource issue. I ran it with 8 cores, 12 GB RAM, and 200 GB disk space, and post mortem didn't reveal any concerning usage levels.
: : Command index started at 2019-08-26 21:01:56.662514 : 2019-08-26 21:01:56,689 DEBUG: Using bundled binary : /root/.local/lib/python3.5/site-packages/gemBS/gemBSbinaries/gemBS_cat 2019-08-26 21:01:56,690 DEBUG: Using binary from PATH: /usr/bin/pigz 2019-08-26 21:01:56,690 INFO: Starting: /root/.local/lib/python3.5/site-packages/gemBS/gemBSbinaries/gemBS_cat reference/GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta.gz reference/lambda.fa.gz | /usr/bin/pigz 2019-08-26 21:01:56,690 DEBUG: Starting subprocess 2019-08-26 21:01:56,693 DEBUG: File output detected, opening output stream to indexes/GRCh38_no_alt_analysis_set_GCA_000001405.15.BS_gemBS.tmp.gz 2019-08-26 21:01:56,693 DEBUG: Setting process input to parent output 2019-08-26 21:01:56,693 DEBUG: Starting subprocess 2019-08-26 21:03:48,488 DEBUG: Process '/usr/bin/pigz' finished with 0 2019-08-26 21:03:48,488 DEBUG: Process '/root/.local/lib/python3.5/site-packages/gemBS/gemBSbinaries/gemBS_cat' finished with 0 : Creating index 2019-08-26 21:03:48,489 DEBUG: Using bundled binary : /root/.local/lib/python3.5/site-packages/gemBS/gemBSbinaries/gem-indexer 2019-08-26 21:03:48,489 INFO: Starting: /root/.local/lib/python3.5/site-packages/gemBS/gemBSbinaries/gem-indexer -i indexes/GRCh38_no_alt_analysis_set_GCA_000001405.15.BS_gemBS.tmp.gz -o indexes/GRCh38_no_alt_analysis_set_GCA_000001405.15.BS -b --tmp-folder indexes/ -s 4 -t 8 2019-08-26 21:03:48,489 DEBUG: Setting process log file to indexes/gem_indexer_GRCh38_no_alt_analysis_set_GCA_000001405.15.BS.gem.err 2019-08-26 21:03:48,489 DEBUG: Starting subprocess 2019-08-26 21:36:56,369 DEBUG: Process '/root/.local/lib/python3.5/site-packages/gemBS/gemBSbinaries/gem-indexer' finished with -9 2019-08-26 21:36:56,731 ERROR: Process '/root/.local/lib/python3.5/site-packages/gemBS/gemBSbinaries/gem-indexer' finished with -9 2019-08-26 21:36:56,798 ERROR: 2019/8/26 21:03:48 -- [Inspecting MultiFASTA] 2019-08-26 21:36:56,798 ERROR: 2019/8/26 21:04:18 -- 100% ... done [29.907 s] 2019-08-26 21:36:56,798 ERROR: 2019/8/26 21:04:18 -- Inspected text 12399884961 characters (index_complement=yes). Requesting 11825 MB (encoded text) 2019-08-26 21:36:56,798 ERROR: 2019/8/26 21:04:18 -- [Reading MultiFASTA] 2019-08-26 21:36:56,798 ERROR: 2019/8/26 21:04:19 -- 100000000 bases parsed ----------------------------[omitted for brevity]--------------------------------- 2019-08-26 21:36:56,799 ERROR: 2019/8/26 21:05:06 -- 3100000000 bases parsed 2019-08-26 21:36:56,799 ERROR: 2019/8/26 21:05:06 -- Total 3144255742 bases parsed ...done [48.525 s] 2019-08-26 21:36:56,799 ERROR: 2019/8/26 21:05:06 -- [Generating Bisulfite-Text (C2T & G2A)] 2019-08-26 21:36:56,799 ERROR: 2019/8/26 21:05:17 -- 100% ... done [10.268 s] 2019-08-26 21:36:56,799 ERROR: 2019/8/26 21:05:17 -- [Generating Text (explicit Reverse-Complement)] 2019-08-26 21:36:56,799 ERROR: 2019/8/26 21:05:34 -- 100% ... done [17.645 s] 2019-08-26 21:36:56,799 ERROR: 2019/8/26 21:05:34 -- [Generating BWT Forward-Text] 2019-08-26 21:36:56,799 ERROR: 2019/8/26 21:05:34 -- [Building-BWT::Counting K-mers] 2019-08-26 21:36:56,799 ERROR: 2019/8/26 21:06:05 -- 100% ... done [30.960 s] 2019-08-26 21:36:56,799 ERROR: 2019/8/26 21:06:05 -- [Building-BWT::Generating SA-Positions] 2019-08-26 21:36:56,799 ERROR: 2019/8/26 21:06:06 -- 1% ----------------------------[omitted for brevity]--------------------------------- 2019-08-26 21:36:56,802 ERROR: 2019/8/26 21:18:41 -- 100% ... done [755.087 s] 2019-08-26 21:36:56,802 ERROR: 2019/8/26 21:18:41 -- [Building-BWT::Sorting SA] 2019-08-26 21:36:56,802 ERROR: 2019/8/26 21:20:42 -- 0% 2019-08-26 21:36:56,802 ERROR: 2019/8/26 21:21:12 -- 1% ----------------------------[omitted for brevity]--------------------------------- 2019-08-26 21:36:56,804 ERROR: 2019/8/26 21:36:50 -- 50% Traceback (most recent call last): File "/root/.local/bin/gemBS", line 11, in
load_entry_point('gemBS==3.2.2', 'console_scripts', 'gemBS')() File "/root/.local/lib/python3.5/site-packages/gemBS/commands.py", line 157, in gemBS_main instances[args.command].run(args) File "/root/.local/lib/python3.5/site-packages/gemBS/production.py", line 175, in run ret = index(fasta_input, index_name, extra_fasta_files=extra_fasta_files, threads=self.threads, sampling_rate=args.sampling_rate, tmpDir=os.path.dirname(index_name)) File "/root/.local/lib/python3.5/site-packages/gemBS/init.py", line 590, in index raise ValueError("Error while executing the Bisulphite gem-indexer") ValueError: Error while executing the Bisulphite gem-indexer The commands I used are the following:
gemBS --loglevel debug prepare -c /cromwell_root/wgbs-pipeline/gembs_test_data/ENCSR750CPN_test_run/ENCSR750CPN_gembs_fixed.conf \ -t /cromwell_root/wgbs-pipeline/gembs_test_data/ENCSR750CPN_test_run/ENCSR750CPN_metadata.csv \ -o gemBS.json \ --no-db gemBS --loglevel debug -j gemBS.json index
The .conf file for prepare looked like this, largely copied from the example:
reference = reference/GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta.gz extra_references = reference/lambda.fa.gz index_dir = indexes
base = . sequence_dir = ${base}/fastq/@SAMPLE bam_dir = ${base}/mapping/@BARCODE bcf_dir = ${base}/calls/@BARCODE extract_dir = ${base}/extract/@BARCODE report_dir = ${base}/report
threads = 8 jobs = 3
[mapping] underconversion_sequence = chrL
include IHEC_standard.conf
The resulting gemBS.json from prepare looked like this:
{ sampleData: { ENCSR750CPN: { sample_name: "ENCSR750CPN", sample_barcode: "sample_ENCSR750CPN" } }, contigs: { }, config: { mapping: { underconversion_sequence: "chrL", non_stranded: "False", remove_individual_bams: "True" }, DEFAULT: { bcf_dir: "./calls/@BARCODE", threads: "8", gembs_dbfile: "file:gemBS?mode=memory&cache=shared", bam_dir: "./mapping/@BARCODE", extra_references: "reference/lambda.fa.gz", extract_dir: "./extract/@BARCODE", reference: "reference/GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta.gz", base: ".", sequence_dir: "./fastq/@SAMPLE", nonbs_flag: false, index_dir: "indexes", jobs: "3", report_dir: "./report" }, index: { sampling_rate: "4" }, report: { }, calling: { keep_improper_pairs: "False", qual_threshold: "13", left_trim: "5", keep_duplicates: "False", remove_individual_bcfs: "True", haploid: "False", right_trim: "0", omit_contigs: [ "chrL" ], contig_pool_limit: "25000000", conversion: "auto", mapq_threshold: "10", reference_bias: "2" }, extract: { strand_specific: "True", make_bedmethyl: "True", make_cpg: "True", make_non_cpg: "True", make_bigwig: "True", phred_threshold: "10" } } }
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/heathsc/gemBS/issues/61?email_source=notifications&email_token=AAY4654DD2VWHLT7BQQHPSDQH7VPVA5CNFSM4ITUQ3IKYY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4HJJ72IQ, or mute the thread https://github.com/notifications/unsubscribe-auth/AAY4653KXBAJVTM5FTMERKTQH7VPVANCNFSM4ITUQ3IA .
Thanks for the quick response Simon. I'll try again with 48GB RAM
Indexing succeeded with the above amount of RAM. Thanks again!
Hello,
I am trying to create a gemBS index for GRCh38, specifically this fasta: https://www.encodeproject.org/files/GRCh38_no_alt_analysis_set_GCA_000001405.15/ . However, it always fails at exactly 50%. I didn't have this issue indexing the test data from http://statgen.cnag.cat/GEMBS/UserGuide/_build/html/example.html . The configuration I used here is very similar. Unfortunately the traceback isn't very informative, the relevant logs are shown below.
I should add, I don't believe it to be a resource issue. I ran it with 8 cores, 12 GB RAM, and 200 GB disk space, and post mortem didn't reveal any concerning usage levels.
The commands I used are the following:
The .conf file for prepare looked like this, largely copied from the example:
The resulting gemBS.json from
prepare
looked like this: