zavolanlab / Dockerfiles

Dockerfile repository of the Zavolan Lab
Apache License 2.0

Build docker image for omniclip #5

Closed fgypas closed 4 years ago

fgypas commented 6 years ago

Build docker image for https://github.com/philippdre/omniCLIP

fgypas commented 5 years ago

Testing here: https://cloud.docker.com/u/zavolab/repository/docker/zavolab/omniclip

mkatsanto commented 5 years ago

Getting the following error:

  File "/opt/omniCLIP/data_parsing/tools.py", line 1103, in subsample_suff_stat
    new_counts = np.random.multinomial(min(subsample_size, np.sum(NrOfCounts[key][0,:])), NrOfCounts[key][0,:]/np.float64(np.sum(NrOfCounts[key][0,:])), size=1)
IndexError: too many indices for array
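`IndexError: too many indices for array` on `NrOfCounts[key][0,:]` means the indexed array has fewer than two dimensions. Why that happens inside omniCLIP is not established in this thread. The sketch below only reproduces the failure mode and shows one defensive pattern, assuming a 1-D count vector. `first_row` and `subsample_row` are hypothetical helpers, not omniCLIP functions, and the code targets Python 3 with a recent NumPy rather than the Python 2.7 stack in the container.

```python
import numpy as np


def first_row(counts):
    """Return row 0 of `counts`, promoting 0-D/1-D input to 2-D first."""
    return np.atleast_2d(np.asarray(counts))[0, :]


def subsample_row(counts, subsample_size, rng):
    """Multinomially subsample the first row of `counts` (hypothetical helper)."""
    row = first_row(counts).astype(np.float64)
    total = row.sum()
    if total == 0:
        # Nothing to sample from; avoid dividing by zero.
        return np.zeros((1, row.size), dtype=np.int64)
    n = int(min(subsample_size, total))
    return rng.multinomial(n, row / total, size=1)


rng = np.random.default_rng(0)
flat = np.array([5, 0, 3, 2])  # 1-D, so flat[0, :] is one index too many

try:
    flat[0, :]
except IndexError as err:
    print("IndexError:", err)

print(subsample_row(flat, 4, rng).shape)                    # (1, 4)
print(subsample_row(flat.reshape(1, -1), 100, rng).sum())   # 10
```

Promoting with `np.atleast_2d` hides the symptom but not the cause; checking `NrOfCounts[key].shape` just before the failing line would show which shape the container actually produces.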

fgypas commented 5 years ago

When I run within the docker container I get the following error:

`root@521749816d02:/# python /opt/omniCLIP/omniCLIP.py --annot example_data/gencode.v19.annotation.chr1.gtf.db --genome-dir example_data/hg37/ --clip-files example_data/PUM2_rep1_chr1.bam --clip-files example_data/PUM2_rep2_chr1.bam --bg-files example_data/RZ_rep1_chr1.bam --bg-files example_data/RZ_rep2_chr1.bam --out-dir example_data --collapsed-CLIP --bck-var
Namespace(bg_collapsed=False, bg_libs=['example_data/RZ_rep1_chr1.bam', 'example_data/RZ_rep2_chr1.bam'], bg_type='Coverage_bck', diag_bg=False, diag_event_mod='DirchMultK', emp_var=False, fg_collapsed=True, fg_libs=['example_data/PUM2_rep1_chr1.bam', 'example_data/PUM2_rep2_chr1.bam'], fg_pen=0.0, filter_snps=False, gene_anno_file='example_data/gencode.v19.annotation.chr1.gtf.db', gene_sample=100000, genome_dir='example_data/hg37/', glm_weight=-1.0, ign_GLM=False, ign_diag=False, ign_out_rds=False, mask_flank_variants=3, mask_miRNA=False, mask_ovrlp=True, max_it=20, max_it_glm=10, max_mm=2, nb_proc=1, norm_class=False, nr_mix_comp=1, only_coverage=False, only_pred=False, out_dir='example_data', overwrite_bg=True, overwrite_fg=True, pred_sites=False, pseudo_count=None, pv_cutoff=0.05, restart_from_file=False, rev_strand=None, rnd_seed=None, safe_tmp=False, skip_diag_event_mdl=False, snps_min_cov=10, snps_thresh=0.2, subs=True, thresh=None, tmp_dir=None, tol_lg_lik=10000.0, tr_type='binary', use_precomp_diagmod=None, verbosity=0)
Loading gene annotation
Memory usage: 109120 (kb)
Loading reads
Parsing the gene annotation
Processing chr1
Saving results
Parsing the gene annotation
Processing chr1
Saving results
Masking overlapping positions
/usr/local/lib/python2.7/dist-packages/h5py/_hl/dataset.py:313: H5pyDeprecationWarning: dataset.value has been deprecated. Use dataset[()] instead.
  "Use dataset[()] instead.", H5pyDeprecationWarning)
/opt/omniCLIP/data_parsing/tools.py:858: RuntimeWarning: invalid value encountered in double_scalars
  med = ((temp_med_floor * tot_floor) + (temp_med_ceil * tot_ceil)) / (tot_floor + tot_ceil)
Removing genes without CLIP coverage
Done: Elapsed time: 695.783895016
Memory usage: 1165272 (kb)
Initialising the parameters
Iteration: 0
Computing most likely path
Killed`
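A bare `Killed` with no Python traceback is what a shell typically prints when the process received SIGKILL, for example from the kernel's out-of-memory killer or a container memory limit. That cause is an assumption here; the log does not confirm it. One way to check is to compare the memory figures omniCLIP prints against the memory available to the container. The sketch below pulls the peak `Memory usage: N (kb)` value out of a log; `peak_memory_kb` is a hypothetical helper, and treating `kb` as 1024 bytes is also an assumption.

```python
import re

# Matches the "Memory usage: 1165272 (kb)" lines that omniCLIP prints.
MEMORY_RE = re.compile(r"Memory usage:\s*(\d+)\s*\(kb\)")


def peak_memory_kb(log_text):
    """Return the largest 'Memory usage: N (kb)' value in a log, or None."""
    values = [int(v) for v in MEMORY_RE.findall(log_text)]
    return max(values) if values else None


# Excerpt of the Docker log above, shortened to the two memory lines.
docker_log = (
    "Loading gene annotation Memory usage: 109120 (kb) Loading reads "
    "Done: Elapsed time: 695.783895016 Memory usage: 1165272 (kb) "
    "Computing most likely path Killed"
)

peak = peak_memory_kb(docker_log)
print(peak, "kb, about", round(peak / 1024 ** 2, 2), "GiB")
```

The last value before `Killed` is only a lower bound, since the process died during the following step. The Singularity log in this thread reports 3340920 kb immediately after the same step.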

fgypas commented 5 years ago

Current error using Singularity:

/usr/local/lib/python2.7/dist-packages/h5py/_hl/dataset.py:313: H5pyDeprecationWarning: dataset.value has been deprecated. Use dataset[()] instead.
  "Use dataset[()] instead.", H5pyDeprecationWarning)
/opt/omniCLIP/data_parsing/tools.py:858: RuntimeWarning: invalid value encountered in double_scalars
  med = ((temp_med_floor * tot_floor) + (temp_med_ceil * tot_ceil)) / (tot_floor + tot_ceil)
/usr/local/lib/python2.7/dist-packages/scipy/sparse/linalg/dsolve/linsolve.py:253: SparseEfficiencyWarning: splu requires CSC matrix format
  warn('splu requires CSC matrix format', SparseEfficiencyWarning)
/usr/local/lib/python2.7/dist-packages/scipy/optimize/_minimize.py:600: RuntimeWarning: Method 'bounded' does not support relative tolerance in x; defaulting to absolute tolerance.
  "defaulting to absolute tolerance.", RuntimeWarning)
Namespace(bg_collapsed=False, bg_libs=['example_data/RZ_rep1_chr1.bam', 'example_data/RZ_rep2_chr1.bam'], bg_type='Coverage_bck', diag_bg=False, diag_event_mod='DirchMultK', emp_var=False, fg_collapsed=True, fg_libs=['example_data/PUM2_rep1_chr1.bam', 'example_data/PUM2_rep2_chr1.bam'], fg_pen=0.0, filter_snps=False, gene_anno_file='example_data/gencode.v19.annotation.chr1.gtf.db', gene_sample=100000, genome_dir='example_data/hg37/', glm_weight=-1.0, ign_GLM=False, ign_diag=False, ign_out_rds=False, mask_flank_variants=3, mask_miRNA=False, mask_ovrlp=True, max_it=20, max_it_glm=10, max_mm=2, nb_proc=1, norm_class=False, nr_mix_comp=1, only_coverage=False, only_pred=False, out_dir='example_data', overwrite_bg=True, overwrite_fg=True, pred_sites=False, pseudo_count=None, pv_cutoff=0.05, restart_from_file=False, rev_strand=None, rnd_seed=None, safe_tmp=False, skip_diag_event_mdl=False, snps_min_cov=10, snps_thresh=0.2, subs=True, thresh=None, tmp_dir=None, tol_lg_lik=10000.0, tr_type='binary', use_precomp_diagmod=None, verbosity=0)
Loading gene annotation
Memory usage: 93564 (kb)
Loading reads
Parsing the gene annotation
Processing chr1
Saving results
Parsing the gene annotation
Processing chr1
Saving results
Masking overlapping positions
Removing genes without CLIP coverage
Done: Elapsed time: 1127.059973
Memory usage: 1154820 (kb)
Initialising the parameters
Iteration: 0
Computing most likely path

Done: Elapsed time: 186.006131887
Memory usage: 3340920 (kb)
Fitting emission parameters
Memory usage: 3340920 (kb)
Fitting emission parameters
Estimating expression parameters
Memory usage: 3340920 (kb)
Start estimation of expression parameters
Constructing GLM matrix
Estimating expression parameters: before GLMMatrix
Memory usage: 3340920 (kb)
Estimating expression parameters: after GLMMatrix
Memory usage: 3340920 (kb)
Done: Elapsed time: 16.7295198441
Estimating expression parameters: before GLMMatrix
Memory usage: 3340920 (kb)
Fitting GLM
Estimating expression parameters: before fitting
Memory usage: 3340920 (kb)
[[-5.86262797] [-1.39794241] [-6.40704993]]
Dispersion 4.73991326495 1323538.14147
[[-4.99527274] [-1.55260304] [-5.91156312]]
Dispersion 4.75639033771 2897.6338165
[[-4.99320863] [-1.55298395] [-5.91074454]]
Dispersion 4.75646887385 13.784992224
Estimating expression parameters: afer fitting
Memory usage: 3340920 (kb)
Estimating expression parameters: afer cleanup
Memory usage: 3340920 (kb)
Done: Elapsed time: 35.1242051125
Finishes expression parameter estimation
Memory usage: 3340920 (kb)
computing sufficient statitics for fitting md
Memory usage: 3340920 (kb)
Getting suffcient statistic
Done: Elapsed time: 151.128564119
Memory usage: 3479096 (kb)
fitting md distribution
Memory usage: 3479096 (kb)
Estimating state 0
Estimating state 1
Estimating state 2
Estimating state 3
Memory usage: 3479096 (kb)
Done: Elapsed time: 209.871788979
Memory usage: 3479096 (kb)
Fitting transistion parameters
Memory usage: 3479096 (kb)
Fitting transistion parameters
Memory usage: 3479096 (kb)
Learning transistion model
Iterating over genes
Fitting transistion parameters: I
Memory usage: 3479096 (kb)
.Fitting transistion parameters: II
Memory usage: 3479096 (kb)
Fitting transistion parameters: III
Memory usage: 4153612 (kb)
Fitting transistion parameters: IV
Memory usage: 4733776 (kb)
Done: Elapsed time: 189.882477045
Fitting transistion parameters: V
Memory usage: 4733776 (kb)
Memory usage: 4733776 (kb)
Memory usage: 4733776 (kb)
Computing most likely path
Memory usage: 4733776 (kb)
Computing most likely path

Done: Elapsed time: 340.688632011
Memory usage: 4733776 (kb)
LogLik: -276419958.813
Log-likelihood: -276419958.813
[-276419958.81327856]
Iteration: 1
Fitting emission parameters
Memory usage: 4733776 (kb)
Fitting emission parameters
Estimating expression parameters
Memory usage: 4733776 (kb)
Start estimation of expression parameters
Constructing GLM matrix
Estimating expression parameters: before GLMMatrix
Memory usage: 4733776 (kb)
Estimating expression parameters: after GLMMatrix
Memory usage: 4733776 (kb)
Done: Elapsed time: 18.5351040363
Estimating expression parameters: before GLMMatrix
Memory usage: 4733776 (kb)
Fitting GLM
Estimating expression parameters: before fitting
Memory usage: 4733776 (kb)
[[ -5.60605638] [ -1.35371957] [-10.97159246]]
Dispersion 1.53997308875 3284102.57242
[[ -5.60416194] [ -1.50858191] [-10.96700079]]
Dispersion 1.43349552661 249546.407159
[[ -5.60728083] [ -1.52066257] [-10.96685927]]
Dispersion 1.42854404723 12200.5147202
Estimating expression parameters: afer fitting
Memory usage: 4733776 (kb)
Estimating expression parameters: afer cleanup
Memory usage: 4733776 (kb)
Done: Elapsed time: 52.1908521652
Finishes expression parameter estimation
Memory usage: 4733776 (kb)
computing sufficient statitics for fitting md
Memory usage: 4733776 (kb)
Getting suffcient statistic
Done: Elapsed time: 147.203412056
Memory usage: 4733776 (kb)
fitting md distribution
Memory usage: 4733776 (kb)
Estimating state 0
Estimating state 1
Estimating state 2
Estimating state 3
Memory usage: 4733776 (kb)
Done: Elapsed time: 223.586141825
Memory usage: 4733776 (kb)
Fitting transistion parameters
Memory usage: 4733776 (kb)
Fitting transistion parameters
Memory usage: 4733776 (kb)
Learning transistion model
Iterating over genes
Fitting transistion parameters: I
Memory usage: 4733776 (kb)
.Fitting transistion parameters: II
Memory usage: 4733776 (kb)
Fitting transistion parameters: III
Memory usage: 4733776 (kb)
Fitting transistion parameters: IV
Memory usage: 5037056 (kb)
Done: Elapsed time: 193.760185003
Fitting transistion parameters: V
Memory usage: 5037056 (kb)
Memory usage: 5037056 (kb)
Memory usage: 5037056 (kb)
Computing most likely path
Memory usage: 5037056 (kb)
Computing most likely path

Done: Elapsed time: 322.033174038
Memory usage: 5037056 (kb)
LogLik: -197388911.42
Log-likelihood: -197388911.42
[-276419958.81327856, -197388911.41977647]
Iteration: 2
Fitting emission parameters
Memory usage: 5037056 (kb)
Fitting emission parameters
Estimating expression parameters
Memory usage: 5037056 (kb)
Start estimation of expression parameters
Constructing GLM matrix
Estimating expression parameters: before GLMMatrix
Memory usage: 5037056 (kb)
Estimating expression parameters: after GLMMatrix
Memory usage: 5037056 (kb)
Done: Elapsed time: 18.5401170254
Estimating expression parameters: before GLMMatrix
Memory usage: 5037056 (kb)
Fitting GLM
Estimating expression parameters: before fitting
Memory usage: 5037056 (kb)
[[ -5.82820056] [ -1.59726161] [-10.73771781]]
Dispersion 1.18274382013 580832.293091
[[ -5.89526679] [ -1.62644377] [-10.736914 ]]
Dispersion 1.17150958289 30115.415234
[[ -5.89871413] [ -1.62797896] [-10.73687714]]
Dispersion 1.17097272347 1448.1699915
Estimating expression parameters: afer fitting
Memory usage: 5037056 (kb)
Estimating expression parameters: afer cleanup
Memory usage: 5037056 (kb)
Done: Elapsed time: 41.142747879
Finishes expression parameter estimation
Memory usage: 5037056 (kb)
computing sufficient statitics for fitting md
Memory usage: 5037056 (kb)
Getting suffcient statistic
Done: Elapsed time: 114.98391819
Memory usage: 5037056 (kb)
fitting md distribution
Memory usage: 5037056 (kb)
Estimating state 0
Estimating state 1
Estimating state 2
Estimating state 3
Memory usage: 5037056 (kb)
Done: Elapsed time: 180.077622175
Memory usage: 5037056 (kb)
Fitting transistion parameters
Memory usage: 5037056 (kb)
Fitting transistion parameters
Memory usage: 5037056 (kb)
Learning transistion model
Iterating over genes
Fitting transistion parameters: I
Memory usage: 5037056 (kb)
.Fitting transistion parameters: II
Memory usage: 5037056 (kb)
Fitting transistion parameters: III
Memory usage: 5037056 (kb)
Fitting transistion parameters: IV
Memory usage: 5707136 (kb)
Done: Elapsed time: 174.015991926
Fitting transistion parameters: V
Memory usage: 5707136 (kb)
Memory usage: 5707136 (kb)
Memory usage: 5707136 (kb)
Computing most likely path
Memory usage: 5707136 (kb)
Computing most likely path

Done: Elapsed time: 318.18231988
Memory usage: 5707136 (kb)
LogLik: -196902982.375
Log-likelihood: -196902982.375
[-276419958.81327856, -197388911.41977647, -196902982.37539276]
Iteration: 3
Fitting emission parameters
Memory usage: 5707136 (kb)
Fitting emission parameters
Estimating expression parameters
Memory usage: 5707136 (kb)
Start estimation of expression parameters
Constructing GLM matrix
Estimating expression parameters: before GLMMatrix
Memory usage: 5707136 (kb)
Estimating expression parameters: after GLMMatrix
Memory usage: 5707136 (kb)
Done: Elapsed time: 18.618844986
Estimating expression parameters: before GLMMatrix
Memory usage: 5707136 (kb)
Fitting GLM
Estimating expression parameters: before fitting
Memory usage: 5707136 (kb)
[[ -5.90353974] [ -1.77144729] [-10.59990291]]
Dispersion 1.15870280174 33722.280353
[[ -5.9076541 ] [ -1.77311538] [-10.59985689]]
Dispersion 1.15824039001 1279.57551063
[[ -5.90781012] [ -1.77317875] [-10.59985515]]
Dispersion 1.15822292273 48.3475619276
Estimating expression parameters: afer fitting
Memory usage: 5707136 (kb)
Estimating expression parameters: afer cleanup
Memory usage: 5707136 (kb)
Done: Elapsed time: 29.4881711006
Finishes expression parameter estimation
Memory usage: 5707136 (kb)
computing sufficient statitics for fitting md
Memory usage: 5707136 (kb)
Getting suffcient statistic
Done: Elapsed time: 114.442127228
Memory usage: 5707136 (kb)
fitting md distribution
Memory usage: 5707136 (kb)
Estimating state 0
Estimating state 1
Estimating state 2
Estimating state 3
Memory usage: 5707136 (kb)
Done: Elapsed time: 168.16003108
Memory usage: 5707136 (kb)
Fitting transistion parameters
Memory usage: 5707136 (kb)
Fitting transistion parameters
Memory usage: 5707136 (kb)
Learning transistion model
Iterating over genes
Fitting transistion parameters: I
Memory usage: 5707136 (kb)
.Fitting transistion parameters: II
Memory usage: 5707136 (kb)
Fitting transistion parameters: III
Memory usage: 5707136 (kb)
Fitting transistion parameters: IV
Memory usage: 5707136 (kb)
Done: Elapsed time: 156.465799809
Fitting transistion parameters: V
Memory usage: 5707136 (kb)
Memory usage: 5707136 (kb)
Memory usage: 5707136 (kb)
Computing most likely path
Memory usage: 5707136 (kb)
Computing most likely path

Done: Elapsed time: 315.280578136
Memory usage: 5707136 (kb)
LogLik: -196100507.253
Log-likelihood: -196100507.253
[-276419958.81327856, -197388911.41977647, -196902982.37539276, -196100507.25284377]
Iteration: 4
Fitting emission parameters
Memory usage: 5707136 (kb)
Fitting emission parameters
Estimating expression parameters
Memory usage: 5707136 (kb)
Start estimation of expression parameters
Constructing GLM matrix
Estimating expression parameters: before GLMMatrix
Memory usage: 5707136 (kb)
Estimating expression parameters: after GLMMatrix
Memory usage: 5707136 (kb)
Done: Elapsed time: 18.4652509689
Estimating expression parameters: before GLMMatrix
Memory usage: 5707136 (kb)
Fitting GLM
Estimating expression parameters: before fitting
Memory usage: 5707136 (kb)
[[ -6.19819175] [ -2.05475657] [-10.53295815]]
Dispersion 1.11540622662 125695.069891
[[ -6.21999346] [ -2.06097515] [-10.53278674]]
Dispersion 1.11412036155 3865.32564373
[[ -6.22065613] [ -2.06116729] [-10.53278158]]
Dispersion 1.11408137009 117.293383519
Estimating expression parameters: afer fitting
Memory usage: 5707136 (kb)
Estimating expression parameters: afer cleanup
Memory usage: 5707136 (kb)
Done: Elapsed time: 30.2455918789
Finishes expression parameter estimation
Memory usage: 5707136 (kb)
computing sufficient statitics for fitting md
Memory usage: 5707136 (kb)
Getting suffcient statistic
Done: Elapsed time: 116.862193823
Memory usage: 5707136 (kb)
fitting md distribution
Memory usage: 5707136 (kb)
Estimating state 0
Estimating state 1
Estimating state 2
Estimating state 3
Traceback (most recent call last):
  File "/opt/omniCLIP/omniCLIP.py", line 930, in <module>
    run_omniCLIP(args)
  File "/opt/omniCLIP/omniCLIP.py", line 324, in run_omniCLIP
    CurrLogLikelihood, IterParameters, First, Paths = PerformIteration(Sequences, Background, IterParameters, NrOfStates, First, Paths)
  File "/opt/omniCLIP/omniCLIP.py", line 608, in PerformIteration
    NewEmissionParameters = FitEmissionParameters(Sequences, Background, NewPaths, EmissionParameters, First)
  File "/opt/omniCLIP/omniCLIP.py", line 733, in FitEmissionParameters
    NewEmissionParameters = mixture_tools.em(Counts, NrOfCounts, NewEmissionParameters, x_0=OldAlpha, First=First)
  File "/opt/omniCLIP/stat/mixture_tools.py", line 66, in em
    alpha, mixtures = Parallel_estimate_mixture_params(OldEmissionParameters, curr_counts, curr_nr_of_counts, curr_state, rand_sample_size, max_nr_iter, nr_of_iter=20, stop_crit=1.0, nr_of_init=10)
  File "/opt/omniCLIP/stat/mixture_tools.py", line 272, in Parallel_estimate_mixture_params
    scored_counts = score_counts(curr_counts, curr_state, EmissionParameters)
  File "/opt/omniCLIP/stat/mixture_tools.py", line 428, in score_counts
    scored_counts[mix_comp, :] = diag_event_model.pred_log_lik(counts, state, EmissionParameters, single_mix=mix_comp)
  File "/opt/omniCLIP/stat/diag_event_model.py", line 76, in pred_log_lik
    Prob = FitBinoDirchEmmisionProbabilities.ComputeStateProbForGeneMD_unif_rep(counts, alpha[:, single_mix], state, EmissionParameters)
  File "/opt/omniCLIP/stat/FitBinoDirchEmmisionProbabilities.py", line 162, in ComputeStateProbForGeneMD_unif_rep
    Prob[IxZeros] = np.tile(RatioLikelihood[0, 0] , (1, np.sum(IxZeros)))
TypeError: NumPy boolean array indexing assignment requires a 0 or 1-dimensional input, input has 2 dimensions
nohup.out (END)
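The final `TypeError` points at the right-hand side of `Prob[IxZeros] = np.tile(RatioLikelihood[0, 0] , (1, np.sum(IxZeros)))`: `np.tile` with `reps=(1, n)` returns an array of shape `(1, n)`, which is 2-D, while the message says a boolean-mask assignment accepts only 0-D or 1-D input. The sketch below illustrates the shape difference and one way to write such an assignment. It assumes `Prob` and `IxZeros` are 1-D, which the traceback does not state, and `fill_where` is a hypothetical helper, not the fix applied in omniCLIP or in this repository.

```python
import numpy as np


def fill_where(prob, mask, value):
    """Set prob[mask] to a scalar `value`; broadcasting handles the count."""
    prob = np.array(prob, dtype=np.float64)  # work on a copy
    prob[np.asarray(mask, dtype=bool)] = value
    return prob


mask = np.array([True, False, True, True])
n = int(mask.sum())

two_d = np.tile(0.25, (1, n))  # reps=(1, n) gives shape (1, n): 2-D
one_d = np.tile(0.25, n)       # reps=n gives shape (n,): 1-D
print(two_d.shape, one_d.shape)  # (1, 3) (3,)

prob = np.zeros(4)
try:
    # Recent NumPy versions are expected to reject a 2-D value here.
    prob[mask] = two_d
except (TypeError, ValueError) as err:
    print(type(err).__name__ + ":", err)

print(fill_where(np.zeros(4), mask, 0.25))
```

Assigning the scalar directly avoids building the tiled array at all; `np.tile(value, n)` or `.ravel()` on the 2-D result would be alternatives if an explicit array is preferred.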

fgypas commented 4 years ago

closed by #52