Closed connorrogerson closed 3 years ago
Hi @connorrogerson, I see that you have used the -s 0 flag, which means that the original size of regions is used. Could it be that some regions in the BED file have size 0? For instance, chr10:117988166-117988166?
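One way to rule this out is to scan the BED file for intervals whose end is not greater than their start. The following is a minimal sketch, not part of GimmeMotifs: `find_empty_intervals` is a hypothetical helper, and it assumes a whitespace- or tab-delimited BED with at least three columns.

```python
def find_empty_intervals(lines):
    """Return (line_number, chrom, start, end) for BED records with end <= start."""
    empty = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        # Skip blank lines and the optional header/comment lines of a BED file.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if end <= start:
            empty.append((lineno, chrom, start, end))
    return empty


# Illustrative records only: one normal region and one zero-size region.
example = [
    "chr10\t69165952\t69166203",
    "chr10\t117988166\t117988166",
]
print(find_empty_intervals(example))
```

To check a real file, pass an open file handle instead of the list, for example `find_empty_intervals(open("regions.bed"))`, where `regions.bed` is a placeholder name.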
Hi @simonvh, my original BED file doesn't have any 0-size intervals. Maybe -s 0 is resizing them to 0?
If I remove the -s flag, the command line becomes: gimme motifs -b /rds/user/cjr78/hpc-work/ATAC/macs2/merged_all/merged_final_fixed_peaks.fa -g mm10 --denovo Alluvial_open_activated_GM.5.0.Forkhead.0008.bed Alluvial_open_activated_GM.5.0.Forkhead.0008_gimmemotifs
This runs fine until I get another error:
2021-01-14 19:37:52,155 - INFO - starting full motif analysis
2021-01-14 19:37:52,155 - INFO - using size of 200, set size to 0 to use original region size
2021-01-14 19:37:52,156 - INFO - preparing input from BED
2021-01-14 19:37:59,742 - INFO - Copying custom background file /rds/user/cjr78/hpc-work/ATAC/macs2/merged_all/merged_final_fixed_peaks.fa to Alluvial_open_activated_GM.5.0.Forkhead.0008_gimmemotifs/intermediate/prediction.bg.fa.
2021-01-14 19:38:01,789 - WARNING - The custom background file /rds/user/cjr78/hpc-work/ATAC/macs2/merged_all/merged_final_fixed_peaks.fa contains sequences with a median size of 277.0, while GimmeMotifs predicts motifs in sequences of size 200. This will influence the statistics! It is recommended to use background sequences of the same size.
2021-01-14 19:38:02,103 - INFO - Copying custom background file /rds/user/cjr78/hpc-work/ATAC/macs2/merged_all/merged_final_fixed_peaks.fa to Alluvial_open_activated_GM.5.0.Forkhead.0008_gimmemotifs/intermediate/bg.custom.fa.
2021-01-14 19:38:03,962 - WARNING - The custom background file /rds/user/cjr78/hpc-work/ATAC/macs2/merged_all/merged_final_fixed_peaks.fa contains sequences with a median size of 277.0, while GimmeMotifs predicts motifs in sequences of size 200. This will influence the statistics! It is recommended to use background sequences of the same size.
2021-01-14 19:38:04,244 - INFO - starting motif prediction (xl)
2021-01-14 19:38:04,244 - INFO - tools: MEME, BioProspector, Homer
2021-01-14 19:38:06,178 - INFO - all jobs submitted
2021-01-14 19:38:10,764 - INFO - BioProspector_width_6 finished, found 5 motifs
2021-01-14 19:38:11,122 - INFO - BioProspector_width_8 finished, found 5 motifs
2021-01-14 19:38:11,429 - INFO - BioProspector_width_10 finished, found 5 motifs
2021-01-14 19:38:11,833 - INFO - BioProspector_width_12 finished, found 5 motifs
2021-01-14 19:38:12,003 - INFO - BioProspector_width_14 finished, found 5 motifs
2021-01-14 19:38:12,246 - INFO - BioProspector_width_16 finished, found 5 motifs
2021-01-14 19:38:12,517 - INFO - BioProspector_width_18 finished, found 5 motifs
2021-01-14 19:38:12,755 - INFO - BioProspector_width_20 finished, found 5 motifs
2021-01-14 19:38:52,766 - INFO - MEME_width_12 finished, found 10 motifs
2021-01-14 19:38:53,987 - INFO - MEME_width_10 finished, found 10 motifs
2021-01-14 19:38:55,282 - INFO - MEME_width_8 finished, found 10 motifs
2021-01-14 19:38:57,992 - INFO - MEME_width_6 finished, found 10 motifs
2021-01-14 19:39:14,379 - INFO - Homer_width_6 finished, found 5 motifs
2021-01-14 19:39:31,900 - INFO - MEME_width_14 finished, found 10 motifs
2021-01-14 19:39:33,734 - INFO - Homer_width_8 finished, found 5 motifs
2021-01-14 19:39:34,656 - INFO - MEME_width_18 finished, found 10 motifs
2021-01-14 19:39:37,034 - INFO - MEME_width_20 finished, found 10 motifs
2021-01-14 19:39:39,040 - INFO - MEME_width_16 finished, found 10 motifs
2021-01-14 19:40:11,144 - INFO - Homer_width_10 finished, found 5 motifs
2021-01-14 19:41:26,646 - INFO - Homer_width_12 finished, found 5 motifs
2021-01-14 19:53:14,696 - INFO - Homer_width_14 finished, found 5 motifs
2021-01-14 20:05:10,847 - INFO - Homer_width_16 finished, found 5 motifs
2021-01-14 20:25:24,517 - INFO - Homer_width_18 finished, found 5 motifs
2021-01-14 20:45:02,464 - INFO - Homer_width_20 finished, found 5 motifs
2021-01-14 20:46:47,511 - INFO - predicted 160 motifs
2021-01-14 20:46:47,592 - INFO - 43 motifs are significant
2021-01-14 20:46:47,905 - INFO - clustering 43 motifs.
2021-01-14 20:47:40,988 - INFO - creating de novo reports
2021-01-14 20:48:18,105 - INFO - finished
2021-01-14 20:48:18,106 - INFO - output dir: Alluvial_open_activated_GM.5.0.Forkhead.0008_gimmemotifs
2021-01-14 20:48:18,106 - INFO - de novo report: Alluvial_open_activated_GM.5.0.Forkhead.0008_gimmemotifs/gimme.denovo.html
2021-01-14 20:49:06,833 - INFO - creating motif scan tables
2021-01-14 20:49:28,531 - INFO - calculating stats
2021-01-14 20:49:29,806 - INFO - selecting non-redundant motifs
Traceback (most recent call last):
File "/home/cjr78/miniconda3/envs/gimme/bin/gimme", line 11, in <module>
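The two warnings in the log above concern a custom background whose median sequence length (277) differs from the size used for motif prediction (200). A rough way to check a background FASTA before running is sketched below. `fasta_lengths` and `size_mismatch` are hypothetical helpers written for illustration, not GimmeMotifs functions, and the parser assumes a plain, uncompressed FASTA.

```python
from statistics import median


def fasta_lengths(lines):
    """Return the length of every record in a FASTA given as an iterable of lines."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def size_mismatch(lines, target_size):
    """Return (median_length, differs) for a FASTA compared with target_size."""
    lengths = fasta_lengths(lines)
    if not lengths:
        raise ValueError("no FASTA records found")
    med = median(lengths)
    return med, med != target_size


# Toy records only; a real background would hold genomic sequences.
toy = [">a", "ACGT", "AC", ">b", "ACG"]
print(size_mismatch(toy, 200))
```

If the median differs, either extract the background at the same width as the prediction size or pass the matching size to the run, as the warning recommends.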
So updating gimmemotifs with pip seems to have sorted the issues. FYI, installing or updating with conda seems to take a long time, i.e. it gets stuck solving the environment.
Describe the bug
When running gimme motifs with the following parameters:
gimme motifs -s 0 -f 0.5 -g mm10 --denovo Alluvial_open_GM.5.0.Forkhead.0008.bed Alluvial_open_GM.5.0.Forkhead.0008_gimmemotifs
We get the following error:
2021-01-13 17:29:57,285 - INFO - creating background (matched GC%)
Sequences do not seem to be of equal size.
GC% matched sequences of the median size (300) will be created
2021-01-13 17:30:15,741 - INFO - starting full motif analysis
2021-01-13 17:30:15,741 - INFO - using original size
2021-01-13 17:30:15,741 - INFO - preparing input from BED
Please provide input file in BED or FASTA format
Traceback (most recent call last):
File "/home/cjr78/miniconda3/envs/gimme/lib/python3.7/site-packages/gimmemotifs/background.py", line 468, in matched_gc_bedfile
for seq in fa.seqs
File "/home/cjr78/miniconda3/envs/gimme/lib/python3.7/site-packages/gimmemotifs/background.py", line 468, in
for seq in fa.seqs
ZeroDivisionError: division by zero
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/cjr78/miniconda3/envs/gimme/bin/gimme", line 11, in <module>
cli(sys.argv[1:])
File "/home/cjr78/miniconda3/envs/gimme/lib/python3.7/site-packages/gimmemotifs/cli.py", line 625, in cli
args.func(args)
File "/home/cjr78/miniconda3/envs/gimme/lib/python3.7/site-packages/gimmemotifs/commands/motifs.py", line 94, in motifs
"size": args.size,
File "/home/cjr78/miniconda3/envs/gimme/lib/python3.7/site-packages/gimmemotifs/denovo.py", line 609, in gimme_motifs
params.get("custom_background", None),
File "/home/cjr78/miniconda3/envs/gimme/lib/python3.7/site-packages/gimmemotifs/denovo.py", line 316, in create_backgrounds
custom_background=custom_background,
File "/home/cjr78/miniconda3/envs/gimme/lib/python3.7/site-packages/gimmemotifs/denovo.py", line 226, in create_background
f = MatchedGcFasta(fafile, genome, nr_times * len(fg))
File "/home/cjr78/miniconda3/envs/gimme/lib/python3.7/site-packages/gimmemotifs/background.py", line 559, in __init__
matched_gc_bedfile(tmpbed, matchfile, genome, number, size=size)
File "/home/cjr78/miniconda3/envs/gimme/lib/python3.7/site-packages/gimmemotifs/background.py", line 477, in matched_gc_bedfile
[float(x[fields + 1]) for x in bed.nucleotide_content(fi=genome_fa)]
File "/home/cjr78/miniconda3/envs/gimme/lib/python3.7/site-packages/pybedtools/bedtool.py", line 917, in decorated
result = method(self, *args, **kwargs)
File "/home/cjr78/miniconda3/envs/gimme/lib/python3.7/site-packages/pybedtools/bedtool.py", line 401, in wrapped
decode_output=decode_output,
File "/home/cjr78/miniconda3/envs/gimme/lib/python3.7/site-packages/pybedtools/helpers.py", line 455, in call_bedtools
raise BEDToolsError(subprocess.list2cmdline(cmds), stderr)
pybedtools.helpers.BEDToolsError:
Command was:
Error message was: It looks as though you have less than 3 columns at line 1 in file Alluvial_open_GM.5.0.Forkhead.0008_gimmemotifs/intermediate/prediction.fa. Are you sure your files are tab-delimited?
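The first exception in the traceback is a ZeroDivisionError raised while iterating over `fa.seqs`. That is what a per-sequence GC fraction would produce if one sequence were empty. The traceback does not show the actual expression, so the sketch below is an assumption about the failure mode, not the GimmeMotifs code. `gc_fraction` and `gc_fractions` are hypothetical names.

```python
def gc_fraction(seq):
    """Return the G+C fraction of seq, or None for an empty sequence."""
    if not seq:
        # Dividing by len(seq) here would raise ZeroDivisionError.
        return None
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_fractions(seqs):
    """Compute GC fractions, skipping sequences that are empty."""
    fractions = (gc_fraction(seq) for seq in seqs)
    return [gc for gc in fractions if gc is not None]


# The empty string stands in for a zero-size region.
print(gc_fractions(["GGCC", "", "atgc"]))
```

Skipping empty sequences is one possible design choice; raising a clear error that names the offending region would be another.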
To Reproduce
Run gimme motifs

Expected behavior
For gimme motifs to run and read its own intermediate files
Error logs
2021-01-13 17:30:15,735 - gimme.config - DEBUG - Using multiprocessing
2021-01-13 17:30:15,736 - gimme.config - DEBUG - Parameters:
2021-01-13 17:30:15,736 - gimme.config - DEBUG - fraction: 0.2
2021-01-13 17:30:15,736 - gimme.config - DEBUG - use_strand: False
2021-01-13 17:30:15,736 - gimme.config - DEBUG - abs_max: 1000
2021-01-13 17:30:15,736 - gimme.config - DEBUG - analysis: xl
2021-01-13 17:30:15,736 - gimme.config - DEBUG - enrichment: 1.5
2021-01-13 17:30:15,736 - gimme.config - DEBUG - size: 0
2021-01-13 17:30:15,736 - gimme.config - DEBUG - lsize: 500
2021-01-13 17:30:15,736 - gimme.config - DEBUG - background: ['gc']
2021-01-13 17:30:15,736 - gimme.config - DEBUG - cluster_threshold: 0.95
2021-01-13 17:30:15,736 - gimme.config - DEBUG - scan_cutoff: 0.9
2021-01-13 17:30:15,737 - gimme.config - DEBUG - available_tools: MDmodule,MEME,MEMEW,DREME,Weeder,GADEM,MotifSampler,Trawler,Improbizer,BioProspector,Posmo,ChIPMunk,AMD,HMS,Homer,XXmotif,ProSampler,DiNAMO
2021-01-13 17:30:15,737 - gimme.config - DEBUG - tools: MEME,Homer,BioProspector
2021-01-13 17:30:15,737 - gimme.config - DEBUG - pvalue: 0.001
2021-01-13 17:30:15,737 - gimme.config - DEBUG - max_time: -1
2021-01-13 17:30:15,737 - gimme.config - DEBUG - ncpus: 12
2021-01-13 17:30:15,737 - gimme.config - DEBUG - motif_db: gimme.vertebrate.v5.0.pfm
2021-01-13 17:30:15,737 - gimme.config - DEBUG - use_cache: False
2021-01-13 17:30:15,737 - gimme.config - DEBUG - custom_background: Alluvial_open_GM.5.0.Forkhead.0008_gimmemotifs/generated_background.gc.fa
2021-01-13 17:30:15,737 - gimme.config - DEBUG - genome: mm10
2021-01-13 17:30:15,737 - gimme.config - DEBUG - No time limit for motif prediction
2021-01-13 17:30:15,741 - gimme.denovo - INFO - starting full motif analysis
2021-01-13 17:30:15,741 - gimme.denovo - DEBUG - Using temporary directory /tmp/gimmemotifs.151283.xrc7uklp
2021-01-13 17:30:15,741 - gimme.denovo - INFO - using original size
2021-01-13 17:30:15,741 - gimme.denovo - INFO - preparing input from BED
2021-01-13 17:30:15,746 - gimme.denovo - DEBUG - Splitting Alluvial_open_GM.5.0.Forkhead.0008_gimmemotifs/intermediate/input.bed into prediction set (Alluvial_open_GM.5.0.Forkhead.0008_gimmemotifs/intermediate/prediction.bed) and validation set (Alluvial_open_GM.5.0.Forkhead.0008_gimmemotifs/intermediate/validation.bed)
2021-01-13 17:30:16,053 - gimme.denovo - DEBUG - Creating GC matched background
Installation information (please complete the following information):
Additional context
Head of input bed file looks like this:
chr10 69165952 69166203
chr11 98750442 98751534
chr3 21895407 21895988
chr2 102815334 102815552
chr17 93484588 93484861
chr4 92547518 92547673
chr2 117637363 117637832
chr16 52000168 52000605
chr11 23401137 23401330
chr18 58143451 58143760
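As a sanity check on the ten regions shown above, their sizes can be summarised with a few lines of Python. This is only an illustrative sketch: `interval_sizes` is a hypothetical helper, and the ten records say nothing about the rest of the file.

```python
from statistics import median

# The ten records from the head of the input BED file.
head = """\
chr10 69165952 69166203
chr11 98750442 98751534
chr3 21895407 21895988
chr2 102815334 102815552
chr17 93484588 93484861
chr4 92547518 92547673
chr2 117637363 117637832
chr16 52000168 52000605
chr11 23401137 23401330
chr18 58143451 58143760
"""


def interval_sizes(text):
    """Return end - start for every BED record in text."""
    sizes = []
    for line in text.strip().splitlines():
        _chrom, start, end = line.split()[:3]
        sizes.append(int(end) - int(start))
    return sizes


sizes = interval_sizes(head)
print(min(sizes), median(sizes), max(sizes))
```

None of these ten regions has size 0; the smallest spans 155 bp.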
Head of Alluvial_open_GM.5.0.Forkhead.0008_gimmemotifs/intermediate/prediction.fa looks like this:
Head of Alluvial_open_GM.5.0.Forkhead.0008_gimmemotifs/intermediate/localization.fa