vanheeringen-lab / gimmemotifs

Suite of motif tools, including a motif prediction pipeline for ChIP-seq experiments. See full GimmeMotifs documentation for detailed installation instructions and usage examples.
https://gimmemotifs.readthedocs.io/en/master
MIT License

Maelstrom not running #172

Closed connorrogerson closed 3 years ago

connorrogerson commented 3 years ago

Describe the bug
Maelstrom errors when running with default parameters.

To Reproduce
gimme maelstrom -N $SLURM_NTASKS /rds/user/cjr78/hpc-work/ATAC/gimmemotifs/maelstrom_forkhead/maelstrom_forkhead_input.txt mm10 /rds/user/cjr78/hpc-work/ATAC/gimmemotifs/maelstrom_forkhead/

Expected behavior
I've run maelstrom before with no errors, so I expected similar results.

Error logs
2021-01-21 15:06:51,859 - INFO - Starting maelstrom
2021-01-21 15:06:52,021 - INFO - motif scanning (counts)
2021-01-21 15:06:52,034 - INFO - reading table
2021-01-21 15:07:34,118 - INFO - using 14000 sequences
2021-01-21 15:08:35,419 - INFO - setting threshold
2021-01-21 15:09:26,638 - INFO - determining FPR-based threshold
2021-01-21 15:14:38,091 - INFO - creating count table
Traceback (most recent call last):
  File "/home/cjr78/miniconda3/envs/gimme/bin/gimme", line 11, in <module>
    cli(sys.argv[1:])
  File "/home/cjr78/miniconda3/envs/gimme/lib/python3.6/site-packages/gimmemotifs/cli.py", line 661, in cli
    args.func(args)
  File "/home/cjr78/miniconda3/envs/gimme/lib/python3.6/site-packages/gimmemotifs/commands/maelstrom.py", line 45, in maelstrom
    aggregation=aggregation,
  File "/home/cjr78/miniconda3/envs/gimme/lib/python3.6/site-packages/gimmemotifs/maelstrom.py", line 350, in run_maelstrom
    gc=gc,
  File "/home/cjr78/miniconda3/envs/gimme/lib/python3.6/site-packages/gimmemotifs/scanner.py", line 166, in scan_regionfile_to_table
    for row in s.count(regions):
  File "/home/cjr78/miniconda3/envs/gimme/lib/python3.6/site-packages/gimmemotifs/scanner.py", line 991, in count
    for matches in self.scan(seqs, nreport, scan_rc):
  File "/home/cjr78/miniconda3/envs/gimme/lib/python3.6/site-packages/gimmemotifs/scanner.py", line 1074, in scan
    seqs = as_fasta(seqs, genome=self.genome)
  File "/home/cjr78/miniconda3/envs/gimme/lib/python3.6/site-packages/gimmemotifs/utils.py", line 696, in as_fasta
    return Fasta(fdict=as_seqdict(to_convert, genome, minsize))
  File "/home/cjr78/miniconda3/envs/gimme/lib/python3.6/functools.py", line 807, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/home/cjr78/miniconda3/envs/gimme/lib/python3.6/site-packages/gimmemotifs/utils.py", line 618, in _as_seqdict_list
    return _genomepy_convert(to_convert, genome, minsize)
  File "/home/cjr78/miniconda3/envs/gimme/lib/python3.6/site-packages/gimmemotifs/utils.py", line 538, in _genomepy_convert
    g.track2fasta(to_convert, tmpfile.name)
  File "/home/cjr78/miniconda3/envs/gimme/lib/python3.6/site-packages/genomepy/genome.py", line 361, in track2fasta
    track_type = self.get_track_type(track)
  File "/home/cjr78/miniconda3/envs/gimme/lib/python3.6/site-packages/genomepy/genome.py", line 334, in get_track_type
    with open(track) as fin:
TypeError: expected str, bytes or os.PathLike object, not list


simonvh commented 3 years ago

Can you check which genomepy version you have in your env?

conda activate gimme
conda list | grep genomepy

connorrogerson commented 3 years ago

The output of the command is:

genomepy 0.9.1 py_0 bioconda

simonvh commented 3 years ago

Hmm, seems to be a bug in genomepy. Can you show the first couple of lines of /rds/user/cjr78/hpc-work/ATAC/gimmemotifs/maelstrom_forkhead/maelstrom_forkhead_input.txt?

connorrogerson commented 3 years ago

The file looks like this:

loc	cluster
chr7:132622968-132623384	Open
chr15:10833861-10834376	Open
chr5:108139250-108139415	Open
chr10:59978846-59979189	Open
chr6:56879480-56879638	Open
chr2:61121364-61121617	Open
chr15:34350609-34350811	Open
chr2:20446043-20446656	Open
chr15:80459584-80460058	Open

They should be all tab-separated...

connorrogerson commented 3 years ago

@simonvh has there been any update to this? Anything I can try on my side to try and sort this out?

simonvh commented 3 years ago

Sorry, crazy busy with teaching this quarter. I'll have another look. If possible, can you send me your whole input file by mail? Then I can check if I can localize the error.

simonvh commented 3 years ago

It seems there is an extra space in the first five lines of the file. If you remove these it should work. I have also added a more informative warning for the next version of GimmeMotifs.
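
Editor's note: a stray space like this can be caught before running maelstrom by scanning the input table. The sketch below is only illustrative. The helper name `find_whitespace_problems` is hypothetical and not part of GimmeMotifs, and it assumes a tab-separated table with a header line and `chrom:start-end` regions in the first column, as in the file shown earlier in this thread.

```python
import re

# Hypothetical helper, not part of GimmeMotifs.
# Assumes regions look like "chr7:132622968-132623384".
REGION = re.compile(r"^[^\s:]+:\d+-\d+$")


def find_whitespace_problems(lines):
    """Return (line_number, message) tuples for suspicious lines."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line:
            continue
        if " " in line:
            problems.append((number, "contains a space character"))
        fields = line.split("\t")
        if len(fields) < 2:
            problems.append((number, "no tab separator found"))
        elif number > 1 and not REGION.match(fields[0]):
            # Line 1 is treated as the header, so it is not checked.
            problems.append((number, "first column is not chrom:start-end"))
    return problems


example = [
    "loc\tcluster\n",
    "chr7:132622968-132623384 \tOpen\n",  # stray space before the tab
    "chr15:10833861-10834376\tOpen\n",
]
for number, message in find_whitespace_problems(example):
    print(f"line {number}: {message}")
```

To check a real file, the same function can be given an open file handle, for example `find_whitespace_problems(open("maelstrom_forkhead_input.txt"))`.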

connorrogerson commented 3 years ago

Hi Simon,

Thanks for noticing that! I've re-run maelstrom with the new file and it seems to get stuck at the "reading table" step:

gimme maelstrom maelstrom_forkhead/maelstrom_forkhead_input.txt mm10 maelstrom_test/
2021-04-09 15:14:06,290 - INFO - Starting maelstrom
2021-04-09 15:14:06,493 - INFO - motif scanning (counts)
2021-04-09 15:14:06,506 - INFO - reading table

This hangs there for a very long time. Were you able to run the above command with my input file on your system OK?

Best wishes, Connor



connorrogerson commented 3 years ago

Hi Simon,

Just wanted to update you on my last email. Maelstrom runs, but seems to hang at "reading table" when I run the command. Over the weekend I submitted a job and it timed out after 12 hours (our CSF requires a time limit to be allocated to each job).

The output of my script is:
2021-04-10 03:49:38,615 - INFO - Starting maelstrom
2021-04-10 03:49:38,833 - INFO - motif scanning (counts)
2021-04-10 03:49:38,837 - INFO - reading table
2021-04-10 09:50:55,687 - INFO - using 14000 sequences
slurmstepd: error: JOB 37375628 ON cpu-e-792 CANCELLED AT 2021-04-10T15:41:46 DUE TO TIME LIMIT

It seems it took 6 hours to perform the "reading table" step. When I've used maelstrom in the past, this has been a pretty quick step. This seems to occur whichever motif database and whichever input I use. Any ideas?

Best wishes, Connor
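
Editor's note: the "6 hours" figure can be read directly off the log timestamps. The sketch below shows one way to compute the time spent in each step, assuming the `YYYY-MM-DD HH:MM:SS,mmm - LEVEL - message` layout seen in the logs in this thread. The function name `step_durations` is made up for illustration.

```python
from datetime import datetime


def step_durations(log_lines):
    """Return (message, seconds until the next log line) for each step."""
    parsed = []
    for line in log_lines:
        # Split into timestamp, level and message; the level is not needed.
        stamp, _level, message = line.split(" - ", 2)
        parsed.append((datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f"), message))
    return [
        (message, (next_time - time).total_seconds())
        for (time, message), (next_time, _next_message) in zip(parsed, parsed[1:])
    ]


# Log lines copied from the job output above.
log = [
    "2021-04-10 03:49:38,615 - INFO - Starting maelstrom",
    "2021-04-10 03:49:38,833 - INFO - motif scanning (counts)",
    "2021-04-10 03:49:38,837 - INFO - reading table",
    "2021-04-10 09:50:55,687 - INFO - using 14000 sequences",
]
for message, seconds in step_durations(log):
    print(f"{message}: {seconds / 3600:.2f} h")
```

For this log the "reading table" step accounts for 21676.85 seconds, a little over 6 hours, while the two earlier steps take well under a second.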



simonvh commented 3 years ago

Hmm, this is strange. I do get another error later in the command; I'll see what I can do about that. But it has no trouble at this step. Can you try deleting the GimmeMotifs cache directory, and then running gimme maelstrom again?
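
Editor's note: a minimal sketch of clearing the cache from Python. The location is an assumption, not something stated in this thread: it is taken to be a `gimmemotifs` directory under `$XDG_CACHE_HOME`, falling back to `~/.cache`. Both function names are hypothetical, so check the reported path before deleting anything.

```python
import os
import shutil
from pathlib import Path


def gimme_cache_dir(env=os.environ):
    """Guess the cache location (assumption: <XDG cache dir>/gimmemotifs)."""
    base = env.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    return Path(base) / "gimmemotifs"


def clear_cache(cache_dir, dry_run=True):
    """Remove cache_dir if it exists; return True if there was something to remove."""
    if not cache_dir.is_dir():
        return False
    if not dry_run:
        shutil.rmtree(cache_dir)
    return True


cache = gimme_cache_dir()
# dry_run=True only reports; pass dry_run=False to actually delete.
print(cache, "would be removed" if clear_cache(cache, dry_run=True) else "not found")
```

The dry-run default is deliberate, so that a wrong guess about the path does not delete anything.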

connorrogerson commented 3 years ago

Hi Simon,

Sorry for the delay. It's taking a while to get long jobs running on our server.

I deleted the cache and ran the script again, but I still run into the same problem. Output for this job was:

Using $XDG_CACHE_HOME for cache
2021-04-17 20:21:29,319 - INFO - Starting maelstrom
2021-04-17 20:21:29,560 - INFO - motif scanning (counts)
2021-04-17 20:21:29,567 - INFO - reading table
2021-04-18 02:12:19,760 - INFO - using 14000 sequences
2021-04-18 02:12:19,842 - INFO - Creating index for genomic GC frequencies.
2021-04-18 02:13:04,005 - INFO - setting threshold
2021-04-18 02:13:17,238 - INFO - determining FPR-based threshold
2021-04-18 02:13:35,309 - INFO - creating count table
slurmstepd: error: JOB 37938935 ON cpu-e-1104 CANCELLED AT 2021-04-18T08:20:25 DUE TO TIME LIMIT

It still seems to be taking a very long time to read the table.

Is it worth putting this on GitHub in issues?

Best wishes Connor



simonvh commented 3 years ago

Which version of GimmeMotifs is this? I finally managed to convince the bioconda build system to create a functioning build, so there's a new version available. This may help. I have to confess that I'm really not sure why this occurs. There are not a large number of regions in your file. Is this with the maelstrom_forkhead_input.txt file you sent earlier?

Simon


simonvh commented 3 years ago

For me it just works :( Hard to debug...

2021-04-19 17:25:30,290 - INFO - Starting maelstrom
2021-04-19 17:25:30,314 - INFO - motif scanning (counts)
2021-04-19 17:25:30,314 - INFO - reading table
2021-04-19 17:25:32,935 - INFO - using 14000 sequences
2021-04-19 17:26:13,312 - INFO - setting threshold
2021-04-19 17:26:16,289 - INFO - determining FPR-based threshold
2021-04-19 17:31:02,124 - INFO - creating count table
2021-04-19 17:31:45,201 - INFO - done
2021-04-19 17:31:47,245 - INFO - creating dataframe
2021-04-19 17:31:49,568 - INFO - motif scanning (scores)
2021-04-19 17:31:49,628 - INFO - reading table
2021-04-19 17:31:53,756 - INFO - using 14000 sequences
2021-04-19 17:32:34,759 - INFO - creating score table (z-score, GC%)
2021-04-19 17:53:05,362 - INFO - done
2021-04-19 17:53:07,358 - INFO - creating dataframe
2021-04-19 17:53:28,693 - INFO - Selecting non-redundant motifs
2021-04-19 17:53:36,934 - INFO - Selected 327 motifs
2021-04-19 17:53:36,935 - INFO - Motifs: maelstrom.forkhead/nonredundant.motifs.pfm
2021-04-19 17:53:36,935 - INFO - Factor mappings: maelstrom.forkhead/nonredundant.motifs.motif2factors.txt
2021-04-19 17:53:37,129 - INFO - Fitting MWU
2021-04-19 17:53:37,800 - INFO - Done
2021-04-19 17:53:37,892 - INFO - Fitting Hypergeom
2021-04-19 17:53:38,267 - INFO - Done
2021-04-19 17:53:38,456 - INFO - Fitting RF
2021-04-19 17:53:39,304 - INFO - Done
2021-04-19 17:53:39,321 - INFO - Rank aggregation
2021-04-19 17:53:40,345 - INFO - html report
2021-04-19 17:53:46,575 - INFO - maelstrom.forkhead/gimme.maelstrom.report.html

One other thing to try: limiting the number of cores. After ~12 cores the overhead of multiprocessing starts to slow down the scanning, maybe that is going on here? I'm just grasping at straws.
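
Editor's note: a small sketch of capping the thread count when the job is launched from a script. It reuses the `-N` option and the `SLURM_NTASKS` variable from the command at the top of this issue. The cap of 12 is only the rough figure mentioned above, and `maelstrom_command` is a hypothetical helper name.

```python
import os
import shlex

MAX_THREADS = 12  # rough point of diminishing returns mentioned above


def maelstrom_command(table, genome, outdir, env=os.environ):
    """Build a gimme maelstrom command with the thread count capped."""
    requested = int(env.get("SLURM_NTASKS", "1"))
    threads = max(1, min(requested, MAX_THREADS))
    return ["gimme", "maelstrom", "-N", str(threads), table, genome, outdir]


# Example with a made-up allocation of 32 tasks.
cmd = maelstrom_command(
    "maelstrom_forkhead_input.txt", "mm10", "maelstrom_test/",
    env={"SLURM_NTASKS": "32"},
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` inside the job script.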


connorrogerson commented 3 years ago

Hi Simon,

I’ll try the core option. The version of GimmeMotifs I’m using is 0.15.3+13.gdd30eae (installed the dev version from GitHub).

Cheers, Connor


connorrogerson commented 3 years ago

Hi Simon,

So I decided to uninstall and reinstall conda and create a new environment, but I'm still getting the same problem.

Version of gimmemotifs is: gimmemotifs 0.15.3 py37h21043fe_0 bioconda
Version of genomepy is: genomepy 0.9.3 py_0 bioconda

I ran this on the command line but it still gets stuck at "reading table". I interrupted the run and this is the output, in case it gives us any clues:

gimme maelstrom /rds/user/cjr78/hpc-work/ATAC/gimmemotifs/maelstrom_forkhead/maelstrom_forkhead_input.txt mm10 /rds/user/cjr78/hpc-work/ATAC/gimmemotifs/maelstrom_test/
2021-04-20 11:58:19,104 - INFO - Starting maelstrom
2021-04-20 11:58:19,191 - INFO - motif scanning (counts)
2021-04-20 11:58:19,196 - INFO - reading table
Traceback (most recent call last):
  File "/home/cjr78/miniconda3/envs/seq/bin/gimme", line 11, in <module>
    cli(sys.argv[1:])
  File "/home/cjr78/miniconda3/envs/seq/lib/python3.7/site-packages/gimmemotifs/cli.py", line 661, in cli
    args.func(args)
  File "/home/cjr78/miniconda3/envs/seq/lib/python3.7/site-packages/gimmemotifs/commands/maelstrom.py", line 45, in maelstrom
    aggregation=aggregation,
  File "/home/cjr78/miniconda3/envs/seq/lib/python3.7/site-packages/gimmemotifs/maelstrom.py", line 350, in run_maelstrom
    gc=gc,
  File "/home/cjr78/miniconda3/envs/seq/lib/python3.7/site-packages/gimmemotifs/scanner.py", line 154, in scan_regionfile_to_table
    np.median([len(seq) for seq in as_fasta(check_regions, genome=genome).seqs])
  File "/home/cjr78/miniconda3/envs/seq/lib/python3.7/site-packages/gimmemotifs/utils.py", line 705, in as_fasta
    return Fasta(fdict=as_seqdict(to_convert, genome, minsize))
  File "/home/cjr78/miniconda3/envs/seq/lib/python3.7/functools.py", line 840, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/home/cjr78/miniconda3/envs/seq/lib/python3.7/site-packages/gimmemotifs/utils.py", line 698, in _as_seqdict_array
    return as_seqdict(list(to_convert), genome, minsize)
  File "/home/cjr78/miniconda3/envs/seq/lib/python3.7/functools.py", line 840, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/home/cjr78/miniconda3/envs/seq/lib/python3.7/site-packages/gimmemotifs/utils.py", line 627, in _as_seqdict_list
    return _genomepy_convert(to_convert, genome, minsize)
  File "/home/cjr78/miniconda3/envs/seq/lib/python3.7/site-packages/gimmemotifs/utils.py", line 547, in _genomepy_convert
    g.track2fasta(to_convert, tmpfile.name)
  File "/home/cjr78/miniconda3/envs/seq/lib/python3.7/site-packages/genomepy/genome.py", line 373, in track2fasta
    for seq in seqqer:
  File "/home/cjr78/miniconda3/envs/seq/lib/python3.7/site-packages/genomepy/genome.py", line 313, in _regions_to_seqs
    seq = self._region_to_seq(name, extend_up, extend_down)
  File "/home/cjr78/miniconda3/envs/seq/lib/python3.7/site-packages/genomepy/genome.py", line 306, in _region_to_seq
    seq = self.get_seq(chrom, start, end)
  File "/home/cjr78/miniconda3/envs/seq/lib/python3.7/site-packages/pyfaidx/__init__.py", line 1046, in get_seq
    seq = self.faidx.fetch(name, start, end)
  File "/home/cjr78/miniconda3/envs/seq/lib/python3.7/site-packages/pyfaidx/__init__.py", line 635, in fetch
    seq = self.from_file(name, start, end)
  File "/home/cjr78/miniconda3/envs/seq/lib/python3.7/site-packages/pyfaidx/__init__.py", line 679, in from_file
    chunk_seq = self.file.read(chunk).decode()
  File "/home/cjr78/miniconda3/envs/seq/lib/python3.7/site-packages/Bio/bgzf.py", line 689, in read
    result += data
KeyboardInterrupt

Cheers, Connor
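
Editor's note: the interrupted run stopped inside `pyfaidx` and `Bio/bgzf.py`, which suggests (but does not prove, since it is a single snapshot) that the time is going into fetching sequences from the genome file. One way to test that idea is to time a few fetches in isolation. The sketch below is hedged accordingly: `parse_region` and `time_fetches` are hypothetical helpers, and a stand-in fetch function is used so the example runs without a genome on disk.

```python
import time


def parse_region(region):
    """Split 'chrom:start-end' into (chrom, start, end)."""
    chrom, span = region.rsplit(":", 1)
    start, end = span.split("-")
    return chrom, int(start), int(end)


def time_fetches(fetch, regions):
    """Call fetch(chrom, start, end) per region; return seconds taken for each."""
    timings = {}
    for region in regions:
        chrom, start, end = parse_region(region)
        began = time.perf_counter()
        fetch(chrom, start, end)
        timings[region] = time.perf_counter() - began
    return timings


def fake_fetch(chrom, start, end):
    """Stand-in for a real sequence lookup."""
    return "N" * (end - start)


regions = ["chr7:132622968-132623384", "chr15:10833861-10834376"]
for region, seconds in time_fetches(fake_fetch, regions).items():
    print(f"{region}: {seconds:.6f} s")
```

With a real genome, `fake_fetch` could be swapped for the `get_seq` method of a `pyfaidx.Fasta` object opened on the genome FASTA, which is the call that appears in the traceback. If a handful of regions already take seconds each, the genome file or its storage is the likely bottleneck.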



connorrogerson commented 3 years ago

Hi Simon,

One thing that could help: could you list the packages you have installed in your gimme environment? I can then cross-reference them against mine and see if there are any differences.

Cheers, Connor
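
Editor's note: a minimal sketch of such a cross-reference, assuming both sides save the text output of `conda list` (whitespace-separated name, version, build and channel columns, with `#` comment lines). The function names are hypothetical, and the second environment in the example is a made-up placeholder, not anyone's real package list.

```python
def parse_conda_list(text):
    """Map package name to version from `conda list` text output."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header comments
        fields = line.split()
        if len(fields) >= 2:
            packages[fields[0]] = fields[1]
    return packages


def diff_environments(mine, theirs):
    """Return {name: (my_version, their_version)} for packages that differ."""
    a, b = parse_conda_list(mine), parse_conda_list(theirs)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


# Versions reported earlier in this thread.
mine = """# Name Version Build Channel
gimmemotifs 0.15.3 py37h21043fe_0 bioconda
genomepy 0.9.3 py_0 bioconda
"""
# Placeholder values for illustration only.
theirs = """# Name Version Build Channel
gimmemotifs 0.15.3 py37h21043fe_0 bioconda
genomepy 0.0.0 placeholder bioconda
"""
for name, (my_version, their_version) in diff_environments(mine, theirs).items():
    print(f"{name}: {my_version} vs {their_version}")
```

A package missing from one side shows up with `None` as its version, which makes absent dependencies easy to spot.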


From: Connor Rogerson @.> Sent: 20 April 2021 12:14 To: vanheeringen-lab/gimmemotifs @.> Subject: Re: [vanheeringen-lab/gimmemotifs] Maelstrom not running (#172)

Hi Simon,

So I decided to uninstall and reinstall conda and create a new environment but I I'm still getting the same problem.

Version of gimmemotifs is gimmemotifs 0.15.3 py37h21043fe_0 bioconda Version of genompy is genomepy 0.9.3 py_0 bioconda

I ran this on the command line but still get stuck at read table. I interrupted the run and this is the output for that, if this gives us any clues?

gimme maelstrom /rds/user/cjr78/hpc-work/ATAC/gimmemotifs/maelstrom_forkhead/maelstrom_forkhead_input.txt mm10 /rds/user/cjr78/hpc-work/ATAC/gimmemotifs/maelstrom_test/ 2021-04-20 11:58:19,104 - INFO - Starting maelstrom 2021-04-20 11:58:19,191 - INFO - motif scanning (counts) 2021-04-20 11:58:19,196 - INFO - reading table Traceback (most recent call last): File "/home/cjr78/miniconda3/envs/seq/bin/gimme", line 11, in cli(sys.argv[1:]) File "/home/cjr78/miniconda3/envs/seq/lib/python3.7/site-packages/gimmemotifs/cli.py", line 661, in cli args.func(args) File "/home/cjr78/miniconda3/envs/seq/lib/python3.7/site-packages/gimmemotifs/commands/maelstrom.py", line 45, in maelstrom aggregation=aggregation, File "/home/cjr78/miniconda3/envs/seq/lib/python3.7/site-packages/gimmemotifs/maelstrom.py", line 350, in run_maelstrom gc=gc, File "/home/cjr78/miniconda3/envs/seq/lib/python3.7/site-packages/gimmemotifs/scanner.py", line 154, in scan_regionfile_to_table np.median([len(seq) for seq in as_fasta(check_regions, genome=genome).seqs]) File "/home/cjr78/miniconda3/envs/seq/lib/python3.7/site-packages/gimmemotifs/utils.py", line 705, in as_fasta return Fasta(fdict=as_seqdict(to_convert, genome, minsize)) File "/home/cjr78/miniconda3/envs/seq/lib/python3.7/functools.py", line 840, in wrapper return dispatch(args[0].class)(*args, *kw) File "/home/cjr78/miniconda3/envs/seq/lib/python3.7/site-packages/gimmemotifs/utils.py", line 698, in _as_seqdict_array return as_seqdict(list(to_convert), genome, minsize) File "/home/cjr78/miniconda3/envs/seq/lib/python3.7/functools.py", line 840, in wrapper return dispatch(args[0].class)(args, **kw) File "/home/cjr78/miniconda3/envs/seq/lib/python3.7/site-packages/gimmemotifs/utils.py", line 627, in _as_seqdict_list return _genomepy_convert(to_convert, genome, minsize) File "/home/cjr78/miniconda3/envs/seq/lib/python3.7/site-packages/gimmemotifs/utils.py", line 547, in _genomepy_convert g.track2fasta(to_convert, tmpfile.name) File 
"/home/cjr78/miniconda3/envs/seq/lib/python3.7/site-packages/genomepy/genome.py", line 373, in track2fasta for seq in seqqer: File "/home/cjr78/miniconda3/envs/seq/lib/python3.7/site-packages/genomepy/genome.py", line 313, in _regions_to_seqs seq = self._region_to_seq(name, extend_up, extend_down) File "/home/cjr78/miniconda3/envs/seq/lib/python3.7/site-packages/genomepy/genome.py", line 306, in _region_to_seq seq = self.get_seq(chrom, start, end) File "/home/cjr78/miniconda3/envs/seq/lib/python3.7/site-packages/pyfaidx/init.py", line 1046, in get_seq seq = self.faidx.fetch(name, start, end) File "/home/cjr78/miniconda3/envs/seq/lib/python3.7/site-packages/pyfaidx/init.py", line 635, in fetch seq = self.from_file(name, start, end) File "/home/cjr78/miniconda3/envs/seq/lib/python3.7/site-packages/pyfaidx/init.py", line 679, in from_file chunk_seq = self.file.read(chunk).decode() File "/home/cjr78/miniconda3/envs/seq/lib/python3.7/site-packages/Bio/bgzf.py", line 689, in read result += data KeyboardInterrupt

Cheers, Connor


From: Simon van Heeringen @.> Sent: 19 April 2021 16:58 To: vanheeringen-lab/gimmemotifs @.> Cc: connorrogerson @.>; Author @.> Subject: Re: [vanheeringen-lab/gimmemotifs] Maelstrom not running (#172)

For me it just works :( Hard to debug...

2021-04-19 17:25:30,290 - INFO - Starting maelstrom
2021-04-19 17:25:30,314 - INFO - motif scanning (counts)
2021-04-19 17:25:30,314 - INFO - reading table
2021-04-19 17:25:32,935 - INFO - using 14000 sequences
2021-04-19 17:26:13,312 - INFO - setting threshold
2021-04-19 17:26:16,289 - INFO - determining FPR-based threshold
2021-04-19 17:31:02,124 - INFO - creating count table
2021-04-19 17:31:45,201 - INFO - done
2021-04-19 17:31:47,245 - INFO - creating dataframe
2021-04-19 17:31:49,568 - INFO - motif scanning (scores)
2021-04-19 17:31:49,628 - INFO - reading table
2021-04-19 17:31:53,756 - INFO - using 14000 sequences
2021-04-19 17:32:34,759 - INFO - creating score table (z-score, GC%)
2021-04-19 17:53:05,362 - INFO - done
2021-04-19 17:53:07,358 - INFO - creating dataframe
2021-04-19 17:53:28,693 - INFO - Selecting non-redundant motifs
2021-04-19 17:53:36,934 - INFO - Selected 327 motifs
2021-04-19 17:53:36,935 - INFO - Motifs: maelstrom.forkhead/nonredundant.motifs.pfm
2021-04-19 17:53:36,935 - INFO - Factor mappings: maelstrom.forkhead/nonredundant.motifs.motif2factors.txt
2021-04-19 17:53:37,129 - INFO - Fitting MWU
2021-04-19 17:53:37,800 - INFO - Done
2021-04-19 17:53:37,892 - INFO - Fitting Hypergeom
2021-04-19 17:53:38,267 - INFO - Done
2021-04-19 17:53:38,456 - INFO - Fitting RF
2021-04-19 17:53:39,304 - INFO - Done
2021-04-19 17:53:39,321 - INFO - Rank aggregation
2021-04-19 17:53:40,345 - INFO - html report
2021-04-19 17:53:46,575 - INFO - maelstrom.forkhead/gimme.maelstrom.report.html

One other thing to try: limiting the number of cores. Beyond ~12 cores the overhead of multiprocessing starts to slow down the scanning; maybe that is what is going on here? I'm just grasping at straws.
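A minimal sketch of the core-limiting suggestion, assuming the job is launched from a Python wrapper. The `-N` option and `SLURM_NTASKS` come from the original command in this issue; the helper names and the hard cap of 12 are illustrative, taken from the rough figure mentioned above rather than from any measured limit.

```python
import os

# Rough figure from the comment above; an assumption, not a documented limit.
MAX_USEFUL_CORES = 12


def capped_cores(requested, limit=MAX_USEFUL_CORES):
    """Return the requested core count, clamped to the range 1..limit."""
    return max(1, min(int(requested), limit))


def maelstrom_command(input_table, genome, outdir, requested_cores):
    """Build the argument list for gimme maelstrom with a capped -N value."""
    ncores = capped_cores(requested_cores)
    return ["gimme", "maelstrom", "-N", str(ncores), input_table, genome, outdir]


if __name__ == "__main__":
    # SLURM_NTASKS is the variable used in the original command; fall back to 1.
    requested = os.environ.get("SLURM_NTASKS", "1")
    cmd = maelstrom_command(
        "maelstrom_forkhead_input.txt", "mm10", "maelstrom_test/", requested
    )
    # Print instead of executing, so the command can be inspected first.
    print(" ".join(cmd))
```

The sketch only prints the command; passing the list to `subprocess.run` would execute it.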

On Mon, Apr 19, 2021 at 5:29 PM Simon van Heeringen < @.***> wrote:

Which version of GimmeMotifs is this? I finally managed to convince the bioconda build system to create a functioning build, so there's a new version available. This may help? I have to confess that I'm really not sure why this occurs. There are not a large number of regions in your file. This is with the maelstrom_forkhead_input.txt file you sent earlier?

Simon

On Mon, Apr 19, 2021 at 3:25 PM connorrogerson @.***> wrote:

Hi Simon,

Sorry for the delay. It's taking a while to get long jobs running on our server.

I deleted the cache and ran the script again, but I still run into the same problem. Output for this job was:

Using $XDG_CACHE_HOME for cache
2021-04-17 20:21:29,319 - INFO - Starting maelstrom
2021-04-17 20:21:29,560 - INFO - motif scanning (counts)
2021-04-17 20:21:29,567 - INFO - reading table
2021-04-18 02:12:19,760 - INFO - using 14000 sequences
2021-04-18 02:12:19,842 - INFO - Creating index for genomic GC frequencies.
2021-04-18 02:13:04,005 - INFO - setting threshold
2021-04-18 02:13:17,238 - INFO - determining FPR-based threshold
2021-04-18 02:13:35,309 - INFO - creating count table
slurmstepd: error: JOB 37938935 ON cpu-e-1104 CANCELLED AT 2021-04-18T08:20:25 DUE TO TIME LIMIT

It still seems to be taking a very long time to read the table.
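To see where the time goes, the gap between consecutive log timestamps can be computed. The sketch below is a hypothetical helper, not part of GimmeMotifs, and it assumes the `timestamp - LEVEL - message` layout seen in the logs in this thread. A step's duration is taken as the time until the next log line, so the final line has no duration.

```python
import re
from datetime import datetime

# Matches lines such as "2021-04-17 20:21:29,567 - INFO - reading table".
LOG_LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - \w+ - (.+)$")


def step_durations(log_text):
    """Return (message, timedelta) pairs: time from each log line to the next."""
    entries = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match:  # lines without a timestamp (e.g. SLURM messages) are skipped
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
            entries.append((stamp, match.group(2)))
    return [(msg, nxt - ts) for (ts, msg), (nxt, _) in zip(entries, entries[1:])]


# Log lines copied from the job output quoted above.
LOG = """\
2021-04-17 20:21:29,319 - INFO - Starting maelstrom
2021-04-17 20:21:29,560 - INFO - motif scanning (counts)
2021-04-17 20:21:29,567 - INFO - reading table
2021-04-18 02:12:19,760 - INFO - using 14000 sequences
2021-04-18 02:12:19,842 - INFO - Creating index for genomic GC frequencies.
2021-04-18 02:13:04,005 - INFO - setting threshold
2021-04-18 02:13:17,238 - INFO - determining FPR-based threshold
2021-04-18 02:13:35,309 - INFO - creating count table
"""

durations = step_durations(LOG)
slowest = max(durations, key=lambda item: item[1])

if __name__ == "__main__":
    for message, elapsed in durations:
        print(f"{elapsed}  {message}")
    print("slowest step:", slowest[0])
```

For this log the slowest step is "reading table", which takes nearly six hours, whereas the same step finishes within seconds in the working run quoted earlier in this message.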

Is it worth putting this on github in issues?

Best wishes Connor


From: Simon van Heeringen @.> Sent: 13 April 2021 07:42 To: vanheeringen-lab/gimmemotifs @.> Cc: connorrogerson @.>; Author @.> Subject: Re: [vanheeringen-lab/gimmemotifs] Maelstrom not running (#172)

Hmm, this is strange. I do get another error later in the command; I'll see what I can do about that. But it has no trouble at this step. Can you try deleting the GimmeMotifs cache directory, and then running gimme maelstrom again?
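A hedged sketch of locating and clearing the cache from Python. The cache path is an assumption: the job log in this thread reports "Using $XDG_CACHE_HOME for cache", so the sketch guesses a `gimmemotifs` subdirectory under `$XDG_CACHE_HOME`, falling back to `~/.cache`. Both function names are hypothetical, and the path should be checked against your installation before deleting anything.

```python
import os
import shutil
from pathlib import Path


def gimme_cache_dir(env):
    """Guess the cache location (assumed layout: <XDG cache home>/gimmemotifs)."""
    base = env.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    return Path(base) / "gimmemotifs"


def clear_cache(cache_dir, dry_run=True):
    """Delete cache_dir if it exists; with dry_run=True only report whether it does."""
    if not cache_dir.is_dir():
        return False
    if not dry_run:
        shutil.rmtree(cache_dir)
    return True


if __name__ == "__main__":
    target = gimme_cache_dir(os.environ)
    # Dry run by default: nothing is removed until dry_run=False is passed.
    found = clear_cache(target, dry_run=True)
    print(f"{target}: {'present' if found else 'not found'}")
```

The dry-run default is deliberate, since the directory name is a guess; call `clear_cache(target, dry_run=False)` only after confirming the printed path.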


connorrogerson commented 3 years ago

@simonvh I tried to see whether this is caused by Python 3.7 (as per some similar errors). I installed a new environment specifying python=3.6, but I still have the same problem.

connorrogerson commented 3 years ago

@simonvh this seems to have been fixed by upgrading to 0.16.0.

simonvh commented 3 years ago

I'm glad to hear that! Still puzzled regarding the cause. I'm sorry I could not be of more help solving this earlier :(