Closed connorrogerson closed 3 years ago
Can you check which genomepy version you have in your env?
conda activate gimme
conda list | grep genomepy
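The same check can also be run from Python inside the activated environment. This is a minimal sketch, and `installed_versions` is a hypothetical helper. It reads package metadata through `importlib.metadata` on Python 3.8+ and falls back to setuptools' `pkg_resources` on the Python 3.6/3.7 environments used in this thread. The reported version should usually match what `conda list` shows for conda-installed Python packages, but treat that as an assumption.

```python
try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:  # Python 3.6/3.7: fall back to setuptools' pkg_resources
    from pkg_resources import DistributionNotFound as PackageNotFoundError
    from pkg_resources import get_distribution

    def version(name):
        return get_distribution(name).version


def installed_versions(packages):
    """Return {package: version} for each package, or None if it is not installed."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(["gimmemotifs", "genomepy"]).items():
        print(f"{name}\t{ver if ver else 'not installed'}")
```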
The output of the command is:
genomepy 0.9.1 py_0 bioconda
Hmm, this seems to be a bug in genomepy. Can you show the first couple of lines of /rds/user/cjr78/hpc-work/ATAC/gimmemotifs/maelstrom_forkhead/maelstrom_forkhead_input.txt?
The file looks like this:
loc	cluster
chr7:132622968-132623384	Open
chr15:10833861-10834376	Open
chr5:108139250-108139415	Open
chr10:59978846-59979189	Open
chr6:56879480-56879638	Open
chr2:61121364-61121617	Open
chr15:34350609-34350811	Open
chr2:20446043-20446656	Open
chr15:80459584-80460058	Open
They should all be tab-separated...
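One way to check this before running `gimme maelstrom` is to confirm that every line splits into exactly two fields on a tab and that no field has stray leading or trailing spaces. This is a minimal sketch. The function name `check_maelstrom_input` is hypothetical, and it assumes the two-column layout of the file above (a `loc` column plus one `cluster` column).

```python
import sys


def check_maelstrom_input(path, expected_fields=2):
    """Return a list of (line_number, problem) tuples; an empty list means nothing was flagged."""
    problems = []
    with open(path) as fin:
        for lineno, line in enumerate(fin, start=1):
            line = line.rstrip("\r\n")
            if not line:
                continue  # ignore blank lines
            fields = line.split("\t")
            if len(fields) != expected_fields:
                problems.append(
                    (lineno, f"expected {expected_fields} tab-separated fields, got {len(fields)}")
                )
            elif any(f != f.strip() for f in fields):
                problems.append((lineno, "leading/trailing whitespace in a field"))
            elif any(" " in f for f in fields):
                problems.append((lineno, "space inside a field (should probably be a tab)"))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    for lineno, problem in check_maelstrom_input(sys.argv[1]):
        print(f"line {lineno}: {problem}")
```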
@simonvh has there been any update to this? Anything I can try on my side to try and sort this out?
Sorry, crazy busy with teaching this quarter. I'll have another look. If possible, can you send me your whole input file by mail? Then I can check if I can localize the error.
It seems there is an extra space in the first five lines of the file. If you remove these it should work. I have also added a more informative warning for the next version of GimmeMotifs.
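If stray spaces are the only problem, you can normalise the file by splitting every line on any run of whitespace and re-joining the fields with single tabs. This is a minimal sketch, and `normalize_to_tabs` is a hypothetical helper. It assumes no field legitimately contains a space, which holds for `chr:start-end` locations and simple cluster labels like `Open`.

```python
import sys


def normalize_to_tabs(in_path, out_path):
    """Rewrite in_path to out_path with fields separated by exactly one tab.

    Assumes no field contains internal whitespace. Returns the number of lines written.
    """
    n = 0
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            fields = line.split()  # splits on any whitespace run and drops the line ending
            if not fields:
                continue  # skip blank lines
            fout.write("\t".join(fields) + "\n")
            n += 1
    return n


if __name__ == "__main__" and len(sys.argv) == 3:
    normalize_to_tabs(sys.argv[1], sys.argv[2])
```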
Hi Simon,
Thanks for noticing that! I've re-run maelstrom with the corrected file, and now it seems to get stuck at the "reading table" step:
gimme maelstrom maelstrom_forkhead/maelstrom_forkhead_input.txt mm10 maelstrom_test/
2021-04-09 15:14:06,290 - INFO - Starting maelstrom
2021-04-09 15:14:06,493 - INFO - motif scanning (counts)
2021-04-09 15:14:06,506 - INFO - reading table
This hangs there for a very long time. Were you able to run the above command with my input file on your system OK?
Best wishes, Connor
Hi Simon,
Just wanted to follow up on my last email. maelstrom runs, but it seems to hang at "reading table". Over the weekend I submitted a job, and it timed out after 12 hours (our CSF requires a time limit to be set for each job).
The output of my script is:
2021-04-10 03:49:38,615 - INFO - Starting maelstrom
2021-04-10 03:49:38,833 - INFO - motif scanning (counts)
2021-04-10 03:49:38,837 - INFO - reading table
2021-04-10 09:50:55,687 - INFO - using 14000 sequences
slurmstepd: error: JOB 37375628 ON cpu-e-792 CANCELLED AT 2021-04-10T15:41:46 DUE TO TIME LIMIT
It seems the "reading table" step took 6 hours. When I've used maelstrom in the past, this step has been pretty quick. This seems to occur whichever motif database and whichever input I use. Any ideas?
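You can read the gap straight off the timestamps: `reading table` was logged at 03:49:38 and the next message at 09:50:55, roughly 6 hours later. The sketch below computes how long each logged step took. It assumes the `YYYY-MM-DD HH:MM:SS,mmm - LEVEL - message` format shown above, one entry per line, and `step_durations` is a hypothetical helper.

```python
import re
from datetime import datetime

LOG_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - (\w+) - (.*)$")


def step_durations(lines):
    """Return (message, seconds until the next log entry) for each entry except the last."""
    entries = []
    for line in lines:
        m = LOG_RE.match(line.strip())
        if m:  # lines that are not log entries (e.g. slurmstepd errors) are skipped
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
            entries.append((ts, m.group(3)))
    return [
        (msg, (nxt - ts).total_seconds())
        for (ts, msg), (nxt, _) in zip(entries, entries[1:])
    ]


log = """\
2021-04-10 03:49:38,615 - INFO - Starting maelstrom
2021-04-10 03:49:38,833 - INFO - motif scanning (counts)
2021-04-10 03:49:38,837 - INFO - reading table
2021-04-10 09:50:55,687 - INFO - using 14000 sequences"""

for msg, secs in step_durations(log.splitlines()):
    print(f"{secs / 3600:8.2f} h  {msg}")
```

If you run this over the full job output, lines such as the `slurmstepd` error are ignored because they do not match the pattern.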
Best wishes, Connor
Hmm, this is strange. I do get another error later in the command, and I'll see what I can do about that, but it has no trouble at this step. Can you try deleting the GimmeMotifs cache directory and then running gimme maelstrom again?
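Here is a sketch for clearing the cache from Python. The cache location is an assumption. A later log in this thread prints `Using $XDG_CACHE_HOME for cache`, which suggests GimmeMotifs honours `$XDG_CACHE_HOME`, so the sketch assumes the cache is a `gimmemotifs` subdirectory there and falls back to the XDG default `~/.cache`. Check the reported path before deleting anything. `gimme_cache_dir` and `clear_gimme_cache` are hypothetical helpers.

```python
import os
import shutil
from pathlib import Path


def gimme_cache_dir():
    """Assumed GimmeMotifs cache location: $XDG_CACHE_HOME/gimmemotifs, else ~/.cache/gimmemotifs."""
    base = os.environ.get("XDG_CACHE_HOME") or os.path.join(Path.home(), ".cache")
    return Path(base) / "gimmemotifs"


def clear_gimme_cache(dry_run=True):
    """Delete the assumed cache directory; with dry_run=True only report what would be removed."""
    cache = gimme_cache_dir()
    if not cache.is_dir():
        return None
    if not dry_run:
        shutil.rmtree(cache)
    return cache


if __name__ == "__main__":
    print("would remove:", clear_gimme_cache(dry_run=True))
```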
Hi Simon,
Sorry for the delay. It's taking a while to get long jobs running on our server.
I deleted the cache and ran the script again, but I still run into the same problem. The output for this job was:
Using $XDG_CACHE_HOME for cache
2021-04-17 20:21:29,319 - INFO - Starting maelstrom
2021-04-17 20:21:29,560 - INFO - motif scanning (counts)
2021-04-17 20:21:29,567 - INFO - reading table
2021-04-18 02:12:19,760 - INFO - using 14000 sequences
2021-04-18 02:12:19,842 - INFO - Creating index for genomic GC frequencies.
2021-04-18 02:13:04,005 - INFO - setting threshold
2021-04-18 02:13:17,238 - INFO - determining FPR-based threshold
2021-04-18 02:13:35,309 - INFO - creating count table
slurmstepd: error: JOB 37938935 ON cpu-e-1104 CANCELLED AT 2021-04-18T08:20:25 DUE TO TIME LIMIT
It still seems to take a very long time to read the table.
Is it worth opening an issue on GitHub for this?
Best wishes Connor
Which version of GimmeMotifs is this? I finally managed to convince the bioconda build system to create a working build, so a new version is available. Maybe this helps? I have to confess that I'm really not sure why this occurs; there are not many regions in your file. Is this with the maelstrom_forkhead_input.txt file you sent earlier?
Simon
For me it just works :( Hard to debug...
2021-04-19 17:25:30,290 - INFO - Starting maelstrom
2021-04-19 17:25:30,314 - INFO - motif scanning (counts)
2021-04-19 17:25:30,314 - INFO - reading table
2021-04-19 17:25:32,935 - INFO - using 14000 sequences
2021-04-19 17:26:13,312 - INFO - setting threshold
2021-04-19 17:26:16,289 - INFO - determining FPR-based threshold
2021-04-19 17:31:02,124 - INFO - creating count table
2021-04-19 17:31:45,201 - INFO - done
2021-04-19 17:31:47,245 - INFO - creating dataframe
2021-04-19 17:31:49,568 - INFO - motif scanning (scores)
2021-04-19 17:31:49,628 - INFO - reading table
2021-04-19 17:31:53,756 - INFO - using 14000 sequences
2021-04-19 17:32:34,759 - INFO - creating score table (z-score, GC%)
2021-04-19 17:53:05,362 - INFO - done
2021-04-19 17:53:07,358 - INFO - creating dataframe
2021-04-19 17:53:28,693 - INFO - Selecting non-redundant motifs
2021-04-19 17:53:36,934 - INFO - Selected 327 motifs
2021-04-19 17:53:36,935 - INFO - Motifs: maelstrom.forkhead/nonredundant.motifs.pfm
2021-04-19 17:53:36,935 - INFO - Factor mappings: maelstrom.forkhead/nonredundant.motifs.motif2factors.txt
2021-04-19 17:53:37,129 - INFO - Fitting MWU
2021-04-19 17:53:37,800 - INFO - Done
2021-04-19 17:53:37,892 - INFO - Fitting Hypergeom
2021-04-19 17:53:38,267 - INFO - Done
2021-04-19 17:53:38,456 - INFO - Fitting RF
2021-04-19 17:53:39,304 - INFO - Done
2021-04-19 17:53:39,321 - INFO - Rank aggregation
2021-04-19 17:53:40,345 - INFO - html report
2021-04-19 17:53:46,575 - INFO - maelstrom.forkhead/gimme.maelstrom.report.html
One other thing to try: limiting the number of cores. Beyond ~12 cores the overhead of multiprocessing starts to slow down the scanning; maybe that is what's going on here? I'm just grasping at straws.
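The number of processes is set with `-N`, as in the original command (`gimme maelstrom -N $SLURM_NTASKS ...`). The sketch below caps that value at 12, the rough threshold mentioned above. Treating 12 as a hard limit is an assumption. `capped_ncpus` and `maelstrom_cmd` are hypothetical helpers, and they only build the command line without running it.

```python
import os
import shlex


def capped_ncpus(cap=12, env=None):
    """Pick a process count: $SLURM_NTASKS if set, else os.cpu_count(), never above cap."""
    env = os.environ if env is None else env
    requested = env.get("SLURM_NTASKS")
    n = int(requested) if requested and requested.isdigit() else (os.cpu_count() or 1)
    return max(1, min(n, cap))


def maelstrom_cmd(inputfile, genome, outdir, ncpus):
    """Build the gimme maelstrom argument list with an explicit -N."""
    return ["gimme", "maelstrom", "-N", str(ncpus), inputfile, genome, outdir]


if __name__ == "__main__":
    cmd = maelstrom_cmd(
        "maelstrom_forkhead/maelstrom_forkhead_input.txt", "mm10", "maelstrom_test/", capped_ncpus()
    )
    # Print a shell-ready command; pass cmd to subprocess.run(cmd, check=True) to run it.
    print(" ".join(shlex.quote(part) for part in cmd))
```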
Hi Simon,
I’ll try limiting the number of cores. The version of GimmeMotifs I’m using is 0.15.3+13.gdd30eae (I installed the dev version from GitHub).
Cheers, Connor
Hi Simon,
So I decided to uninstall and reinstall conda and create a new environment, but I'm still getting the same problem.
Version of gimmemotifs: gimmemotifs 0.15.3 py37h21043fe_0 bioconda
Version of genomepy: genomepy 0.9.3 py_0 bioconda
I ran this on the command line, but it still gets stuck at "reading table". I interrupted the run; here is the output, in case it gives us any clues:
gimme maelstrom /rds/user/cjr78/hpc-work/ATAC/gimmemotifs/maelstrom_forkhead/maelstrom_forkhead_input.txt mm10 /rds/user/cjr78/hpc-work/ATAC/gimmemotifs/maelstrom_test/
2021-04-20 11:58:19,104 - INFO - Starting maelstrom
2021-04-20 11:58:19,191 - INFO - motif scanning (counts)
2021-04-20 11:58:19,196 - INFO - reading table
Traceback (most recent call last):
File "/home/cjr78/miniconda3/envs/seq/bin/gimme", line 11, in
Cheers, Connor
Hi Simon,
One thing that could help: could you list the packages installed in your gimme environment? I can then cross-reference them with mine and see if there are any differences.
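Comparing two environments is easier if each side saves its `conda list` output to a file and the two are diffed package by package. This is a minimal sketch. It assumes the usual whitespace-separated `name version build channel` columns with `#` comment lines. `parse_conda_list` and `diff_envs` are hypothetical helpers, and the two inputs below are illustrative, using only versions mentioned in this thread.

```python
def parse_conda_list(text):
    """Map package name -> version from `conda list` output, skipping '#' comment lines."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 2:
            packages[fields[0]] = fields[1]
    return packages


def diff_envs(a_text, b_text):
    """Return {name: (version_in_a, version_in_b)} for packages that differ or are missing."""
    a, b = parse_conda_list(a_text), parse_conda_list(b_text)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }


mine = """# Name  Version  Build  Channel
gimmemotifs   0.15.3   py37h21043fe_0   bioconda
genomepy      0.9.3    py_0             bioconda
"""
yours = """gimmemotifs   0.15.3   py37h21043fe_0   bioconda
genomepy      0.9.1    py_0             bioconda
"""
print(diff_envs(mine, yours))  # {'genomepy': ('0.9.3', '0.9.1')}
```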
Cheers, Connor
@simonvh I tried to see whether it's caused by Python 3.7 (as suggested by some similar errors). I installed a new environment specifying python=3.6, but I'm still having the same problem.
@simonvh this seems to have been fixed by upgrading to 0.16.0.
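Since the problem disappeared after upgrading, a pipeline script can refuse to run on older releases. This is a minimal sketch, and `version_tuple` and `at_least` are hypothetical helpers. The 0.16.0 threshold comes from this thread. The simple numeric parse ignores local suffixes such as `+13.gdd30eae` and is an assumption, not full PEP 440 handling.

```python
import re


def version_tuple(ver):
    """Leading numeric release components: '0.15.3+13.gdd30eae' -> (0, 15, 3)."""
    m = re.match(r"\d+(?:\.\d+)*", ver)
    if not m:
        raise ValueError(f"unrecognised version string: {ver!r}")
    return tuple(int(part) for part in m.group(0).split("."))


def at_least(ver, minimum="0.16.0"):
    """True if ver's release number is >= minimum (pre-release/local parts are ignored)."""
    a, b = version_tuple(ver), version_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '0.16' compares equal to '0.16.0'.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


print(at_least("0.15.3+13.gdd30eae"))  # False
print(at_least("0.16.0"))              # True
```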
I'm glad to hear that! Still puzzled regarding the cause. I'm sorry I could not be of more help solving this earlier :(
Describe the bug
Maelstrom errors when running with default parameters.
To Reproduce
gimme maelstrom -N $SLURM_NTASKS /rds/user/cjr78/hpc-work/ATAC/gimmemotifs/maelstrom_forkhead/maelstrom_forkhead_input.txt mm10 /rds/user/cjr78/hpc-work/ATAC/gimmemotifs/maelstrom_forkhead/
Expected behavior
I've run maelstrom before with no errors and expected similar results.
Error logs
2021-01-21 15:06:51,859 - INFO - Starting maelstrom
2021-01-21 15:06:52,021 - INFO - motif scanning (counts)
2021-01-21 15:06:52,034 - INFO - reading table
2021-01-21 15:07:34,118 - INFO - using 14000 sequences
2021-01-21 15:08:35,419 - INFO - setting threshold
2021-01-21 15:09:26,638 - INFO - determining FPR-based threshold
2021-01-21 15:14:38,091 - INFO - creating count table
Traceback (most recent call last):
File "/home/cjr78/miniconda3/envs/gimme/bin/gimme", line 11, in <module>
cli(sys.argv[1:])
File "/home/cjr78/miniconda3/envs/gimme/lib/python3.6/site-packages/gimmemotifs/cli.py", line 661, in cli
args.func(args)
File "/home/cjr78/miniconda3/envs/gimme/lib/python3.6/site-packages/gimmemotifs/commands/maelstrom.py", line 45, in maelstrom
aggregation=aggregation,
File "/home/cjr78/miniconda3/envs/gimme/lib/python3.6/site-packages/gimmemotifs/maelstrom.py", line 350, in run_maelstrom
gc=gc,
File "/home/cjr78/miniconda3/envs/gimme/lib/python3.6/site-packages/gimmemotifs/scanner.py", line 166, in scan_regionfile_to_table
for row in s.count(regions):
File "/home/cjr78/miniconda3/envs/gimme/lib/python3.6/site-packages/gimmemotifs/scanner.py", line 991, in count
for matches in self.scan(seqs, nreport, scan_rc):
File "/home/cjr78/miniconda3/envs/gimme/lib/python3.6/site-packages/gimmemotifs/scanner.py", line 1074, in scan
seqs = as_fasta(seqs, genome=self.genome)
File "/home/cjr78/miniconda3/envs/gimme/lib/python3.6/site-packages/gimmemotifs/utils.py", line 696, in as_fasta
return Fasta(fdict=as_seqdict(to_convert, genome, minsize))
File "/home/cjr78/miniconda3/envs/gimme/lib/python3.6/functools.py", line 807, in wrapper
return dispatch(args[0].__class__)(*args, **kw)
File "/home/cjr78/miniconda3/envs/gimme/lib/python3.6/site-packages/gimmemotifs/utils.py", line 618, in _as_seqdict_list
return _genomepy_convert(to_convert, genome, minsize)
File "/home/cjr78/miniconda3/envs/gimme/lib/python3.6/site-packages/gimmemotifs/utils.py", line 538, in _genomepy_convert
g.track2fasta(to_convert, tmpfile.name)
File "/home/cjr78/miniconda3/envs/gimme/lib/python3.6/site-packages/genomepy/genome.py", line 361, in track2fasta
track_type = self.get_track_type(track)
File "/home/cjr78/miniconda3/envs/gimme/lib/python3.6/site-packages/genomepy/genome.py", line 334, in get_track_type
with open(track) as fin:
TypeError: expected str, bytes or os.PathLike object, not list
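The final frames show the mismatch: `get_track_type` calls `open(track)`, but gimmemotifs passed `track2fasta` a Python list of regions, and `open()` only accepts a path-like object. The snippet below reproduces that error in plain Python and shows a generic workaround: write the regions to a temporary file first. It is an illustration under that reading of the traceback, not the change that shipped in GimmeMotifs 0.16.0, and `regions_to_tempfile` is hypothetical.

```python
import os
import tempfile


def regions_to_tempfile(regions):
    """Write 'chrom:start-end' strings one per line and return the file path (caller deletes it)."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        for region in regions:
            tmp.write(f"{region}\n")
        return tmp.name


regions = ["chr7:132622968-132623384", "chr15:10833861-10834376"]

try:
    open(regions)  # same failure as open(track) when track is a list
except TypeError as err:
    print("TypeError:", err)

path = regions_to_tempfile(regions)
try:
    with open(path) as fin:  # a path works where the list did not
        print(fin.read().split())
finally:
    os.remove(path)
```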