Open Rmulet opened 6 years ago
It's a good point. Similarly, I want to add some other arguments to the command-line too, such as the number of threads to use. I'll keep it on the to-do. (which I see you also mentioned...)
Hi,
The JASPAR motif is in a different format than gimmme accepts. How can I convert it? http://jaspar.genereg.net/download/CORE/JASPAR2018_CORE_vertebrates_non-redundant_pfms_jaspar.txt
Thanks! Tommy
@crazyhottommy You can use functions in the gimmemotifs
module to that end. Some useful functions are described in this link: http://gimmemotifs.readthedocs.io/en/master/api.html. I used the following code snippet a few months ago based on the examples there (with the right indentation, obviously):
# Read motifs in Jaspar format with open(jaspar_file) as f: motifs = readmotifs(f, fmt="jaspar") # Print them in .pwm format used by gimmemotifs for motif in motifs: motif.id = motif.id.replace('\t','') # I had to replace tabs with underscore to avoid logo issues outfile.write(motif.to_pwm())
@Rmulet thanks for sharing the code!
need to add the new line
jaspar_file = "JASPAR2018_CORE_vertebrates_non-redundant_pfms_jaspar.txt"
outfile = open("JASPAR2018_CORE_vertebrates_non-redundant_pfms_jaspar.pwm", "w")
with open(jaspar_file) as f:
motifs = read_motifs(f, fmt="jaspar")
# Print them in .pwm format used by gimmemotifs
for motif in motifs:
# I had to replace tabs with underscore to avoid logo issues
motif.id = motif.id.replace('\t','_')
outfile.write(motif.to_pwm() + '\n')
Hi,
I got a key error:
gimme maelstrom myinput.txt hg19 maelstrom.JASPAR18.out -p /apps/conda/envs/gimme/share/gimmemotifs/motif_databases/JASPAR2018_CORE_vertebrates_non-redundant_pfms_jaspar.pwm
2018-05-07 23:22:48,410 - INFO - Starting maelstrom
2018-05-07 23:22:48,423 - INFO - Motif scanning (counts)
2018-05-07 23:22:48,423 - INFO - reading table
2018-05-07 23:22:48,512 - INFO - setting threshold
Determining threshold for fpr 0.01 and length 200 based on hg19
2018-05-07 23:39:16,407 - INFO - creating count table
Traceback (most recent call last):
File "/apps/conda/envs/gimme/bin/gimme", line 496, in <module>
args.func(args)
File "/apps/conda/envs/gimme/lib/python3.6/site-packages/gimmemotifs/commands/maelstrom.py", line 29, in maelstrom
run_maelstrom(infile, genome, outdir, pwmfile, methods=methods)
File "/apps/conda/envs/gimme/lib/python3.6/site-packages/gimmemotifs/maelstrom.py", line 193, in run_maelstrom
pwmfile=pwmfile)
File "/apps/conda/envs/gimme/lib/python3.6/site-packages/gimmemotifs/maelstrom.py", line 61, in scan_to_table
for row in s.count(regions):
File "/apps/conda/envs/gimme/lib/python3.6/site-packages/gimmemotifs/scanner.py", line 445, in count
for matches in self.scan(seqs, nreport, scan_rc):
File "/apps/conda/envs/gimme/lib/python3.6/site-packages/gimmemotifs/scanner.py", line 496, in scan
for result in it:
File "/apps/conda/envs/gimme/lib/python3.6/site-packages/gimmemotifs/scanner.py", line 578, in _scan_sequences
motifs = [(m, self.threshold[m.id]) for m in read_motifs(f)]
File "/apps/conda/envs/gimme/lib/python3.6/site-packages/gimmemotifs/scanner.py", line 578, in <listcomp>
motifs = [(m, self.threshold[m.id]) for m in read_motifs(f)]
KeyError: 'MA1140.1_JUNB(var.2)'
should I not change the tab to _
?
UPDATE. if I did not change the tab to _
, I got a different error:
gimme maelstrom my_input.txt hg19 maelstrom_JASPAR18.out -p /apps/conda/envs/gimme/share/gimmemotifs/motif_databases/JASPAR2018_CORE_vertebrates_non-redundant_pfms_jaspar.pwm
2018-05-08 08:24:34,970 - INFO - Starting maelstrom
2018-05-08 08:24:34,983 - INFO - Motif scanning (counts)
2018-05-08 08:24:34,983 - INFO - reading table
2018-05-08 08:24:35,073 - INFO - setting threshold
2018-05-08 08:24:35,748 - INFO - creating count table
2018-05-08 08:44:44,214 - INFO - done
2018-05-08 08:44:44,295 - INFO - creating dataframe
2018-05-08 08:44:52,309 - INFO - Motif scanning (scores)
2018-05-08 08:44:52,309 - INFO - reading table
2018-05-08 08:44:52,552 - INFO - creating score table
2018-05-08 09:05:55,895 - INFO - done
2018-05-08 09:05:55,987 - INFO - creating dataframe
2018-05-08 09:06:32,577 - INFO - Fitting LightningClassification
2018-05-08 09:29:11,227 - INFO - Done
2018-05-08 09:29:13,426 - INFO - Fitting MWU
/apps/conda/envs/gimme/lib/python3.6/site-packages/gimmemotifs/moap.py:497: RuntimeWarning: divide by zero encountered in log10
self.act_ = pd.DataFrame(-np.log10(fpr.T),
2018-05-08 09:29:19,802 - INFO - Done
2018-05-08 09:29:20,718 - INFO - Fitting Hypergeom
2018-05-08 09:29:25,571 - INFO - Done
2018-05-08 09:29:27,737 - INFO - Fitting RF
2018-05-08 09:30:46,125 - INFO - Done
2018-05-08 09:30:46,136 - INFO - Rank aggregation
2018-05-08 09:30:49,836 - INFO - html report
Traceback (most recent call last):
File "/apps/conda/envs/gimme/bin/gimme", line 496, in <module>
args.func(args)
File "/apps/conda/envs/gimme/lib/python3.6/site-packages/gimmemotifs/commands/maelstrom.py", line 29, in maelstrom
run_maelstrom(infile, genome, outdir, pwmfile, methods=methods)
File "/apps/conda/envs/gimme/lib/python3.6/site-packages/gimmemotifs/maelstrom.py", line 309, in run_maelstrom
pwmfile
File "/apps/conda/envs/gimme/lib/python3.6/site-packages/gimmemotifs/report.py", line 252, in maelstrom_html_report
f = df["factors"].str.len() > 30
File "/apps/conda/envs/gimme/lib/python3.6/site-packages/pandas/core/generic.py", line 3077, in __getattr__
return object.__getattribute__(self, name)
File "/apps/conda/envs/gimme/lib/python3.6/site-packages/pandas/core/base.py", line 243, in __get__
return self.construct_accessor(instance)
File "/apps/conda/envs/gimme/lib/python3.6/site-packages/pandas/core/strings.py", line 1909, in _make_str_accessor
raise AttributeError("Can only use .str accessor with string "
AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas
The activity* files
look fine, but the final.out.csv
file only contains the TF name but not the scores for each cluster. and no html file was generated.
Thanks @Rmulet for helping out. Sorry for not replying earlier @crazyhottommy, I was on vacation.
I think there is a bug relating to either motif names (you can try removing all special characters) or the link of motifs to TFs. I will have a look it at this once I find some time.
No problem! in the motif_database folder, there are files gimme.vertebrate.v3.1.motif2factors.txt I need to create one for JASPAR2018 as well?
Yes, that should work. In the same directory as your motif file. It's a simple tab separated file with a header:
motif factors
Motif1 Factor1,Factor2
Motif2 Factor3,Factor4,Factor5
when one uses different motifs .pwm files, how gimme maelstrom
finds the correct linking file?
I need to give the same prefix for the linking file and the pwm file?
Hi,
So far I have been using the default database that comes bundled with Gimmemotifs, which is quite alright for most purposes. However, Jaspar 2018 has just been released and I wanted to repeat some analyses to see what new motifs may pop up. For some tools this is relatively straightforward (--pwmfile in
match
ormaelstrom
), but I did not see such an option in gimme motifs. Thus, I had to modify the config file, which is not as convenient as adding an argument.I know it's a minor thing, but I leave this suggestion here in case you want to implement it if you deem it worthwhile.
PS: Now that I think of it, the number of CPUs to be used could also be a valuable addition to command line program calls.