Closed pna059 closed 4 weeks ago
Hi @pna059 , glad to see you are successfully clustering matrices, do you get any errors/warnings? Perhaps @jaimicore has suggestions? Bruno
Yes, most of the RSAT functionality is in place, thanks to the custom image set up by @xhejtman. I realized that the problem might only be the JSON conversion. Here is the content of the error file:
rsync: symlink "/home/meta/pavlan/Barley_CAGE_motifs/4DAG_clustering/js/c3-0.4.10/htdocs/css/c3.css" -> "../../c3.css" failed: No such file or directory (2)
rsync: symlink "/home/meta/pavlan/Barley_CAGE_motifs/4DAG_clustering/js/c3-0.4.10/htdocs/js/c3.js" -> "../../c3.js" failed: No such file or directory (2)
rsync: symlink "/home/meta/pavlan/Barley_CAGE_motifs/4DAG_clustering/js/c3-0.4.10/htdocs/js/c3.min.js" -> "../../c3.min.js" failed: No such file or directory (2)
rsync: symlink "/home/meta/pavlan/Barley_CAGE_motifs/4DAG_clustering/js/c3-0.4.10/htdocs/js/extensions" -> "../../extensions/js" failed: Input/output error (5)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1207) [sender=3.1.3]
Unfortunately, the generator creates files that are a bit hard to display as local files because of CORS restrictions. This can be avoided, e.g., by starting Chrome with the --allow-file-access-from-files option.
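An alternative to the Chrome flag (not something suggested in this thread, just a common workaround) is to serve the output directory over HTTP on localhost, so that the browser no longer treats the files as file:// resources. A minimal sketch using only the Python standard library; the function name make_server and the port are assumptions for illustration:

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_server(directory, port=8000):
    """Build an HTTP server that serves `directory` on localhost only.

    Passing port=0 lets the operating system pick any free port.
    """
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)
```

To use it, call make_server("path/to/output_dir").serve_forever() with the matrix-clustering output directory (the path here is a placeholder), then open http://127.0.0.1:8000/ in the browser and navigate to the HTML report.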
Btw, the Radial_tree.png file is missing, and it seems it has never been there.
Hi @pna059, in your email you say you were using the following command line to get a motif clustering similar to that in https://jaspar.genereg.net/matrix-clusters/plants/ :
rsat matrix-clustering -v 1 -max_matrices 500 -matrix Merged_all_100bp_more Merged_all_100bp.tf transfac -title 'Barley_100bp_core_promoters' -hclust_method average -calc sum -metric_build_tree 'Ncor' -lth w 5 -lth cor 0.6 -lth Ncor 0.4 -radial_tree_only -label_in_tree consensus -o Merged_all_100bp_radial 2> Merged_all_pos_100bp_radial.err
With this command you do get a radial tree:
My suggestion would be to use the -trim_threshold option with a low value to trim the edges of the motifs and try to make the circle smaller.
Another option would be to play with Ncor and cor cutoff values to obtain a smaller number of non-redundant motifs.
Can you please comment on this @jaimicore @jvanheld ?
I can test that, but I have no idea what values the "-trim_threshold" should have. What to start with? Also I am a bit confused about the Ncor and cor values. To get fewer motifs, should I get them both higher? Also, if I am using the -metric_build_tree 'Ncor', does the cor value matter?
I can test that, but I have no idea what values the "-trim_threshold" should have. What to start with?
The information content of sequence logos is given in bits. If you check some of your logos you'll see that the conserved bases have high values, usually in the interval [1, 2]. In contrast, non-informative positions have close to 0 bits, so perhaps you can try low values such as 0.1 to see if that shrinks your motifs without losing valuable flanking sequence.
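To make the idea of trimming by information content concrete, here is a small sketch. It assumes a uniform background (so a DNA column carries at most 2 bits), uses no pseudocounts, and only trims the flanks. RSAT's -trim_threshold may compute things differently (priors, pseudocounts), so treat this as an illustration of the principle, not a reimplementation. The function names are made up for this example.

```python
import math


def column_information(counts):
    """Information content in bits of one column of A/C/G/T counts.

    Assumes a uniform background: IC = 2 + sum(p * log2(p)).
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    info = 2.0
    for count in counts:
        if count > 0:
            p = count / total
            info += p * math.log2(p)
    return info


def trim_flanks(matrix, threshold=0.1):
    """Drop leading and trailing columns with IC below `threshold`.

    `matrix` is a list of columns, each a list of four counts.
    Low-information columns inside the motif are kept.
    """
    informative = [
        i for i, column in enumerate(matrix)
        if column_information(column) >= threshold
    ]
    if not informative:
        return []
    return matrix[informative[0]:informative[-1] + 1]
```

With a threshold of 0.1, a flank column such as [6, 5, 5, 4] (about 0.01 bits) is removed, while a fully conserved column such as [20, 0, 0, 0] (2 bits) is kept.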
Also I am a bit confused about the Ncor and cor values. To get fewer motifs, should I get them both higher?
I just checked $ matrix-clustering -h and it says there:
In this algorithm, the threshold can be set combining values of
different metrics.
If the descendant motifs for a particular branch do not satisfy the
threshold a new cluster is created.
For a complete description of the thresholds and the motif
comparison metrics see the help of compare-matrices
Suggested thresholds:
cor >= 0.7
Ncor >= 0.4
w >= 5
This means that a motif of width=5 with cor < 0.7 or Ncor < 0.4 to previous motifs will prompt the creation of a new cluster. So if you want fewer non-redundant motifs you have to lower these thresholds. However, the problem is that by doing so you might merge motifs that are actually different, and the same thresholds might not work for all families, so take care.
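The way the lower thresholds combine can be sketched as follows. This only illustrates the "all metrics must pass" logic described in the help text; it is not the matrix-clustering code, which evaluates the thresholds over the descendant motifs of each branch. The function name and the example scores are made up.

```python
def passes_thresholds(scores, lower_thresholds):
    """Return True only if every metric reaches its lower threshold."""
    return all(
        scores[metric] >= minimum
        for metric, minimum in lower_thresholds.items()
    )


# Suggested thresholds from the help text quoted above.
suggested = {"cor": 0.7, "Ncor": 0.4, "w": 5}

# Hypothetical comparison between two motifs.
pair = {"cor": 0.75, "Ncor": 0.35, "w": 6}
```

Here passes_thresholds(pair, suggested) is False because Ncor is below 0.4, so the two motifs would end up in different clusters. Lowering the Ncor threshold to 0.3 would let them merge, which is why lower thresholds give fewer, larger clusters.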
Also, if I am using the -metric_build_tree 'Ncor', does the cor value matter?
In this case I guess you will only have to play with Ncor, but hopefully @jaimicore can comment.
Hope this helps, Bruno
Hi,
Sorry for my late reply, this is Jaime, the maintainer of matrix-clustering.
I developed an improved version of the radial tree but it is not integrated in RSAT yet, the repository is here: https://github.com/jaimicore/JASPAR_2022_motif_discovery_and_curation_pipeline/tree/master/Profiles_clustering
It is semi-automatic: the ring radius is adjusted manually until you find the desired size, but I can tell you which parameters to change. Here you can see many examples: https://jaspar.genereg.net/matrix-clusters/ all of them made with the repository that I shared above. All of them are part of the JASPAR 2022 release.
- Also I am a bit confused about the Ncor and cor values. To get fewer motifs, should I get them both higher? Also, if I am using the -metric_build_tree 'Ncor', does the cor value matter?
The higher the cor/Ncor threshold, the more similar the motifs within a cluster are. So if you want fewer clusters, we suggest lower values. The default values were selected after an evaluation.
Regarding the trimming, I have run some tests and we recommend trimming the motifs before they are clustered; or, if you prefer, I could add the option of trimming when you launch the code.
When the user specifies the radial tree option (-radial_tree_only), all the motifs are forced into a single alignment (for visualization purposes). I would therefore expect a long final alignment: some motifs will be aligned at their flanks, so the final alignment will be much longer than the original motif widths. The thresholds given together with -radial_tree_only are used to partition the tree and colour the branches, so that the tree visually represents the clusters; they are ignored when computing the alignment.
Let me know if this helps and we can chat in case you need to set the tree radius parameters. I will try to make this automatic and add it to RSAT.
Hi Jaime, I really appreciate your response. Indeed, our goal is to make JASPAR 2022-like radial trees for the promoterome/regulatory sequences of our plant species. Looking at the https://github.com/jaimicore/JASPAR_2022_motif_discovery_and_curation_pipeline/tree/master/Profiles_clustering repository, my first question is whether I must use JASPAR to achieve the same interactive radial tree. Currently I am using footprintDB plants. I guess that my input to matrix-clustering needs to be in the format of https://github.com/jaimicore/JASPAR_2022_motif_discovery_and_curation_pipeline/blob/master/Profiles_clustering/data/motifs/JASPAR_2022_CORE_nematodes_non-redundant_pfms_transfac.txt However, my current outputs from peak-motifs (the ...results/discovered_motifs/*dicovered.tf files) do not contain the transcription factor IDs:
#command to find motifs:
rsat peak-motifs -v 1 -title ${basename}_pos -i $f -disco positions -top_peaks 4000 -nmotifs 5 -minol 6 -maxol 7 -1str -origin center -motif_db footprintDB-plants transfac ../footprintDB.plants.motif.tf -scan_markov 1 -r_plot -img_format png -prefix peak-motifs -noov -maxpat 300 -outdir ../RSAT_pos_200bp/More_motifs/${basename}_pos
#The output .tf file looks like this:
XX
BA 1 sequences
XX
BS CCCCACTCCTCCCTCTCCCCTCCG; site_0; 1; 24; 0; p
CC program: feature
CC matrix.nb: 1
CC matrix.nb: 1
CC sites: 1
CC consensus.strict: ccccactcctccctctcccctccg
CC consensus.strict.rc: CGGAGGGGAGAGGGAGGAGTGGGG
CC consensus.IUPAC: ccccactcctccctctcccctccg
CC consensus.IUPAC.rc: CGGAGGGGAGAGGGAGGAGTGGGG
CC consensus.regexp: ccccactcctccctctcccctccg
CC consensus.regexp.rc: CGGAGGGGAGAGGGAGGAGTGGGG
XX
//
AC positions_6-7nt_m2
XX
ID positions_6-7nt_m2positions_6-7nt_m2
XX
DE ssCsCCGCCgCCGCCGsCGAgGmb
.....I deliberately copied one motif which is based on one sequence only...... The database search results with TF IDs are stored separately, in the results/discovered_vs_db directory.
Do I have to annotate each .tf file manually to generate a suitable input for matrix-clustering or is there a trick to get the ID in the .tf file directly?
Also, how can I avoid these "singleton" motifs, which in my case are subsequently visualized as empty logos?
For annotating the radial tree using the annotate_matrix-clustering.R, I also need the --annotation .tab file.
Should that be generated manually in the format of: https://github.com/jaimicore/JASPAR_2022_motif_discovery_and_curation_pipeline/blob/master/Profiles_clustering/data/annotation_tables/JASPAR_2022_nematodes_annotations.tsv ?
Regarding the trimming: does ".... to trim the motifs before they are clustered" mean adding "-trim_threshold 0.1", as suggested by @brunocontrerasmoreira, or should I rather use https://github.com/jaimicore/matrix-clustering_stand-alone/blob/main/convert-matrix.R?
I see the ring radius values may be estimated based on the number of motifs https://github.com/jaimicore/JASPAR_2022_motif_discovery_and_curation_pipeline/blob/master/Profiles_clustering/Jaspar_2022_radial_trees_manual_parameters.txt, but it would be helpful if you could point me to where to change this.
Pavla
Hi Jaime, could you please answer my questions, or let me know when you have made the changes you've mentioned, so that I can move forward? Thank you, Pavla
Hi
I really appreciate your response. Indeed, our goal is to make JASPAR 2022-like radial trees for the promoterome/regulatory sequences of our plant species. Looking at the https://github.com/jaimicore/JASPAR_2022_motif_discovery_and_curation_pipeline/tree/master/Profiles_clustering repository, my first question is whether I must use JASPAR to achieve the same interactive radial tree.
You could use any motif to generate a radial tree, in this repository we adapted the output to JASPAR but you could use it with any TF binding motif.
Currently I am using footprintDB plants. I guess that my input to matrix-clustering needs to be in the format of https://github.com/jaimicore/JASPAR_2022_motif_discovery_and_curation_pipeline/blob/master/Profiles_clustering/data/motifs/JASPAR_2022_CORE_nematodes_non-redundant_pfms_transfac.txt However, my current outputs from peak-motifs (the ...results/discovered_motifs/*dicovered.tf files) do not contain the transcription factor IDs:
This is not a limitation. Many input formats are supported (Homer, meme, transfac, etc.). peak-motifs returns motifs in transfac format; in RSAT we mostly use this format because it allows annotations to be added.
If your motifs have the following fields, that is OK:
AC positions_6-7nt_m2
XX
ID positions_6-7nt_m2positions_6-7nt_m2
.....I deliberately copied one motif which is based on one sequence only...... The database search results with TF IDs are stored separately, in the results/discovered_vs_db directory.
Do I have to annotate each .tf file manually to generate a suitable input for matrix-clustering or is there a trick to get the ID in the .tf file directly?
peak-motifs produces a short summary table with the motif IDs and their most similar database motifs. I think that if you write your own script to rename the motifs (that is, to rewrite the ID or AC fields in the transfac file), that should be OK.
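Such a renaming script could look roughly like the sketch below. It assumes the motifs are in transfac format with "AC" and "ID" lines and records terminated by "//", as in the excerpt above. The mapping from discovered motif accession to TF label has to be built by you (for example from the discovered_vs_db results); its layout is not assumed here. The function name and the label "TF_A" are hypothetical.

```python
import re

# Matches the accession (AC) and identifier (ID) lines of a transfac record.
FIELD_RE = re.compile(r"^(AC|ID)\s+(\S+)")


def rename_motifs(transfac_text, new_names, fields=("ID",)):
    """Rewrite the chosen fields of each motif using `new_names`.

    `new_names` maps the original AC value to the new label.
    By default only ID is rewritten, so AC stays unique.
    """
    renamed = []
    current_ac = None
    for line in transfac_text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            field, value = match.groups()
            if field == "AC":
                current_ac = value  # remember which motif we are in
            label = new_names.get(current_ac)
            if label is not None and field in fields:
                line = f"{field}  {label}"
        elif line.strip() == "//":
            current_ac = None  # end of record
        renamed.append(line)
    return "\n".join(renamed) + "\n"
```

For example, rename_motifs(text, {"positions_6-7nt_m2": "TF_A"}) replaces only the ID line of that motif and leaves all other motifs untouched. Pass fields=("AC", "ID") to rewrite both fields, but then make sure the new labels are unique.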
Also, how can I avoid these "singleton" motifs, which in my case are subsequently visualized as empty logos?
There is no function to avoid these motifs. What I would do is generate the tree, then remove those motifs from the input file and re-run it.
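Removing the motifs from the input file can also be scripted. The sketch below is an assumption-laden example: it relies on each transfac record ending with "//" and containing a "CC sites: N" comment line, as in the peak-motifs excerpt earlier in this thread, and it drops records with fewer than min_sites sites. Records without such a line are kept. The function names are made up.

```python
import re

# Number of sites as reported in the record's comment lines (assumed format).
SITES_RE = re.compile(r"^CC\s+sites:\s*(\d+)", re.MULTILINE)


def split_records(transfac_text):
    """Split transfac text into records, each ending with a '//' line."""
    records, current = [], []
    for line in transfac_text.splitlines():
        current.append(line)
        if line.strip() == "//":
            records.append("\n".join(current) + "\n")
            current = []
    if any(line.strip() for line in current):
        records.append("\n".join(current) + "\n")  # unterminated last record
    return records


def drop_low_support(transfac_text, min_sites=2):
    """Keep only the motifs built from at least `min_sites` sites."""
    kept = []
    for record in split_records(transfac_text):
        match = SITES_RE.search(record)
        if match and int(match.group(1)) < min_sites:
            continue  # singleton (or otherwise weakly supported) motif
        kept.append(record)
    return "".join(kept)
```

Writing the result to a new file and using that as the -matrix input would skip the singleton motifs without editing the file by hand.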
For annotating the radial tree using the annotate_matrix-clustering.R, I also need the --annotation .tab file.
Should that be generated manually in the format of: https://github.com/jaimicore/JASPAR_2022_motif_discovery_and_curation_pipeline/blob/master/Profiles_clustering/data/annotation_tables/JASPAR_2022_nematodes_annotations.tsv ?
Yes, this is the format needed.
Regarding the trimming: does ".... to trim the motifs before they are clustered" mean adding "-trim_threshold 0.1", as suggested by @brunocontrerasmoreira, or should I rather use https://github.com/jaimicore/matrix-clustering_stand-alone/blob/main/convert-matrix.R?
Yes. Or alternatively you could use this script I wrote: https://github.com/jaimicore/R_utilities/blob/master/R-scripts/Motif_Friseur.R Very easy to use, here is an example: https://github.com/jaimicore/R_utilities#motif-friseur
I see the ring radius values may be estimated based on the number of motifs https://github.com/jaimicore/JASPAR_2022_motif_discovery_and_curation_pipeline/blob/master/Profiles_clustering/Jaspar_2022_radial_trees_manual_parameters.txt, but it would be helpful if you could point me to where to change this.
Yes, this is not explained anywhere. Let me create a repository with all the details needed to reproduce the nematodes JASPAR tree. Is that OK?
Hi, I need to generate a radial tree from the peak-motifs position analysis results, ideally with the consensus as a label. As the label cannot be changed in the web interface and the analyses run slowly there, I opted for our own RSAT installation and the command line. matrix-clustering works well and generates a regular tree, but the -radial_tree_only option does not produce the tree image.
Here are the output files from the log: