rsa-tools / rsat-code

This repo contains the code required to run a local version of the software suite Regulatory Sequence Analysis Tools (RSAT).
http://rsat.eu
GNU Affero General Public License v3.0

Radial tree image not generated #34

Closed pna059 closed 4 weeks ago

pna059 commented 1 year ago

Hi, I need to generate a radial tree from the peak-motifs position analysis results, ideally with the consensus as the label. Since the label cannot be changed in the web interface and the analyses run slowly there, I opted for our own RSAT installation and the command line. matrix-clustering works well and generates a regular tree, but the -radial_tree_only option does not produce the tree image.

$ rsat matrix-clustering -v 1 -max_matrices 300 -matrix 8dap_all_pos 8DAP_all_positions.tf transfac -hclust_method average -calc sum -title '8dap_all_pos' -metric_build_tree 'Ncor' -lth w 5 -lth cor 0.6 -lth Ncor 0.4 -quick -radial_tree_only -label_in_tree consensus -return json,heatmap -o 8DAP_all_pos_radial 2> 8DAP_all_pos_radial.err
cannot delete non-empty directory: js/c3-0.4.10/htdocs/js/extensions

Here are the output files from the log:

; Output files
;   logo_cladogram_html             8DAP_all_pos_radial_logo_tree.html
;   heatmap_coverage_attributes     8DAP_all_pos_radial_tables/convergency_heatmap_attributes.tab
;   root_motifs_table               8DAP_all_pos_radial_root_motifs_table.tab
;   small_logo_cladogram_html       8DAP_all_pos_radial_portable_logo_tree.html
;   radial_tree                     8DAP_all_pos_radial_radial_tree.html
;   coverage_json_folder            8DAP_all_pos_radial_coverage_json
;   alignment_table                 8DAP_all_pos_radial_tables/alignment_table.tab
;   ref_collection_tf               8DAP_all_pos_radial_reference_collection.tf
;   pairwise_compa                  8DAP_all_pos_radial_tables/pairwise_compa.tab
;   consensus_cladogram_json        8DAP_all_pos_radial_trees/tree.json
;   prefix                          8DAP_all_pos_radial
;   dynamic_heatmap_html            8DAP_all_pos_radial_dynamic_heatmap.html
;   html_index                      8DAP_all_pos_radial_html/index.html
;   clusters_id_temp                8DAP_all_pos_radial_tables/cluster_1_motif_IDs.tab
;   heatmap_collections_attributes  8DAP_all_pos_radial_tables/collection_heatmap_order.tab
;   dynamic_coverage_heatmap_html   8DAP_all_pos_radial_dynamic_coverage_heatmap.html
;   summary                         8DAP_all_pos_radial_SUMMARY.html
;   hexa_colors                     8DAP_all_pos_radial_hexa_colors.txt
;   distance_table                  8DAP_all_pos_radial_tables/distance_table.tab
;   small_forest_export             8DAP_all_pos_radial_small_forest.html
;   motif_richness_barplot_html     8DAP_all_pos_radial_motif_richness_barplot.html
;   input_matrices                  8DAP_all_pos_radial_data/input_motifs_processed.tf
;   central_motifs_tf               8DAP_all_pos_radial_central_motifs_transfac.tf
;   pairwise_compa_html             8DAP_all_pos_radial_html/pairwise_compa.html
;   alignment_table_html            8DAP_all_pos_radial_tables/alignment_table.html
;   summary_temp                    8DAP_all_pos_radial_SUMMARY_TEMP.html
;   heatmap_coverage_d3             8DAP_all_pos_radial_tables/convergency_heatmap_tab.tsv
;   motif_file                      8DAP_all_pos_radial_data/input_motifs_processed.tf
;   cluster_summary_table           8DAP_all_pos_radial_tables/clusters_summary_table.tab
;   archive                         8DAP_all_pos_radial_archive.zip
;   matrix_descriptions             8DAP_all_pos_radial_tables/pairwise_compa_matrix_descriptions.tab
;   internal_nodes_attributes_table 8DAP_all_pos_radial_tables/internal_nodes_attributes.tab
;   temp_html_2                     8DAP_all_pos_radial_temporary_2.html
;   central_motifs                  8DAP_all_pos_radial_central_motifs_IDs.tab
;   percent_table                   8DAP_all_pos_radial_tables/clusters_summary_percent_table.tab
;   temp_html                       8DAP_all_pos_radial_temporary.html
;   clusters_table_html             8DAP_all_pos_radial_tables/clusters.html
;   clusters_table                  8DAP_all_pos_radial_tables/clusters.tab
;   matrix_descriptions_html        8DAP_all_pos_radial_html/pairwise_compa_matrix_descriptions.html
;   Rlog                            8DAP_all_pos_radial_Rlog.txt
;   clusters_transfac               8DAP_all_pos_radial_data/cluster_1_transfac_motifs.tf
;   heatmap_pdf                     8DAP_all_pos_radial_figures/heatmap.pdf
;   distance_table_html             8DAP_all_pos_radial_tables/distance_table.html
;   int_align                       8DAP_all_pos_radial_tables/intermediate_alignments.tab
;   coverage_table                  8DAP_all_pos_radial_tables/clusters_summary_coverage_contingency_table.tab
;   clusters_table_motif_names      8DAP_all_pos_radial_tables/clusters_motif_names.tab
;   central_motifs_IDs_temp         8DAP_all_pos_radial_central_motifs_IDs_temporal.tab
;   heatmap_collections_d3          8DAP_all_pos_radial_tables/collection_heatmap_tab.tsv
;   all_concatenated_motifs         8DAP_all_pos_radial_aligned_logos/All_concatenated_motifs.tf
;   clusters_table_motif_names_html 8DAP_all_pos_radial_tables/clusters_motif_names.html
;   log                             8DAP_all_pos_radial_log.txt
;   err_log                         8DAP_all_pos_radial_errors.txt
;   root_motifs                     8DAP_all_pos_radial_cluster_root_motifs.tf
;   cluster_IDs_names               8DAP_all_pos_radial_cluster_IDs.txt
;   heatmap_jpg                     8DAP_all_pos_radial_figures/heatmap.jpg
;   temp_html_3                     8DAP_all_pos_radial_temporary_3.html
;   reference_compa                 8DAP_all_pos_radial_tables/Root_vs_reference_collection_compa.tab
;   nb_clusters_table               8DAP_all_pos_radial_tables/number_of_clusters.tab
;   motif_richness_tsv              8DAP_all_pos_radial_motif_richness.tsv
;   radial_tree_template            /opt/conda/envs/rsat/share/rsat/public_html/templates_html/Radial_tree_template.html
;   reference_compa_html            8DAP_all_pos_radial_html/Root_vs_reference_collection_compa.html
; Directories
;   data                            8DAP_all_pos_radial_data
;   output                          .
; Host name jupyter-pavlan--rsat-5fcustom
; Job started   2022-11-02.115456
; Job done  2022-11-02.115745
; Seconds   0.76
;   user    0.76
;   system  0.5
;   cuser   87.55
;   csystem 13.72

eead-csic-compbio commented 1 year ago

Hi @pna059, glad to see you are successfully clustering matrices. Do you get any errors or warnings? Perhaps @jaimicore has suggestions. Bruno

pna059 commented 1 year ago

Yes, most of the RSAT functionality is in place, thanks to the custom image set up by @xhejtman. I realized that the problem might only be the JSON conversion. Here is the content of the error file:

rsync: symlink "/home/meta/pavlan/Barley_CAGE_motifs/4DAG_clustering/js/c3-0.4.10/htdocs/css/c3.css" -> "../../c3.css" failed: No such file or directory (2)
rsync: symlink "/home/meta/pavlan/Barley_CAGE_motifs/4DAG_clustering/js/c3-0.4.10/htdocs/js/c3.js" -> "../../c3.js" failed: No such file or directory (2)
rsync: symlink "/home/meta/pavlan/Barley_CAGE_motifs/4DAG_clustering/js/c3-0.4.10/htdocs/js/c3.min.js" -> "../../c3.min.js" failed: No such file or directory (2)
rsync: symlink "/home/meta/pavlan/Barley_CAGE_motifs/4DAG_clustering/js/c3-0.4.10/htdocs/js/extensions" -> "../../extensions/js" failed: Input/output error (5)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1207) [sender=3.1.3]

xhejtman commented 1 year ago

Unfortunately, the generator creates files that are a bit hard to display as local files because of browser CORS restrictions. This can be worked around, e.g., by starting Chrome with the --allow-file-access-from-files option.
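An alternative that does not require relaxing browser security is to serve the output directory over HTTP and open the pages through localhost, which should avoid the local-file restriction. A minimal sketch using only the Python standard library; the function name, directory and port are placeholders, not part of RSAT:

```python
import functools
import http.server
import threading


def serve_directory(directory, port=8000):
    """Serve `directory` over HTTP on localhost and return the running server."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    # Run the server in a daemon thread so this call returns immediately.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

For example, `server = serve_directory("path/to/output", 8000)` would make the generated HTML files reachable under http://127.0.0.1:8000/ until `server.shutdown()` is called.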

Btw, the Radial_tree.png file is missing, and it seems it has never been there.

eead-csic-compbio commented 1 year ago

Hi @pna059, in your email you say you were using the following command line to get a motif clustering similar to that in https://jaspar.genereg.net/matrix-clusters/plants/ :

rsat matrix-clustering -v 1 -max_matrices 500 -matrix Merged_all_100bp_more Merged_all_100bp.tf transfac -title 'Barley_100bp_core_promoters' -hclust_method average -calc sum -metric_build_tree 'Ncor' -lth w 5 -lth cor 0.6 -lth Ncor 0.4 -radial_tree_only -label_in_tree consensus -o Merged_all_100bp_radial 2> Merged_all_pos_100bp_radial.err

With this command you do get a radial tree:

[Image: radial tree obtained with this command]

My suggestion would be to use the -trim_threshold option with a low value to trim the edges of the motifs and try to make the circle smaller.

Another option would be to play with Ncor and cor cutoff values to obtain a smaller number of non-redundant motifs.

Can you please comment on this @jaimicore @jvanheld ?

pna059 commented 1 year ago

I can test that, but I have no idea what values the "-trim_threshold" should have. What to start with? Also I am a bit confused about the Ncor and cor values. To get fewer motifs, should I get them both higher? Also, if I am using the -metric_build_tree 'Ncor', does the cor value matter?

eead-csic-compbio commented 1 year ago

> I can test that, but I have no idea what values the "-trim_threshold" should have. What to start with?

Information content in sequence logos is given in bits. If you check some of your logos, you'll see that the conserved bases have high values, usually in the interval [1, 2]. In contrast, non-informative positions have close to 0 bits, so perhaps you can try low values such as 0.1 to see if that shrinks your motifs without losing valuable flanking sequence.
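To make the bits scale concrete, here is a rough sketch of how per-column information can be computed and how low-information flanks could be trimmed. It assumes a uniform background and no pseudocounts or small-sample correction, so the values will not match RSAT exactly; the function names and the toy matrix are invented for illustration and this is not the code behind -trim_threshold.

```python
import math


def column_information(counts):
    """Information content (bits) of one column of A, C, G, T counts,
    assuming a uniform background and no pseudocount correction."""
    total = sum(counts)
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts:
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return 2.0 - entropy  # log2(4) = 2 bits is the maximum for DNA


def trim_flanks(matrix, threshold=0.1):
    """Drop leading and trailing columns whose information is below
    `threshold`; internal low-information columns are kept."""
    info = [column_information(col) for col in matrix]
    keep = [i for i, bits in enumerate(info) if bits >= threshold]
    if not keep:
        return []
    return matrix[keep[0]:keep[-1] + 1]


# Toy count matrix, one row per position (A, C, G, T); not from this thread.
motif = [
    [5, 5, 5, 5],    # uninformative flank, 0 bits
    [20, 0, 0, 0],   # fully conserved A, 2 bits
    [0, 10, 10, 0],  # C or G, 1 bit
    [6, 5, 4, 5],    # nearly uniform flank, about 0.01 bits
]
print(len(trim_flanks(motif, threshold=0.1)))  # prints 2
```

With a threshold of 0.1, only the two flanking positions that are close to uniform are removed, while the conserved core is untouched.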

> Also I am a bit confused about the Ncor and cor values. To get fewer motifs, should I get them both higher?

I just checked $ matrix-clustering -h and it says there:

    In this algorithm, the threshold can be set combining values of
    different metrics.

    If the descendant motifs for a particular branch do not satisfy the
    threshold a new cluster is created.

    For a complete description of the thresholds and the motif
    comparison metrics see the help of compare-matrices

    Suggested thresholds:

        cor >= 0.7

        Ncor >= 0.4

        w >= 5

This means that a motif of width=5 with cor < 0.7 or Ncor < 0.4 to previous motifs will prompt the creation of a new cluster. So if you want fewer non-redundant motifs, you have to lower these thresholds. The problem, however, is that by doing so you might merge motifs that are actually different. The same thresholds might not work for all families, so take care.
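The way several lower thresholds combine can be sketched as follows. This only illustrates the "all thresholds must be satisfied" logic described in the help text; it is not the matrix-clustering implementation, and the comparison scores are made up.

```python
# Suggested lower thresholds quoted from the matrix-clustering help.
SUGGESTED_LOWER_THRESHOLDS = {"cor": 0.7, "Ncor": 0.4, "w": 5}


def satisfies_thresholds(scores, lower_thresholds=SUGGESTED_LOWER_THRESHOLDS):
    """Return True only if every metric reaches its lower threshold."""
    return all(scores[metric] >= value for metric, value in lower_thresholds.items())


# Hypothetical comparison scores, for illustration only.
comparison = {"cor": 0.65, "Ncor": 0.45, "w": 7}
print(satisfies_thresholds(comparison))  # False: cor is below 0.7

# The thresholds from the command line in this issue (-lth cor 0.6 -lth Ncor 0.4 -lth w 5).
relaxed = {"cor": 0.6, "Ncor": 0.4, "w": 5}
print(satisfies_thresholds(comparison, relaxed))  # True
```

A comparison that fails the suggested thresholds can pass the relaxed ones, which is why lowering the thresholds tends to give fewer, larger clusters.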

> Also, if I am using the -metric_build_tree 'Ncor', does the cor value matter?

In this case I guess you will only have to play with Ncor, but hopefully @jaimicore can comment.

Hope this helps, Bruno

jaimicore commented 1 year ago

Hi,

Sorry for my late reply. This is Jaime, the maintainer of matrix-clustering.

I developed an improved version of the radial tree, but it is not integrated in RSAT yet. The repository is here: https://github.com/jaimicore/JASPAR_2022_motif_discovery_and_curation_pipeline/tree/master/Profiles_clustering

It is semi-automatic: the ring radius is adjusted manually until the desired size is found, but I can tell you which parameters to change. You can see many examples here: https://jaspar.genereg.net/matrix-clusters/ . All of them were made with the repository I shared above and are part of the JASPAR 2022 release.

  • Also I am a bit confused about the Ncor and cor values. To get fewer motifs, should I get them both higher? Also, if I am using the -metric_build_tree 'Ncor', does the cor value matter?

The higher the cor/Ncor threshold, the more similar the motifs within a cluster. So if you want fewer clusters, we suggest lower values. The default values were selected after an evaluation.

Regarding the trimming, I have run some tests, and we recommend trimming the motifs before they are clustered. If you prefer, I could add an option for trimming when you launch the code.

When the user selects the radial tree option (-radial_tree_only), all the motifs are forced into a single alignment (for visualization purposes). I would therefore expect a long final alignment: some motifs will be aligned at their flanks, so the final alignment will be much longer than the original motif width. The thresholds given with -radial_tree_only are used to partition the tree and colour the branches, so that the tree visually represents the clusters, but they are ignored when computing the alignment.
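As a small illustration of why a single forced alignment becomes long, the total width can be computed from where each motif is placed. The offsets and widths below are invented; this is only arithmetic on hypothetical placements, not the alignment procedure itself.

```python
def alignment_width(placements):
    """Total width of a gapless multiple alignment.

    `placements` is a list of (offset, width) pairs, where offset is the
    start column of a motif relative to an arbitrary reference column."""
    start = min(offset for offset, _ in placements)
    end = max(offset + width for offset, width in placements)
    return end - start


# Three hypothetical 8-column motifs that overlap only at their flanks.
placements = [(0, 8), (6, 8), (-5, 8)]
print(alignment_width(placements))  # prints 19
```

Three motifs of width 8 already span 19 columns here, so with hundreds of motifs the logos in the outer ring can become much wider than any individual motif, which is where trimming helps.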

Let me know if this helps and we can chat in case you need to set the tree radius parameters. I will try to make this automatic and add it to RSAT.

pna059 commented 1 year ago

Hi Jaime, I really appreciate your response. Our goal is indeed to make JASPAR 2022-like radial trees for our plant species' promoterome/regulatory sequences. Looking at the https://github.com/jaimicore/JASPAR_2022_motif_discovery_and_curation_pipeline/tree/master/Profiles_clustering repository, my first question is whether I must use JASPAR to be able to achieve the same interactive radial tree. Currently I am using footprintDB plants. I guess that I need to have my input to matrix-clustering in the format of https://github.com/jaimicore/JASPAR_2022_motif_discovery_and_curation_pipeline/blob/master/Profiles_clustering/data/motifs/JASPAR_2022_CORE_nematodes_non-redundant_pfms_transfac.txt . My current outputs from peak-motifs (the ...results/discovered_motifs/*dicovered.tf files) do not contain the transcription factor IDs, though:

#command to find motifs:
rsat peak-motifs -v 1 -title ${basename}_pos -i $f -disco positions -top_peaks 4000 -nmotifs 5 -minol 6 -maxol 7 -1str -origin center -motif_db footprintDB-plants transfac ../footprintDB.plants.motif.tf -scan_markov 1 -r_plot -img_format png -prefix peak-motifs -noov -maxpat 300 -outdir ../RSAT_pos_200bp/More_motifs/${basename}_pos

#The output .tf file looks like this:
XX
BA  1 sequences
XX
BS  CCCCACTCCTCCCTCTCCCCTCCG; site_0; 1; 24; 0; p
CC  program: feature
CC  matrix.nb: 1
CC  matrix.nb: 1
CC  sites: 1
CC  consensus.strict: ccccactcctccctctcccctccg
CC  consensus.strict.rc: CGGAGGGGAGAGGGAGGAGTGGGG
CC  consensus.IUPAC: ccccactcctccctctcccctccg
CC  consensus.IUPAC.rc: CGGAGGGGAGAGGGAGGAGTGGGG
CC  consensus.regexp: ccccactcctccctctcccctccg
CC  consensus.regexp.rc: CGGAGGGGAGAGGGAGGAGTGGGG
XX
//
AC  positions_6-7nt_m2
XX
ID  positions_6-7nt_m2positions_6-7nt_m2
XX
DE  ssCsCCGCCgCCGCCGsCGAgGmb

... I deliberately copied one motif that is based on one sequence only. The database search results with TF IDs are stored separately, in the results/discovered_vs_db directory.

For annotating the radial tree using the annotate_matrix-clustering.R, I also need the --annotation .tab file.

I see that the ring radius values may be estimated from the number of motifs (https://github.com/jaimicore/JASPAR_2022_motif_discovery_and_curation_pipeline/blob/master/Profiles_clustering/Jaspar_2022_radial_trees_manual_parameters.txt), but it would be helpful if you could point me to where to change this.

Pavla

pna059 commented 1 year ago

Hi Jaime, could you please answer my questions, or let me know when you have made the changes you mentioned, so that I can move forward? Thank you, Pavla

jaimicore commented 1 year ago

Hi

> I really appreciate your response. Our goal is indeed to make JASPAR 2022-like radial trees for our plant species' promoterome/regulatory sequences. Looking at the https://github.com/jaimicore/JASPAR_2022_motif_discovery_and_curation_pipeline/tree/master/Profiles_clustering repository, my first question is whether I must use JASPAR to be able to achieve the same interactive radial tree.

You can use any motif to generate a radial tree. In this repository we adapted the output to JASPAR, but you can use it with any TF binding motif.

> Currently I am using footprintDB plants. I guess that I need to have my input to matrix-clustering in the format of https://github.com/jaimicore/JASPAR_2022_motif_discovery_and_curation_pipeline/blob/master/Profiles_clustering/data/motifs/JASPAR_2022_CORE_nematodes_non-redundant_pfms_transfac.txt . My current outputs from peak-motifs (the ...results/discovered_motifs/*dicovered.tf files) do not contain the transcription factor IDs, though:

This is not a limitation. Many input formats are supported (Homer, meme, transfac, etc.). peak-motifs returns motifs in transfac format; in RSAT we mostly use this format because it allows annotations to be added.

If your motifs have the following fields, that is OK:

AC  positions_6-7nt_m2
XX
ID  positions_6-7nt_m2positions_6-7nt_m2

> ... I deliberately copied one motif that is based on one sequence only. The database search results with TF IDs are stored separately, in the results/discovered_vs_db directory.

> Do I have to annotate each .tf file manually to generate a suitable input for matrix-clustering or is there a trick to get the ID in the .tf file directly?

There is a short summary table from peak-motifs with the motif ID and the most similar motifs. I think that if you write your own script to rename the motifs, that is, to change their ID or AC fields, it should be OK.
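Such a renaming script could look roughly like the sketch below. It assumes a plain TRANSFAC layout in which each record starts with an AC line, has an ID line, and ends with "//"; the function name and the new motif name are hypothetical, and building the mapping from the peak-motifs summary table is left out because its exact layout is not shown in this thread.

```python
import re


def rename_motifs(transfac_text, new_names):
    """Replace the value of AC and ID lines using the `new_names` mapping.

    Keys of `new_names` are the current AC values; records whose AC is
    not in the mapping are left unchanged."""
    out = []
    current = None  # replacement name for the record being read
    for line in transfac_text.splitlines():
        match = re.match(r"^(AC|ID)(\s+)(\S.*)$", line)
        if match:
            tag, gap, value = match.groups()
            if tag == "AC":
                current = new_names.get(value.strip())
            if current is not None:
                line = f"{tag}{gap}{current}"
        elif line.startswith("//"):
            current = None  # end of record
        out.append(line)
    return "\n".join(out) + "\n"


example = (
    "AC  positions_6-7nt_m2\n"
    "XX\n"
    "ID  positions_6-7nt_m2positions_6-7nt_m2\n"
    "XX\n"
    "//\n"
)
# "m2_hypothetical_TF" is an invented name, used only for illustration.
print(rename_motifs(example, {"positions_6-7nt_m2": "m2_hypothetical_TF"}))
```

Only the AC and ID lines are rewritten; all other lines of the record, including the matrix itself, pass through unchanged.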

> Also, how can I avoid these "singleton" motifs which are subsequently visualized as empty logos, in my case?

There is no function to avoid these motifs. What I would do is generate the tree, then remove those motifs from the input file and re-run it.
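Removing such motifs from the input file can be scripted. The sketch below assumes that every record carries a "CC  sites: N" comment like the excerpt earlier in this thread and that records end with a "//" line; whether a low site count is the right criterion for the empty logos is an assumption, and the function name is hypothetical.

```python
import re

SITES_LINE = re.compile(r"^CC\s+sites:\s*(\d+)\s*$", re.MULTILINE)


def drop_small_motifs(transfac_text, min_sites=2):
    """Keep only records whose 'CC  sites: N' comment has N >= min_sites.

    Records without such a comment are kept, since their support is unknown."""
    kept = []
    # Split on lines that consist only of the record terminator "//".
    for record in re.split(r"(?m)^//[ \t]*$\n?", transfac_text):
        if not record.strip():
            continue
        match = SITES_LINE.search(record)
        if match and int(match.group(1)) < min_sites:
            continue
        kept.append(record + "//\n")
    return "".join(kept)


# Two made-up records: the first is built from a single site.
example = (
    "AC  motif_a\nCC  sites: 1\n//\n"
    "AC  motif_b\nCC  sites: 12\n//\n"
)
print(drop_small_motifs(example))
```

The filtered text can then be written to a new .tf file and passed to matrix-clustering in place of the original input.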

> For annotating the radial tree using the annotate_matrix-clustering.R, I also need the --annotation .tab file.

> Should that be generated manually in the format of: https://github.com/jaimicore/JASPAR_2022_motif_discovery_and_curation_pipeline/blob/master/Profiles_clustering/data/annotation_tables/JASPAR_2022_nematodes_annotations.tsv ?

Yes, this is the format needed.

> Regarding the trimming: ".... to trim the motifs before they are clustered" means adding the "-trim_threshold 0.1", as suggested by @brunocontrerasmoreira, or should I rather use https://github.com/jaimicore/matrix-clustering_stand-alone/blob/main/convert-matrix.R?

Yes. Alternatively, you could use this script I wrote: https://github.com/jaimicore/R_utilities/blob/master/R-scripts/Motif_Friseur.R . It is very easy to use; here is an example: https://github.com/jaimicore/R_utilities#motif-friseur

> I see that the ring radius values may be estimated from the number of motifs (https://github.com/jaimicore/JASPAR_2022_motif_discovery_and_curation_pipeline/blob/master/Profiles_clustering/Jaspar_2022_radial_trees_manual_parameters.txt), but it would be helpful if you could point me to where to change this.

Yes, this is not explained anywhere. Let me create a repository with all the details needed to reproduce the nematodes JASPAR tree. Is that OK?