selkamand / sigshared

A collection of utility functions used by many R packages in the sigverse

Signature analysis results object to power sigstory #49

Open selkamand opened 4 days ago

selkamand commented 4 days ago

We need a data structure to store the results of a comprehensive signature analysis that we can use to power sigstory reports. The object only needs to describe one sigclass; we will supply a list of them to sigstory to build each report.

The decision we need to make is whether we're willing to assume the analysis was done with a sigstash collection (so we can look up the collection automatically) or whether the collection itself needs to be included in the object.
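
If we want to support both options, the object could always carry `collection_name` and only optionally embed `collection`, resolving it when needed. This is a minimal sketch under that assumption: `resolve_collection()` and its `lookup` argument are hypothetical, and `lookup` stands in for however sigstash would load a collection by name (no real sigstash function is assumed here).

```r
# Hypothetical helper: prefer a collection embedded in the results object and
# fall back to looking it up by name. `lookup` is a placeholder for a
# sigstash loader; its name and signature are assumptions.
resolve_collection <- function(results, lookup) {
  # An embedded collection always wins, so the results are self-contained
  if (!is.null(results[["collection"]])) {
    return(results[["collection"]])
  }
  name <- results[["collection_name"]]
  if (is.null(name)) {
    stop("results contain neither `collection` nor `collection_name`")
  }
  # Otherwise defer to the (hypothetical) name-based lookup
  lookup(name)
}
```

Embedding keeps the results reproducible even if the named collection later changes, at the cost of a larger object; the name-only route keeps objects small but ties them to sigstash.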

Also, should we include visualisations in this object, or add them as an additional layer?

One possible structure would be:

collection_name: name of the signature collection used to produce this analysis
collection: representation of the signature collection used (sigclass format)
df_tally: catalogue of observed mutations (sigverse format)
df_exposures: matrix of
df_exposures_valid
df_umap
n_comparison_samples
df_similarity
model
total_mutations
unexplained_mutations
proportion_of_unexplained_mutations
cosine_reconstructed_vs_observed
df_bootstraps
df_bootstrap_summary
gg_reconstructed_vs_observed
gg_signature_stability
gg_dotplot
ls_similar_sample_plots
gg_umap
sigclass
fitting_method
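
A minimal sketch of a constructor for the structure above, covering only a subset of the fields. The function name `new_signature_analysis_results()`, the `count` and `contribution` column names, and passing reconstructed counts as a plain numeric vector are all assumptions for illustration, not the confirmed sigverse format. It shows the summary metrics being computed once at construction time: unexplained mutations as total minus the summed signature contributions (an assumed definition), and cosine similarity as `sum(o * r) / (sqrt(sum(o^2)) * sqrt(sum(r^2)))` for observed counts `o` and reconstructed counts `r`.

```r
# Hypothetical constructor for a `signature_analysis_results` object.
# Column names (`count`, `contribution`) are assumptions, not the confirmed
# sigverse format.
new_signature_analysis_results <- function(sigclass,
                                           collection_name,
                                           df_tally,
                                           df_exposures,
                                           reconstructed,
                                           fitting_method,
                                           collection = NULL) {
  observed <- df_tally[["count"]]
  stopifnot(
    is.character(sigclass), length(sigclass) == 1,
    is.character(collection_name), length(collection_name) == 1,
    is.numeric(observed),
    is.numeric(df_exposures[["contribution"]]),
    is.numeric(reconstructed), length(reconstructed) == length(observed)
  )

  # Summary metrics derived from the stored data
  total_mutations <- sum(observed)
  unexplained_mutations <- total_mutations - sum(df_exposures[["contribution"]])
  proportion_of_unexplained_mutations <- unexplained_mutations / total_mutations

  # Cosine similarity between reconstructed and observed mutation counts
  cosine_reconstructed_vs_observed <- sum(observed * reconstructed) /
    (sqrt(sum(observed^2)) * sqrt(sum(reconstructed^2)))

  structure(
    list(
      sigclass = sigclass,
      collection_name = collection_name,
      collection = collection,  # NULL when relying on a lookup by name
      df_tally = df_tally,
      df_exposures = df_exposures,
      fitting_method = fitting_method,
      total_mutations = total_mutations,
      unexplained_mutations = unexplained_mutations,
      proportion_of_unexplained_mutations = proportion_of_unexplained_mutations,
      cosine_reconstructed_vs_observed = cosine_reconstructed_vs_observed
    ),
    class = "signature_analysis_results"
  )
}

# Usage with toy data (values are illustrative only)
res <- new_signature_analysis_results(
  sigclass = "SBS",
  collection_name = "example_collection",
  df_tally = data.frame(channel = c("A[C>A]A", "A[C>A]C", "A[C>A]G"),
                        count = c(10, 20, 10)),
  df_exposures = data.frame(sig = c("SBS1", "SBS5"), contribution = c(15, 20)),
  reconstructed = c(8, 18, 9),
  fitting_method = "example_method"
)
res$proportion_of_unexplained_mutations  # 0.125 (5 of 40 mutations unexplained)
```

Computing the metrics in the constructor means sigstory only reads values from the object instead of recomputing them inside a template.
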

selkamand commented 4 days ago

Upon reflection, I think we should be more explicit about the purpose of the data structures.

Let's define two different structures to store signature analysis results:

  1. signature_analysis_results: the numeric (no-visualisation) results of a signature analysis. The design should allow every visualisation to be created EXCLUSIVELY from data in the object.

  2. sigstory_visualisations: mostly visualisations, plus the 3 metrics we display in sigstash. This keeps sigstory very light (logic is hard to debug in knitted Quarto templates). It should also let us write a method that saves all visualisations to disk so they can be pulled into non-sigstory reporting tools (e.g. a write_visualisations method that creates a 'visualisations' subfolder containing all the important figures, akin to linx_visualise).
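
One way the second structure and its write_visualisations method could look, sketched as an S3 class in base R. The constructor `new_sigstory_visualisations()`, the `plots`/`metrics` layout, and writing one PDF per plot are assumptions for illustration. The sketch relies only on each stored plot drawing itself when `print()`ed, which holds for ggplot objects.

```r
# Hypothetical `sigstory_visualisations` object: named plots plus the three
# headline metrics. Plots are assumed to draw themselves when printed.
new_sigstory_visualisations <- function(plots, metrics) {
  stopifnot(
    is.list(plots), !is.null(names(plots)), all(nzchar(names(plots))),
    is.list(metrics), length(metrics) == 3
  )
  structure(list(plots = plots, metrics = metrics),
            class = "sigstory_visualisations")
}

# S3 generic so other result types could gain their own writers later
write_visualisations <- function(x, outdir, ...) UseMethod("write_visualisations")

write_visualisations.sigstory_visualisations <- function(x, outdir,
                                                         width = 7, height = 5, ...) {
  # All figures go into a 'visualisations' subfolder of `outdir`
  vis_dir <- file.path(outdir, "visualisations")
  dir.create(vis_dir, recursive = TRUE, showWarnings = FALSE)
  paths <- file.path(vis_dir, paste0(names(x$plots), ".pdf"))
  for (i in seq_along(x$plots)) {
    grDevices::pdf(paths[i], width = width, height = height)
    # Always close the device, even if drawing a plot fails
    tryCatch(print(x$plots[[i]]), finally = grDevices::dev.off())
  }
  # Return the written file paths so other reporting tools can pick them up
  invisible(paths)
}
```

Called as `write_visualisations(vis, "results")`, this would create `results/visualisations/<plot name>.pdf` for every stored plot.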