ratan-lab / sumo

Subtyping tool for multi-omic data
https://pypi.org/project/python-sumo
MIT License

Question about logfiles/documentation #27

Open jonas-hag opened 2 years ago

jonas-hag commented 2 years ago

Thank you very much for this helpful tool! I have a few questions/suggestions regarding the logfile:

What do you think? I'm happy to help out with a PR once I know what the information means :)

aakrosh commented 2 years ago

Thanks for your interest in SUMO.

The objective function used by SUMO for factorization is:

[Image: SUMO objective function]

SUMO aims to decompose the various adjacency matrices into a common $H$ matrix and a data-type-specific $S_i$ matrix. The objective function has two components: the first calculates the summed error of decomposing the adjacency matrices, and the second enforces sparsity of the decomposed $H$ matrix. Because the $H$ matrix is used for cluster assignment, we want it to be sparse. The two values in square brackets in the log file are these two components of the objective function. We can certainly add more information about this to the log file.
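To make the two bracketed values more concrete, here is a minimal sketch that computes them for a toy example. It *assumes* a tri-factorization $A_i \approx H S_i H^\top$ with a halved squared Frobenius error and an $\ell_1$ sparsity penalty on $H$ weighted by $\eta$:

$$f(H, S_1, \dots, S_N) = \frac{1}{2}\sum_{i=1}^{N} \lVert A_i - H S_i H^\top \rVert_F^2 + \eta \lVert H \rVert_1, \qquad H \ge 0,\ S_i \ge 0.$$

The exact form SUMO uses (for example, any weighting of missing entries or the precise norm on $H$) may differ, and `sumo_like_objective` is a hypothetical helper, not part of SUMO's API.

```python
import numpy as np


def sumo_like_objective(adjacency, h, s_list, eta):
    """Return [decomposition_error, sparsity_penalty] for the ASSUMED
    objective 1/2 * sum_i ||A_i - H S_i H^T||_F^2 + eta * ||H||_1.

    Hypothetical helper for illustration; not part of SUMO's API.
    """
    # First component: squared Frobenius error of each A_i ~ H S_i H^T,
    # halved and summed over all data types
    error = 0.0
    for a, s in zip(adjacency, s_list):
        residual = a - h @ s @ h.T
        error += 0.5 * np.linalg.norm(residual, "fro") ** 2
    # Second component: assumed entrywise L1 penalty on H, weighted by eta
    penalty = eta * np.abs(h).sum()
    return [float(error), float(penalty)]


# Toy example: 4 samples, 2 clusters, 2 data types sharing the same H
h = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
s_list = [np.eye(2), 2.0 * np.eye(2)]
adjacency = [h @ s @ h.T for s in s_list]  # exact factorization -> zero error
print(sumo_like_objective(adjacency, h, s_list, eta=0.1))  # prints [0.0, 0.4]
```

Because the toy adjacency matrices are built from $H$ exactly, the first component is zero and only the sparsity term remains; a larger $\eta$ trades reconstruction accuracy for a sparser $H$.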

As for the log file for `sumo run`, you can use the -logfile option to have the logs written to a file of your choice. Are you saying that the option does not work as specified?

jonas-hag commented 2 years ago

Sorry for getting back to you so late. Thank you very much for the explanation; I think it would be great if this were added to the log file!

As for the -logfile option: I don't mean that it doesn't work, but that it might be worthwhile to change the default from printing only to stdout to also writing a log file. Currently, by default, each number-of-clusters folder gets a .log file for every eta value. I think it would make sense to also split the information printed to stdout by number of clusters, write it to one log file per number of clusters, and place each of these log files in the corresponding folder.
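The suggestion above could look roughly like the following minimal sketch, built on Python's standard `logging` module. Each number of clusters gets its own logger that writes both to stdout and to a log file inside that cluster's folder. The folder layout `<outdir>/k<k>/`, the file name `run_k<k>.log`, and the helper `make_cluster_logger` are all hypothetical and do not describe how SUMO is currently implemented.

```python
import logging
import sys
from pathlib import Path


def make_cluster_logger(outdir, k):
    """Hypothetical helper: log to stdout AND to <outdir>/k<k>/run_k<k>.log."""
    folder = Path(outdir) / f"k{k}"  # hypothetical per-k folder name
    folder.mkdir(parents=True, exist_ok=True)
    log_path = folder / f"run_k{k}.log"

    logger = logging.getLogger(f"sumo_sketch.k{k}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # don't also pass messages to the root logger

    # Remove handlers left over from an earlier call so lines aren't duplicated
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()

    fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    for handler in (logging.StreamHandler(sys.stdout),
                    logging.FileHandler(log_path, mode="w")):
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    return logger, log_path
```

For example, `make_cluster_logger(outdir, 2)` returns a logger whose messages appear both on stdout and in `<outdir>/k2/run_k2.log`, so the per-cluster output would sit next to the existing per-eta log files.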