ucl-pond / pySuStaIn

Subtype and Stage Inference (SuStaIn) algorithm with an example using simulated data.
MIT License
112 stars 62 forks source link

PVD Plotting Update #30

Closed sea-shunned closed 2 years ago

sea-shunned commented 2 years ago

After a few requests for tweaking parts of plots, made a few changes to update the PVD plotting code. Changes summarized below:

Zeena-Shawa commented 2 years ago

Works perfectly when trying to plot 6 subtypes, but won't work for plotting 4 or 5 subtypes. When trying to plot 5 subtypes, the first three are correctly plotted, but then it exits with an error: IndexError: index 5 is out of bounds for axis 0 with size 5. For reference: plot_subtype_order = [1,2,0,5,4,3]; samples_sequence = 5; N_S_max = 6;

sea-shunned commented 2 years ago

Thanks @Zeena-Shawa

Seems that this occurs when you are create a model (e.g. MixtureSustain instance), but then try to plot a PVD for a different number of subtypes than N_S_max with variables (samples_sequence etc.) loaded in from pickle files.

Possible solutions are:

  1. Override the internal self._plot_subtype_order if there is a shape mismatch with samples_sequence. You can either use an arbitrary subtype order in that case, or reconstruct using the original approach (np.argsort(ml_f_EM)[::-1]), thus needing to add that as an extra argument.
  2. Turn the plotting function into a @staticmethod, so that self isn't needed at all. This would break backwards compatability, but massively increase the flexibility of the plotting function, by no longer requiring an instance (and thus all that is needed is to load pickle files, rather than recreate the instance, which is useful when the compute has been off-loaded to HPC). I prefer this, but it will require people to slightly edit old code.
  3. Just raise an error when there is a shape mismatch, and advise that the subtype_order argument is used manually. Least preferred option, as this is not the path of least resistance for the user.

Thoughts welcome!

noxtoby commented 2 years ago

I vote for number 2 — I have no problem with breaking backward compatibility, given that we have no specific releases (and aren't in a release cycle, for example).

sea-shunned commented 2 years ago

Great, that's 2 people, and by the power of democracy that'll do.

sea-shunned commented 2 years ago

Went with option no. 4: public-facing generic function (*.plot_positional_var) that can be used with pickle_files, that is also used by the original self._plot_sustain_model to maintain backward compatibility...so best of both worlds (in theory!).

Tested Mixture & Zscore versions with the simrun.py, but not Ordinal and ZScoreMissing yet.

Seymour22 commented 2 years ago

The title_font_size, stage_font_size parameters are working when passed in ._plot_sustain_model, but figsize doesnt work. None of these are recognized in combine_cross_validated_sequences though. Maybe you can help clarify a install question. Where do you run the git switch plotting command. I've done in in the sustain env, and within your downloaded folder (pySuStaIn-plotting). As I kept getting git switch is not a command* error, I just ran the pip install . from pySuStaIn-plotting

Seymour22 commented 2 years ago

Made these changes in the scripts...still doesn't recognise figsize parameter

sea-shunned commented 2 years ago

Thanks for testing @Seymour22. Replies below:

Bea-Taylor commented 2 years ago

Worked great with ordinal sustain, tried out the different arguments and all working well 💯 Thanks Cameron!

sea-shunned commented 2 years ago

Great to hear @Bea-Taylor, and thank you for testing it out!

Edit: I'll add a docstring to plot_positional_var to explain its usage, though long-term we need to modify the existing notebooks and simulation scripts to use these new functions, demonstrating how they are used (the two main scenarios interfacing with plot_positional_var and run_sustain_algorithm.

williamscotton commented 2 years ago

Hey Cameron this is really useful addition! Was wondering whether it would be useful to have ability to plot each subtype PVD as a separate figure e.g. 3 subtypes currently plotted all together on one plot - have a argument in the plotting function that allows you to output each plot as a separate figure i.e. 3 separate figures for a 3 subtype model

williamscotton commented 2 years ago

..Also do you think you might be able to add a plotting funciton that allows you to plot number but stage - potentially have an argument that allows you to select subtype 1 by stage etc then an option that plots all subtypes by stage on one plot with each subtype colour coded?

williamscotton commented 2 years ago

Hey Cameron this is really useful addition! Was wondering whether it would be useful to have ability to plot each subtype PVD as a separate figure e.g. 3 subtypes currently plotted all together on one plot - have a argument in the plotting function that allows you to output each plot as a separate figure i.e. 3 separate figures for a 3 subtype model

and also to be able to save figure for each at time of running to folder!

sea-shunned commented 2 years ago

Hey Cameron this is really useful addition! Was wondering whether it would be useful to have ability to plot each subtype PVD as a separate figure e.g. 3 subtypes currently plotted all together on one plot - have a argument in the plotting function that allows you to output each plot as a separate figure i.e. 3 separate figures for a 3 subtype model

Sure, sounds good! I'll add a separate_subtypes flag for this, in which case a list of figures and axes objects will be returned.

and also to be able to save figure for each at time of running to folder!

Great point, plot_positional_var as a standalone function should have some save functionality, which will save the whole figure or individual ones depending on the new option above. I'll add options for save location, save name, and extra kwargs for fig.savefig.

Thanks @williamscotton for taking a look and your suggestions!