raphaelvallat / yasa

YASA (Yet Another Spindle Algorithm): a Python package to analyze polysomnographic sleep recordings.
https://raphaelvallat.com/yasa/
BSD 3-Clause "New" or "Revised" License

Create Hypnogram class #105

Closed raphaelvallat closed 1 year ago

raphaelvallat commented 1 year ago

The way that YASA currently handles hypnograms could be improved. Specifically, I think that we should create a Hypnogram class that would serve as the new default for storing and manipulating hypnograms in YASA.

This class would have several methods (from existing functions) such as:

hyp = Hypnogram(values=np.array([...]), resolution="30s")
hyp.get_sleep_statistics()
hyp.get_transition_matrix()
hyp.upsample_to_data()
hyp.find_periods()

hyp.as_str()
hyp.as_int()

# and so much more...!

Any other suggestions welcome!

remrama commented 1 year ago

Great idea!! I've also been thinking hypnograms could be handled better, but hadn't thought of making a hypnogram class. I like it.

The class would also solve these issues I'd come across:

The class idea is a better solution to all these problems. It really puts hypnograms at the center of the package, which I think is appropriate since they are really what makes sleep analyses unique.

After I finish working on the evaluation classes in #78 this is going to be all the more important, because that PR introduces the concept of hypnograms with different numbers of possible stages. Right now I've added n_stages as a keyword arg to a few functions, which I fear might confuse some users unfamiliar with wearables/actigraphy. This hypnogram class would be a great solution to that. Also the EpochByEpoch evaluation class I'm working on for the same PR could eventually be switched to something like hypno.evaluate_against(hypno). Way simpler.
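To make the `hypno.evaluate_against(hypno)` idea concrete, here is a minimal sketch of an epoch-by-epoch comparison. The function name comes from the comment above, but its signature and return values are assumptions for illustration only; this is not the EpochByEpoch class from #78.

```python
import pandas as pd


def evaluate_against(hypno, reference):
    """Hypothetical helper: epoch-by-epoch agreement between two hypnograms."""
    hyp = pd.Series(list(hypno), name="hypno")
    ref = pd.Series(list(reference), name="reference")
    if hyp.size != ref.size:
        raise ValueError("Both hypnograms must have the same number of epochs.")
    # Fraction of epochs on which the two hypnograms agree.
    accuracy = float((hyp == ref).mean())
    # Rows are reference stages, columns are stages of the evaluated hypnogram.
    confusion = pd.crosstab(ref, hyp)
    return accuracy, confusion


acc, confusion = evaluate_against(
    ["WAKE", "N1", "N2", "N2", "REM"],
    ["WAKE", "N2", "N2", "N2", "REM"],
)
print(acc)
print(confusion)
```

As a method on the class, the first argument would simply be `self`, and the stage labels would already be validated at construction time.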

raphaelvallat commented 1 year ago

Agree with all the above! These are great points. We do need to come up with strict input validation that can handle 2-, 3-, 4-, or 5-stage hypnograms. For example, if you create Hypnogram(values, sf, n_classes=3), then the only accepted values are "WAKE", "NREM", and "REM". With n_classes=5 (default), accepted values are "WAKE", "W", "N1", "N2", "N3", "REM", "R", or 0, 1, 2, 3, 4.

remrama commented 1 year ago

Yes, initializing the hypnogram class with strings would make checking that way easier because the different staging schemes usually have different names (eg, 2-stage: "Sleep" and "Wake", 4-stage: "Wake", "Light", "Deep", "REM"). I've been struggling to validate n_stages when ints are used because of course someone could request 4 stages but only use a subset of them and it would "look like" 3-stage incorrectly.
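The ambiguity described above can be shown with a small sketch. The label sets and the integer coding below are assumptions based on the scheme names mentioned in this comment, and both helper functions are hypothetical.

```python
import numpy as np

# Assumed label sets, taken from the scheme names mentioned in this thread.
SCHEMES = {
    2: {"SLEEP", "WAKE"},
    3: {"WAKE", "NREM", "REM"},
    4: {"WAKE", "LIGHT", "DEEP", "REM"},
}


def guess_n_stages_from_ints(values):
    """Naive guess: count the distinct integer codes (the unreliable approach)."""
    return len(np.unique(values))


def schemes_matching_strings(values):
    """Return every scheme whose label set contains all observed labels."""
    observed = {str(v).upper() for v in values}
    return [n for n, labels in SCHEMES.items() if observed <= labels]


# A 4-stage recording in which "Deep" sleep never occurs.
str_hypno = ["Wake", "Light", "Light", "REM", "Wake"]
int_hypno = [0, 1, 1, 3, 0]  # assumed coding: 0=Wake, 1=Light, 2=Deep, 3=REM

print(guess_n_stages_from_ints(int_hypno))  # 3, although the scheme has 4 stages
print(schemes_matching_strings(str_hypno))  # [4]
```

Strings narrow things down but do not always identify a unique scheme (a recording containing only "Wake" and "REM" fits both the 3- and 4-stage sets here), which is one reason to keep an explicit n_stages argument alongside string input.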

So +1 for using strings to start, and then the class could convert to ints with confidence for all the underlying functions 👍

raphaelvallat commented 1 year ago

Awesome! @remrama do you think we should implement this PR before your PR on performance evaluation? Seems like it would make your life way easier.

remrama commented 1 year ago

hmm... Well I have it all working at this point (w/ the n_stages argument scattered in a few places). So waiting to implement it wouldn't solve any work burden, but you might want to wait just to avoid having n_stages in a few new places only to be removed shortly after.

Plus I was nervous that n_stages was starting to spread too far into the rest of the codebase anyways. That was almost the only reason I needed to modify a few existing functions (eg, sleep_statistics). It's minimal, but still maybe less than ideal.

So again it probably comes down to whether you are okay with the n_stages implementation having a temporary presence. If so, the evaluation stuff could merge soon and then just get modified whenever the Hypnogram class happens. Actually I was thinking about submitting that PR some time this coming week, so maybe the simplest course of action would be for me to go ahead and submit that PR, then you can see it in more detail and decide whether you think we should wait for the Hypnogram class to merge it or not. If you want to wait, we can leave it sitting there until the Hypnogram class is ready.

raphaelvallat commented 1 year ago

Looking forward to the PR! Feel free to submit it for now. I'll try to work on the hypnogram PR in the next few weeks. I'd love to release a new major version of YASA (0.7.0) around the new year, with:

raphaelvallat commented 1 year ago

FYI I started a new branch for this class here: https://github.com/raphaelvallat/yasa/tree/hypnogram_class

I added a few lines for class creation:

import numpy as np
import pandas as pd


class Hypnogram:
    """Main class for manipulation of hypnogram in YASA."""

    def __init__(self, values, n_stages=5, *, freq="30s", start=None):
        assert isinstance(values, (list, np.ndarray, pd.Series))
        assert isinstance(n_stages, int)
        assert n_stages in [2, 3, 4, 5]
        assert isinstance(freq, str)
        assert isinstance(start, (type(None), str, pd.Timestamp))
        if n_stages == 2:
            accepted = ["S", "W", "SLEEP", "WAKE", "ART", "UNS"]
        elif n_stages == 3:
            accepted = ["WAKE", "W", "NREM", "REM", "R", "ART", "UNS"]
        elif n_stages == 4:
            accepted = ["WAKE", "W", "LIGHT", "DEEP", "REM", "R", "ART", "UNS"]
        else:
            accepted = ["WAKE", "W", "N1", "N2", "N3", "REM", "R", "ART", "UNS"]
        assert all([val.upper() in accepted for val in values]), (
            f"{np.unique(values)} do not match the accepted values for a {n_stages} stages "
            f"hypnogram: {accepted}"
        )
        hypno = pd.Series(values, name="Stage").str.upper()
        hypno = hypno.replace({"S": "SLEEP", "W": "WAKE", "R": "REM"})
        if start is not None:
            hypno.index = pd.date_range(start=start, freq=freq, periods=hypno.size)
        hypno.index.name = "Epoch"
        self._hypno = hypno
        self._freq = freq
        self._start = start
        self._n_stages = n_stages

    def __repr__(self):
        return f"{self._hypno}"

    def __str__(self):
        return f"{self._hypno}"

Example (image not reproduced)
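For readers without the image, the following sketch repeats the normalization and indexing steps from the `__init__` above on a toy input. The stage values and the start time are made up for illustration.

```python
import pandas as pd

values = ["W", "N1", "N2", "N2", "R"]  # toy 5-stage hypnogram (made up)

# Same steps as in the draft __init__: upper-case, expand abbreviations,
# then attach a datetime index with one entry per epoch.
hypno = pd.Series(values, name="Stage").str.upper()
hypno = hypno.replace({"S": "SLEEP", "W": "WAKE", "R": "REM"})
hypno.index = pd.date_range(
    start="2022-12-15 22:30:00",  # assumed recording start
    freq="30s",
    periods=hypno.size,
)
hypno.index.name = "Epoch"

print(hypno)
```

With five 30-second epochs, the index runs from 22:30:00 to 22:32:00, each timestamp marking the start of an epoch.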

remrama commented 1 year ago

Beautiful.

I think I'll wait for my evaluation PR until this is available to work with. I had to nuke my previous fork (and in-progress branch) because I'm stupid, and so at this point it'll just be easier to reincorporate my previous code and accommodate this structure simultaneously. Otherwise it will be messy adding an n_stages argument to various places and then reverting it back.

raphaelvallat commented 1 year ago

Sounds good, I'll try to have a working version with basic methods (plot_hypnogram, sleep_statistics, etc) into master early next week. We can still add new methods later on.

raphaelvallat commented 1 year ago

FYI made some good progress today on the Hypnogram class. This is going to be a game-changer for YASA!

https://github.com/raphaelvallat/yasa/blob/f229fab7de40a62c17f0991ffe8c2c25c3c6a930/yasa/hypno.py#L14-L351

remrama commented 1 year ago

Awesome!

btw I had an idea for a method here, something you might already have plans for. Below is a general idea of what I'm thinking; the main purpose is to get a dataframe for exporting that has all the hypnogram info in it. I like to export these at the end of an auto-staging script so I can load them elsewhere for plotting. So I might overdo it a bit, but I like to have all the info possible here. Also, including things like onset and duration makes it a BIDS-compatible events file.

Sorry I'm being lazy with variables here (just copy/pasting from my personal scripts), but I think you'll get the idea.

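import numpy as np
import pandas as pd
import yasa


class Hypnogram:
    def to_dataframe(self):
        """Something that compiles all epoch-level info into a single dataframe for exporting."""
        # NOTE: `hypno` and `sls` are leftovers from a personal script, presumably the
        # string hypnogram and a fitted yasa.SleepStaging instance.
        epochs = np.arange(hypno.size)
        df = pd.DataFrame(
            {
                "epoch": epochs,
                "value": yasa.hypno_str_to_int(hypno),
                "stage": hypno,
                "onset": epochs * 30,  # seconds, assuming 30-second epochs
                "duration": 30,
            }
        )
        # Append the per-epoch stage probabilities as extra columns.
        df = df.set_index("epoch").join(sls.predict_proba())
        return df

remrama commented 1 year ago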

Another suggestion: add a scorer or scorer_id property. Any string would be accepted, but would commonly be initials of a human scorer or name of algorithm. For example, the returned hypno from yasa.SleepStaging could have a scorer_id of "YASA" or "YASAv0.6.2" if you wanted to be more specific.

raphaelvallat commented 1 year ago

Yes and Yes! Love these ideas.

For scorer, we could even make it as the default name of the resulting pd.Series?

remrama commented 1 year ago

Yes I was thinking that too, using the pd.Series name. I saw right now you name the pd.Series Stage, which also makes sense. Either works.
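A rough sketch of the scorer idea, assuming the scorer string doubles as the name of the underlying pd.Series. The class below is a hypothetical stand-in, not the YASA Hypnogram class, and the scorer names are examples only.

```python
import pandas as pd


class ScoredHypnogram:
    """Hypothetical minimal container used only to illustrate the scorer idea."""

    def __init__(self, values, scorer=None):
        if scorer is not None and not isinstance(scorer, str):
            raise TypeError("scorer must be a string or None")
        # Fall back to the "Stage" name used in the draft class when no scorer is given.
        name = scorer if scorer is not None else "Stage"
        self._hypno = pd.Series(list(values), name=name)
        self._scorer = scorer

    @property
    def scorer(self):
        return self._scorer

    @property
    def hypno(self):
        return self._hypno


human = ScoredHypnogram(["WAKE", "N1", "N2"], scorer="RV")
auto = ScoredHypnogram(["WAKE", "N2", "N2"], scorer="YASA")

# Because each Series is named after its scorer, concatenating them gives
# one labeled column per scorer.
both = pd.concat([human.hypno, auto.hypno], axis=1)
print(both)
```

One possible benefit of this choice is that side-by-side comparisons of several scorers get labeled columns without extra bookkeeping.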

Maybe to avoid confusion you might not accept a pd.Series as input, which would make it clearer that YASA is going to control that. Not a big difference for users, since they could just use Series.values if they already had one, plus I imagine that 99% of the time users will create a new yasa.Hypnogram instance using a list or array anyways. Or maybe you foresee a different use I'm not thinking of.

raphaelvallat commented 1 year ago

I also wasn't sure about accepting pd.Series but eventually decided that it might make things easier for beginner users who just loaded their hypnogram from a CSV file in Pandas. YASA is converting to an array internally to make sure we don't mess things up with the previous index.
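A short sketch of why converting to an array first matters. It only uses standard pandas and NumPy behavior; the input Series and its index are made up, and this is not the actual YASA code.

```python
import numpy as np
import pandas as pd

# A hypnogram loaded from a CSV may carry an arbitrary index, e.g. after filtering rows.
loaded = pd.Series(["W", "N1", "N2"], index=[10, 11, 12], name="stage")

# Wrapping the Series directly keeps the old index.
kept_index = pd.Series(loaded)

# Passing a new index reindexes instead of relabeling, so every value becomes NaN here.
realigned = pd.Series(loaded, index=range(3))

# Converting to an array first discards the old index and gives a clean 0..n-1 index.
clean = pd.Series(np.asarray(loaded), name="Stage")

print(kept_index.index.tolist())
print(realigned.isna().all())
print(clean.index.tolist())
```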