Closed raphaelvallat closed 1 year ago
Great idea!! I've also been thinking hypnograms could be handled better, but hadn't thought of making a hypnogram class. I like it.
The class would also solve these issues I'd come across:
SleepStaging.predict() returns strings, and most other functions require ints. Of course there are the conversion functions, but it seems like tracking that needn't be on the user. I was thinking each function could have an automatic conversion section if necessary.
check_hypno() utility function.
The class idea is a better solution to all these problems. It really puts hypnograms at the center of the package, which I think is appropriate since they are really what makes sleep analyses unique.
After I finish working on the evaluation classes in #78 this is going to be all the more important, because it introduces the concept of hypnograms with different numbers of possible stages. Right now I've added n_stages as a keyword arg to a few functions, which I fear might confuse some users unfamiliar with wearables/actigraphy. This hypnogram class would be a great solution to that. Also, the EpochByEpoch evaluation class I'm working on for the same PR could eventually be switched to something like hypno.evaluate_against(hypno). Way simpler.
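To make the hypno.evaluate_against(hypno) idea a bit more concrete, here is a minimal sketch of an epoch-by-epoch comparison between two stage sequences. The function name, the choice of metrics (raw agreement and Cohen's kappa) and the returned dictionary are assumptions for illustration only; nothing here is taken from the EpochByEpoch class in the PR.

```python
import numpy as np


def evaluate_against(obs, ref):
    """Hypothetical helper: compare two equal-length stage sequences epoch by epoch."""
    obs = np.asarray(obs)
    ref = np.asarray(ref)
    if obs.shape != ref.shape:
        raise ValueError("Both hypnograms must have the same number of epochs.")

    # Proportion of epochs on which the two scorings agree
    accuracy = float(np.mean(obs == ref))

    # Agreement expected by chance, from each scoring's stage proportions
    labels = np.union1d(obs, ref)
    p_obs = np.array([np.mean(obs == lab) for lab in labels])
    p_ref = np.array([np.mean(ref == lab) for lab in labels])
    expected = float(np.sum(p_obs * p_ref))

    # Cohen's kappa: agreement corrected for chance (undefined if expected == 1)
    kappa = (accuracy - expected) / (1 - expected) if expected < 1 else float("nan")
    return {"accuracy": accuracy, "kappa": kappa}


ref = ["WAKE", "N1", "N2", "N2", "REM"]
obs = ["WAKE", "N2", "N2", "N2", "REM"]
result = evaluate_against(obs, ref)
```

With string labels on both sides, no n_stages argument is needed: the set of stages is read directly from the two sequences.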
Agree with all the above! These are great points. We do need to come up with strict input validation that can handle 2, 3, 4 or 5 stages. For example, if you create Hypnogram(values, sf, n_classes=3), then the only accepted values are "WAKE", "NREM", and "REM". With n_classes=5 (default), accepted values are "WAKE", "W", "N1", "N2", "N3", "REM", "R", or 0, 1, 2, 3, 4.
Yes, initializing the hypnogram class with strings would make checking that way easier because the different staging schemes usually have different names (eg, 2-stage: "Sleep" and "Wake", 4-stage: "Wake", "Light", "Deep", "REM"). I've been struggling to validate n_stages when ints are used because of course someone could request 4 stages but only use a subset of them and it would "look like" 3-stage incorrectly.
So +1 for using strings to start, and then the class could convert to ints with confidence for all the underlying functions 👍
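A minimal sketch of that strings-first approach: the declared scheme picks the label set, labels are validated against it, and only then converted to ints. The function name and the integer codes for the 2-, 3- and 4-stage schemes are hypothetical; only the 5-stage map mirrors the 0 to 4 codes mentioned above. ART and UNS are left out for brevity.

```python
import pandas as pd

# Hypothetical label-to-int maps, one per staging scheme (illustrative codes)
STAGE_MAPS = {
    2: {"WAKE": 0, "SLEEP": 1},
    3: {"WAKE": 0, "NREM": 1, "REM": 2},
    4: {"WAKE": 0, "LIGHT": 1, "DEEP": 2, "REM": 3},
    5: {"WAKE": 0, "N1": 1, "N2": 2, "N3": 3, "REM": 4},
}
# Short aliases are expanded before validation
ALIASES = {"W": "WAKE", "S": "SLEEP", "R": "REM"}


def stages_to_int(values, n_stages=5):
    """Validate string stages against the declared scheme, then convert to ints."""
    mapping = STAGE_MAPS[n_stages]
    labels = pd.Series(list(values)).str.upper().replace(ALIASES)
    unknown = sorted(set(labels) - set(mapping))
    if unknown:
        raise ValueError(
            f"{unknown} are not valid for a {n_stages}-stage hypnogram: {list(mapping)}"
        )
    return labels.map(mapping).astype(int).to_numpy()


# A 4-stage night with no "DEEP" epochs is still unambiguously 4-stage
print(stages_to_int(["W", "Light", "REM"], n_stages=4).tolist())  # [0, 1, 3]
```

The same night given as ints (0, 1, 3) could not be told apart from a mislabeled 3-stage hypnogram, which is the ambiguity described above.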
Awesome! @remrama do you think we should implement this PR before your PR on performance evaluation? Seems like it would make your life way easier.
hmm... Well I have it all working at this point (w/ the n_stages argument scattered in a few places). So waiting to implement it wouldn't solve any work burden, but you might want to wait just to avoid having n_stages in a few new places only to be removed shortly after.
Plus I was nervous that n_stages was starting to spread too far into the rest of the codebase anyways. That was almost the only reason I needed to modify a few existing functions (eg, sleep_statistics). It's minimal, but still maybe less than ideal.
So again it probably comes down to whether you are okay with the n_stages implementation having a temporary presence. If so, the evaluation stuff could merge soon and then just get modified whenever the Hypnogram class happens. Actually I was thinking about submitting that PR some time this coming week, so maybe the simplest course of action would be for me to go ahead and submit that PR, then you can see it in more detail and decide whether you think we should wait for the Hypnogram class to merge it or not. If you want to wait, we can leave it sitting there until the Hypnogram class is ready.
Looking forward to the PR! Feel free to submit it for now. I'll try to work on the hypnogram PR in the next few weeks. I'd love to release a new major version of YASA (0.7.0) around the new year, with:
FYI I started a new branch for this class here: https://github.com/raphaelvallat/yasa/tree/hypnogram_class
I added a few lines for class creation:
import numpy as np
import pandas as pd


class Hypnogram:
    """Main class for manipulation of hypnogram in YASA."""

    def __init__(self, values, n_stages=5, *, freq="30s", start=None):
        assert isinstance(values, (list, np.ndarray, pd.Series))
        assert isinstance(n_stages, int)
        assert n_stages in [2, 3, 4, 5]
        assert isinstance(freq, str)
        assert isinstance(start, (type(None), str, pd.Timestamp))
        # Accepted labels depend on the staging scheme
        if n_stages == 2:
            accepted = ["S", "W", "SLEEP", "WAKE", "ART", "UNS"]
        elif n_stages == 3:
            accepted = ["WAKE", "W", "NREM", "REM", "R", "ART", "UNS"]
        elif n_stages == 4:
            accepted = ["WAKE", "W", "LIGHT", "DEEP", "REM", "R", "ART", "UNS"]
        else:
            accepted = ["WAKE", "W", "N1", "N2", "N3", "REM", "R", "ART", "UNS"]
        assert all([val.upper() in accepted for val in values]), (
            f"{np.unique(values)} do not match the accepted values for a {n_stages} stages "
            f"hypnogram: {accepted}"
        )
        # Store upper-case labels with short aliases expanded
        hypno = pd.Series(values, name="Stage").str.upper()
        hypno = hypno.replace({"S": "SLEEP", "W": "WAKE", "R": "REM"})
        if start is not None:
            hypno.index = pd.date_range(start=start, freq=freq, periods=hypno.size)
        hypno.index.name = "Epoch"
        self._hypno = hypno
        self._freq = freq
        self._start = start
        self._n_stages = n_stages

    def __repr__(self):
        return f"{self._hypno}"

    def __str__(self):
        return f"{self._hypno}"
Example
Beautiful.
I think I'll wait for my evaluation PR until this is available to work with. I had to nuke my previous fork (and in-progress branch) because I'm stupid, and so at this point it'll just be easier to reincorporate my previous code and accommodate this structure simultaneously. Otherwise it will be messy adding an n_stages argument to various places and then reverting it back.
Sounds good, I'll try to have a working version with basic methods (plot_hypnogram, sleep_statistics, etc) into master early next week. We can still add new methods later on.
FYI made some good progress today on the Hypnogram class. This is going to be a game-changer for YASA!
Awesome!
btw I had an idea for a method here, something you might already have plans for. Below is a general idea of what I'm thinking; the main purpose is to get a dataframe for exporting that has all hypno-info on it. I like to export these at the end of an auto-staging script so I can load it in elsewhere for plotting. So I might overdo it a bit, but I like to have all info possible here. Also, including things like onset and duration makes it a BIDS-compatible events file.
Sorry I'm being lazy with variables here (just copy/pasting from my personal scripts), but I think you'll get the idea.
class Hypnogram:
    def to_dataframe(self):
        """Something that compiles all epoch-level info into a single dataframe for exporting."""
        # hypno and sls are placeholders carried over from a standalone script
        epochs = np.arange(hypno.size)
        df = pd.DataFrame(
            {
                "epoch": epochs,
                "value": yasa.hypno_str_to_int(hypno),
                "stage": hypno,
                "onset": epochs * 30,
                "duration": 30,
            }
        )
        df = df.set_index("epoch").join(sls.predict_proba())
        return df
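For reference, a self-contained variant of the same idea that runs without YASA. This is a sketch under assumptions: the function name is hypothetical, the epoch length is derived from a freq string like the one in the draft class, and the integer values and per-stage probabilities from the snippet above are left out.

```python
import numpy as np
import pandas as pd


def hypno_to_events(stages, freq="30s"):
    """Hypothetical export helper: one row per epoch, with onset/duration in seconds."""
    stages = pd.Series(list(stages), name="stage").str.upper()
    # Epoch length in seconds, parsed from a pandas-style frequency string
    epoch_sec = pd.Timedelta(freq).total_seconds()
    df = pd.DataFrame(
        {
            "onset": np.arange(stages.size) * epoch_sec,
            "duration": epoch_sec,
            "stage": stages.to_numpy(),
        }
    )
    df.index.name = "epoch"
    return df


events = hypno_to_events(["W", "N1", "N2", "N2", "REM"])
```

Writing the result with events.to_csv(path, sep="\t") gives a tab-separated table with onset and duration columns, which is the shape BIDS expects for events files.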
Another suggestion: add a scorer or scorer_id property. Any string would be accepted, but would commonly be initials of a human scorer or name of algorithm. For example, the returned hypno from yasa.SleepStaging could have a scorer_id of "YASA" or "YASAv0.6.2" if you wanted to be more specific.
Yes and Yes! Love these ideas.
For scorer, we could even make it the default name of the resulting pd.Series?
Yes I was thinking that too, using the pd.Series name. I saw right now you name the pd.Series Stage, which also makes sense. Either works.
Maybe to avoid confusion you might not accept a pd.Series as input, which would make it clearer that YASA is going to control that. Not a big difference for users, since they could just use Series.values if they already had one, plus I imagine that 99% of the time users will initiate a new yasa.Hypnogram instance using a list or array anyways. Or maybe you foresee a different use I'm not thinking of.
I also wasn't sure about accepting pd.Series but eventually decided that it might make things easier for beginner users who just loaded their hypnogram from a CSV file in Pandas. YASA is converting to an array internally to make sure we don't mess things up with the previous index.
The way that YASA currently handles hypnograms could be improved. Specifically, I think that we should create a class Hypnogram that would serve as the new default to store and manipulate hypnograms in YASA. This class would have several methods (from existing functions) such as:
Any other suggestions welcome!