Closed by elinsooon 2 years ago
This is a good question - I'll need to think about it.
One important property of a Markov chain (one that has no absorbing state, such as death, that the process cannot leave) is that the long-run proportion of time spent in each state can be determined directly from the transition matrix (call it P). In particular, the state probabilities are given by any row of P^k, where k is a very large number. (Here's a presentation I found online that may or may not be helpful.)
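To make the P^k idea concrete, here is a minimal sketch in plain Python (standard library only, not simstudy). The 3x3 matrix and the helper names `mat_mul` and `mat_pow` are made up for illustration. For a well-behaved matrix like this one, every row of P^k should end up essentially identical, and that common row is the long-run state distribution.

```python
def mat_mul(a, b):
    """Multiply two matrices stored as lists of rows."""
    inner = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def mat_pow(p, k):
    """Raise a square matrix to the k-th power by repeated squaring."""
    n = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    base = [row[:] for row in p]
    while k > 0:
        if k & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        k >>= 1
    return result


# Illustrative transition matrix (an assumption, not from the thread).
# Each row sums to 1 and every state can be reached from every other.
P = [[0.7, 0.2, 0.1],
     [0.3, 0.5, 0.2],
     [0.2, 0.3, 0.5]]

Pk = mat_pow(P, 200)

# Any row of P^k approximates the long-run state probabilities.
stationary = Pk[0]
print([round(x, 4) for x in stationary])
```

If the rows of P^k do not agree with each other, the matrix is probably not a good candidate for this kind of check (for example, it may have an absorbing state).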
So one idea for a possible check would be to generate a few long chains, look at the distribution of each with respect to the states, and compare that to the theoretical probabilities.
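A rough sketch of that comparison, again in plain Python rather than R: `simulate_chain` is a hypothetical stand-in for the generator under test, and the matrix, seed, chain length, number of chains and the 0.02 tolerance are all assumptions picked for illustration. The point is that the test asserts a tolerance on the distribution instead of exact values.

```python
import random
from collections import Counter

# Illustrative transition matrix (an assumption, not from the thread).
P = [[0.7, 0.2, 0.1],
     [0.3, 0.5, 0.2],
     [0.2, 0.3, 0.5]]


def stationary_distribution(P, iterations=500):
    """Approximate the long-run distribution by repeatedly applying P."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def simulate_chain(P, length, rng, start=0):
    """Hypothetical stand-in for the generator under test."""
    states = list(range(len(P)))
    chain = [start]
    for _ in range(length - 1):
        chain.append(rng.choices(states, weights=P[chain[-1]])[0])
    return chain


rng = random.Random(2024)  # fixed seed so the check is reproducible
chains = [simulate_chain(P, 5000, rng) for _ in range(20)]

# Pool all chains and compute the observed share of time in each state.
counts = Counter(state for chain in chains for state in chain)
total = sum(counts.values())
empirical = [counts[s] / total for s in range(len(P))]

theoretical = stationary_distribution(P)
max_gap = max(abs(e - t) for e, t in zip(empirical, theoretical))

# Loose tolerance: the check targets the distribution, not exact draws.
assert max_gap < 0.02
```

The seed only makes a failure reproducible. The assertion itself would hold for almost any seed, which is what makes it a test of the function's behaviour and not of one particular random draw.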
Other possibilities include (1) ensuring that the correct number of events is generated, and (2) checking that the number of categories matches the dimensions of the transition matrix.
If we could implement these three tests, I think that would take care of it.
I think I did the P^k check, as well as (1). Could you clarify what you mean by (2)? My next step is to run a check on non-wide tables with the same matrices, to ensure that all the data in those is correct, by extension of the tests done on the wide genMarkovs.
For (2), the number of distinct states in the data should be the same as the number of states implied by the transition matrix. So, if we have a 3x3 transition matrix implying 3 states, then the actual number of states observed in the data should also be three.
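Checks (1) and (2) could look something like the sketch below. It assumes a long-format table with one (id, period, state) row per event. `simulate_long_table` and `structural_problems` are hypothetical names, and the generator is a plain Python stand-in, not genMarkov itself.

```python
import random

# Illustrative transition matrix (an assumption, not from the thread).
P = [[0.7, 0.2, 0.1],
     [0.3, 0.5, 0.2],
     [0.2, 0.3, 0.5]]


def simulate_long_table(P, n_chains, chain_len, rng):
    """Hypothetical generator: one (id, period, state) row per event."""
    states = list(range(len(P)))
    rows = []
    for chain_id in range(1, n_chains + 1):
        state = rng.choice(states)
        for period in range(1, chain_len + 1):
            rows.append((chain_id, period, state))
            state = rng.choices(states, weights=P[state])[0]
    return rows


def structural_problems(rows, P, n_chains, chain_len):
    """Return a list of failed structural checks (empty if all pass)."""
    problems = []
    # (1) correct number of events overall and per chain
    if len(rows) != n_chains * chain_len:
        problems.append("wrong total number of events")
    per_id = {}
    for chain_id, _, _ in rows:
        per_id[chain_id] = per_id.get(chain_id, 0) + 1
    if len(per_id) != n_chains or any(c != chain_len for c in per_id.values()):
        problems.append("wrong number of events per chain")
    # (2) number of observed states matches the matrix dimension
    observed = {state for _, _, state in rows}
    if len(observed) != len(P):
        problems.append("observed states do not match matrix dimension")
    return problems


rng = random.Random(7)
rows = simulate_long_table(P, n_chains=10, chain_len=250, rng=rng)
problems = structural_problems(rows, P, n_chains=10, chain_len=250)
print(problems)
```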
Ok got it. This logic implies that all potential states must be represented in the data, is that true? A matrix could be 4x4 but one of the states could be unreachable due to the probabilities in the matrix, so only 3 states would be observed. Is this too niche a case to consider?
Yes - that is a good observation. But if the transition matrix is well behaved (i.e. all the possible states are realistically attainable, so that the steady state probability exceeds 10%), and the chain length is long enough (say 250), then the probability that a particular state is never reached is vanishingly small. So, the key in the test is to select a reasonable transition matrix and make sure the chain is long enough, there are enough individual chains, or both.
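For a given matrix, the "vanishingly small" claim can be checked exactly instead of taken on faith. The sketch below is plain Python with a made-up matrix and a hypothetical helper, `prob_never_visited`. It tracks only the probability mass that has stayed away from the target state and reports how much is left after a chain of the given length.

```python
# Illustrative transition matrix (an assumption, not from the thread).
# Every state has a steady state probability above 10%.
P = [[0.7, 0.2, 0.1],
     [0.3, 0.5, 0.2],
     [0.2, 0.3, 0.5]]


def prob_never_visited(P, target, length, start):
    """Probability that a chain of `length` states, beginning in `start`,
    never occupies `target`."""
    if start == target:
        return 0.0
    keep = [s for s in range(len(P)) if s != target]
    # Probability mass on each non-target state, given no visit so far.
    mass = {s: (1.0 if s == start else 0.0) for s in keep}
    for _ in range(length - 1):
        # Drop transitions into the target; keep only paths that avoid it.
        mass = {j: sum(mass[i] * P[i][j] for i in keep) for j in keep}
    return sum(mass.values())


# Chance that state 2 never appears in a single chain of length 250
# that starts in state 0.
p_miss = prob_never_visited(P, target=2, length=250, start=0)
print(p_miss)
```

Running this for each state of a candidate test matrix is a quick way to confirm that the chain length is long enough before relying on the "all states observed" check.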
What is the best practice for testing whether a function that relies on probability works? Obviously I can set a seed and ensure that rows and columns match an identical data.table, but that doesn't seem like it's actually testing how the function works.
I'm looking at this specifically with regard to genMarkov. What might be a good game plan for testing that this function works?