Closed brickman1444 closed 4 years ago
Merging #4 into master will increase coverage by
22.22%
. The diff coverage is97.14%
.
@@ Coverage Diff @@
## master #4 +/- ##
===========================================
+ Coverage 51.33% 73.55% +22.22%
===========================================
Files 2 3 +1
Lines 150 208 +58
Branches 38 58 +20
===========================================
+ Hits 77 153 +76
+ Misses 70 46 -24
- Partials 3 9 +6
Impacted Files | Coverage Δ | |
---|---|---|
Markov/ChainState.cs | 69.56% <ø> (+4.34%) |
:arrow_up: |
Markov/MarkovChainWithBackoff.cs | 96.15% <96.15%> (ø) |
|
Markov/MarkovChain.cs | 64.54% <100.00%> (+19.35%) |
:arrow_up: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 8cb80c1...8aa21ca. Read the comment docs.
@otac0n is there any chance this can get merged in?
Hey, I took a look at your PR, and I think it's a pretty good addition. Sorry it has taken this long to respond.
I made some changes to the branch to inherit from MarkovChain<T>
directly, to enable people to swap in this implementation directly. Since I'm basing the implementation off of MarkovChain<T>
I was able to reduce the number of elements in nested collection by 1, but you'll see if (order == this.Order)
wherever we would have iterated the list.
I also credited you in the licence details in this branch. Please specifically check your credit in license.md.
https://github.com/otac0n/markov/commits/MarkovChainsWithBackoff
Please pull this branch, test my changes with your use cases, and update your PR (this one) if you approve my changes.
One thing I'm a bit worried about is serialization, so would you be willing to add a serialization round-trip test? If not, I'll find time to add it eventually, but I'll want to have that before merge.
Thanks for picking this up! Those changes work for my use case. I appreciate making it easy to swap into existing code, but if another new style of chain gets added in the future I think using an interface instead of inheritance would make it easier to add a new style of chain.
The credit in the license details looks perfect. Thanks!
Can you provide some more details about what you mean by serialization? Do you mean adding something like the Serializable attribute to MarkovChainWithBackoff, or do you mean adding some tests like Add_WhenEmpty_AddsTheValuesToTheState() and Add_WithOppositeWeight_ResetsInternalsToInitialState() ?
(Sorry, I didn't mean to close this. I think it auto-closed when I pushed my copy of the branch.)
Indeed, I did mean like adding a test similar to Add_WhenEmpty_AddsTheValuesToTheState
, in particular, it would be nice if we could round-trip from JSON->MarkovChainWithBackoff->JSON and assert the JSON is semantically identical (or even string-equals).
I'm aware this may be a tall order, and we may need to just add serialization support directly, but it would be nice if it were generally possible to restore the Markov Chain's internal state without needing to remember the entire original corpus. Previously this was achieved by querying the next-states, but now the exact next-states exposed depend on how many possible next-states are available per desiredNumNextStates
.
Something to consider: since the internal state of a MarkovChain<T>
is a Dictionary<ChainState<T>, T>
, and since ChainState<T>
may be of any length, then it may be possible to store the entire internal state of the MarkovChainWithBackoff<T>
inside of a single MarkovChain<T>
without needing nested collections. This may make serialization trivial again.
@otac0n I added tests for Add_WhenEmpty_AddsTheValuesToTheState() and Add_WithOppositeWeight_ResetsInternalsToInitialState(). The deserialization seems like it would be a lot of work, so I'd rather not implement the round trip serialization right now.
I also added some tests for GetNextStates() which I think are actually the best description of how MarkovChainWithBackoff is intended to work. I think once those tests are in we can safely do any refactors. I like the sound of storing all the states in one Dictionary<ChainState<T>, T>
. That could make the iteration inside MarkovChainWithBackoff simpler as well.
Also let me know if there are more style issues. I'm using VS Code and I'm not sure it's picking up the style cop rules. I fixed all the warnings I saw in the files I touched.
Uses the existing MarkovChain to implement a chain with back-off. Back-off means that when constructing the chain you set a desired number of next possible states. If there isn't that many possible states, we decrease the order and try again until we either have enough possible states or we reach order 1.
Code was mostly migrated from https://github.com/brickman1444/CaptainRexEbooks/blob/master/MarkovChainWithBackoff.cs which was implemented against the description of back-off written here: https://blog.dataiku.com/2016/10/08/machine-learning-markov-chains-generate-clinton-trump-quotes
Updates the GitVersionTask dependency which was necessary in order to build the netcoreapp2.0 target on my machine.