otac0n / markov

Generic Markov chain generation for C#
Other
54 stars 20 forks source link

Add Markov Chains With Back-off #4

Closed brickman1444 closed 4 years ago

brickman1444 commented 5 years ago

Uses the existing MarkovChain to implement a chain with back-off. Back-off means that when constructing the chain you set a desired number of next possible states. If there isn't that many possible states, we decrease the order and try again until we either have enough possible states or we reach order 1.

Code was mostly migrated from https://github.com/brickman1444/CaptainRexEbooks/blob/master/MarkovChainWithBackoff.cs which was implemented against the description of back-off written here: https://blog.dataiku.com/2016/10/08/machine-learning-markov-chains-generate-clinton-trump-quotes

Updates the GitVersionTask dependency which was necessary in order to build the netcoreapp2.0 target on my machine.

codecov[bot] commented 5 years ago

Codecov Report

Merging #4 into master will increase coverage by 22.22%. The diff coverage is 97.14%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master       #4       +/-   ##
===========================================
+ Coverage   51.33%   73.55%   +22.22%     
===========================================
  Files           2        3        +1     
  Lines         150      208       +58     
  Branches       38       58       +20     
===========================================
+ Hits           77      153       +76     
+ Misses         70       46       -24     
- Partials        3        9        +6     
Impacted Files Coverage Δ
Markov/ChainState.cs 69.56% <ø> (+4.34%) :arrow_up:
Markov/MarkovChainWithBackoff.cs 96.15% <96.15%> (ø)
Markov/MarkovChain.cs 64.54% <100.00%> (+19.35%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 8cb80c1...8aa21ca. Read the comment docs.

brickman1444 commented 4 years ago

@otac0n is there any chance this can get merged in?

otac0n commented 4 years ago

Hey, I took a look at your PR, and I think it's a pretty good addition. Sorry it has taken this long to respond.

I made some changes to the branch to inherit from MarkovChain<T> directly, to enable people to swap in this implementation directly. Since I'm basing the implementation off of MarkovChain<T> I was able to reduce the number of elements in nested collection by 1, but you'll see if (order == this.Order) wherever we would have iterated the list.

I also credited you in the licence details in this branch. Please specifically check your credit in license.md.

https://github.com/otac0n/markov/commits/MarkovChainsWithBackoff

Please pull this branch, test my changes with your use cases, and update your PR (this one) if you approve my changes.

One thing I'm a bit worried about is serialization, so would you be willing to add a serialization round-trip test? If not, I'll find time to add it eventually, but I'll want to have that before merge.

brickman1444 commented 4 years ago

Thanks for picking this up! Those changes work for my use case. I appreciate making it easy to swap into existing code, but if another new style of chain gets added in the future I think using an interface instead of inheritance would make it easier to add a new style of chain.

The credit in the license details looks perfect. Thanks!

Can you provide some more details about what you mean by serialization? Do you mean adding something like the Serializable attribute to MarkovChainWithBackoff, or do you mean adding some tests like Add_WhenEmpty_AddsTheValuesToTheState() and Add_WithOppositeWeight_ResetsInternalsToInitialState() ?

otac0n commented 4 years ago

(Sorry, I didn't mean to close this. I think it auto-closed when I pushed my copy of the branch.)

Indeed, I did mean like adding a test similar to Add_WhenEmpty_AddsTheValuesToTheState, in particular, it would be nice if we could round-trip from JSON->MarkovChainWithBackoff->JSON and assert the JSON is semantically identical (or even string-equals).

I'm aware this may be a tall order, and we may need to just add serialization support directly, but it would be nice if it were generally possible to restore the Markov Chain's internal state without needing to remember the entire original corpus. Previously this was achieved by querying the next-states, but now the exact next-states exposed depend on how many possible next-states are available per desiredNumNextStates.

otac0n commented 4 years ago

Something to consider: since the internal state of a MarkovChain<T> is a Dictionary<ChainState<T>, T>, and since ChainState<T> may be of any length, then it may be possible to store the entire internal state of the MarkovChainWithBackoff<T> inside of a single MarkovChain<T> without needing nested collections. This may make serialization trivial again.

brickman1444 commented 4 years ago

@otac0n I added tests for Add_WhenEmpty_AddsTheValuesToTheState() and Add_WithOppositeWeight_ResetsInternalsToInitialState(). The deserialization seems like it would be a lot of work, so I'd rather not implement the round trip serialization right now.

I also added some tests for GetNextStates() which I think are actually the best description of how MarkovChainWithBackoff is intended to work. I think once those tests are in we can safely do any refactors. I like the sound of storing all the states in one Dictionary<ChainState<T>, T>. That could make the iteration inside MarkovChainWithBackoff simpler as well.

brickman1444 commented 4 years ago

Also let me know if there are more style issues. I'm using VS Code and I'm not sure it's picking up the style cop rules. I fixed all the warnings I saw in the files I touched.

otac0n commented 4 years ago

https://ci.appveyor.com/project/otac0n/markov/builds/33110412 https://www.nuget.org/packages/Markov/2.0.1-ci0023