Tokenizer constructor: Move document_count to **kwargs

Summary

This PR moves the document_count constructor argument of the Tokenizer class to **kwargs.

Quoting @fchollet:

I don't think the document_count argument serves a purpose as a user-settable constructor argument. There is no use case for changing the value by hand.

However, because we use it in TF-IDF computation (and only there) it is part of the state of the Tokenizer and it should be saved/loaded during serialization. Removing the argument entirely isn't correct for use cases when one tries to use TF-IDF after loading a saved Tokenizer.

So I think we should move this arg to **kwargs like we do for some private layer arguments. That way, it isn't exposed to users, it isn't documented, but it's still saved as part of a serialized Tokenizer.

Background: In #106, I removed the document_count argument (if supplied, it is ignored with a warning). I also changed tokenizer_from_json to set the document_count attribute directly on the tokenizer object, so that the tokenizer's state is preserved across saving and loading. As quoted above, @fchollet preferred moving the argument to **kwargs over removing it, so I created this PR. #106 and this PR now address the same issue; one of them should be merged and the other closed.
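
For illustration, here is a minimal sketch of the **kwargs approach described in this summary. It is not the code in this PR: SketchTokenizer and sketch_tokenizer_from_json are hypothetical stand-ins, and the config layout is an assumption chosen to keep the example short. It only shows how document_count can stay out of the documented signature while still surviving a save/load round trip.

```python
import json


class SketchTokenizer:
    """Hypothetical stand-in for Tokenizer; not the library's actual code."""

    def __init__(self, num_words=None, **kwargs):
        # document_count is internal state. It is read from **kwargs so it
        # stays out of the documented signature but can still be restored
        # when a saved config is passed back to the constructor.
        self.document_count = kwargs.pop('document_count', 0)
        if kwargs:
            raise TypeError('Unrecognized keyword arguments: ' + str(kwargs))
        self.num_words = num_words

    def fit_on_texts(self, texts):
        # Only the document counter is tracked in this sketch.
        for _ in texts:
            self.document_count += 1

    def get_config(self):
        # document_count is saved with the rest of the state.
        return {
            'num_words': self.num_words,
            'document_count': self.document_count,
        }

    def to_json(self):
        return json.dumps({
            'class_name': type(self).__name__,
            'config': self.get_config(),
        })


def sketch_tokenizer_from_json(json_string):
    # The saved config is passed straight to the constructor, so
    # document_count arrives through **kwargs.
    config = json.loads(json_string)['config']
    return SketchTokenizer(**config)


tok = SketchTokenizer(num_words=100)
tok.fit_on_texts(['a b', 'c d', 'e'])
restored = sketch_tokenizer_from_json(tok.to_json())
print(restored.document_count)
```

In this sketch the loader needs no special handling for document_count, which is the practical difference from setting the attribute directly on the object after construction.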

Related Issues

Fixes #105.

PR Overview

[n] This PR requires new unit tests [y/n] (make sure tests are included)
[n] This PR requires a documentation update [y/n] (make sure the docs are up-to-date)
[y] This PR is backwards compatible [y/n]
[n] This PR changes the current API [y/n] (all API changes need to be approved by fchollet)
