tumblr / collins

groovy kind of love
tumblr.github.com/collins
Apache License 2.0

Hazelcast docs need a better description of the purpose/problem being solved, how to tell if it's working, etc. #564

Open jbala1 opened 6 years ago

jbala1 commented 6 years ago

With the latest Collins from the github.com repo, the hazelcast documentation is woefully short on information. It's missing the following (in no particular order; this is about the embedded hazelcast only):

1) the problem hazelcast solves or assists with solving; i.e., why and under what circumstances you'd want to use it

2) what it looks like when it's working (also: how to prove that it's working)

3) how to tell when it's not working, since there's very little logging around it that's of any actual use in troubleshooting

4) what is meant by "cluster" -- a collins cluster, or a hazelcast cluster?

* if a collins cluster, does that mean hazelcast will distribute all asset data to all collins instances (including their backend mysql databases) in the same hazelcast membership? The docs don't say, and that's not what I'm seeing in my setup of only two collins servers: server B's hazelcast appears to receive the asset data (visible only because I added a couple of new logger.trace lines), but the data never goes into server B's mysql database

* if a hazelcast cluster, with each collins server still having independent data but a shared hazelcast membership/cache, then the docs definitely don't say how or why this is even useful (also see item 1)

5) what the collins setup requires for hazelcast to work correctly; i.e., do the mysql databases and solr indexes need to be in sync (freshly initdb'd) before hazelcast is enabled? Does the DATACENTER asset need to be identical on all hazelcast members, or does it not matter?

6) whether the /api/firehose endpoint should output the same data from all members of the hazelcast cluster. This probably falls under items 2 and 3, but I wanted to call out the API endpoint explicitly. (Also, that's not what I'm seeing in my setup: only the collins server where the asset was created outputs anything to the endpoint, but it's far from clear that I've got everything set up correctly.)

7) what state multicollins should be in (enabled or disabled) when enabling hazelcast, in case someone is migrating from one setup to the other. Logically, it seems like the two would be in direct conflict of purpose.
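
For items 2 and 6, a check along these lines is roughly what the docs could describe. This is only a minimal sketch: it assumes the /api/firehose output from each server has already been captured into a list of lines by some other means, and it makes no assumption about the actual firehose format. The member names (`collins-a`, `collins-b`), the event strings, and the `missing_events` helper are all hypothetical.

```python
from typing import Dict, Iterable, List


def missing_events(events_by_member: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """For each member, list the events some other member saw but it did not."""
    seen = {member: set(events) for member, events in events_by_member.items()}
    # Union of every event observed on any member.
    all_events = set().union(*seen.values())
    return {member: sorted(all_events - events) for member, events in seen.items()}


if __name__ == "__main__":
    # Placeholder data: lines captured from each server's firehose output.
    captured = {
        "collins-a": ["asset created: tag001", "asset updated: tag001"],
        "collins-b": [],
    }
    for member, missing in missing_events(captured).items():
        print(member, "is missing", len(missing), "event(s)")
```

If every member is expected to emit the same events, each list in the result should be empty; a non-empty list for one server would match what I'm seeing on server B.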

Thank you.