Durable users - WIP / for discussion

devth commented 5 years ago

Opening this up as a place to discuss how we want to model users.

Goals of this discussion

Provide a consistent and durable model of users to build on top of
- Karma would gain a foreign key to users
- GraphQL resources like history and karma would be able to link to the user entity
- We would be able to reliably build user prefs on top of it
Figure out how and if Yetibot should support multiple adapters. Users are the primary entity that we need to consider, but all db state should be evaluated.
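
To make the first goal concrete, here is a minimal sketch of karma gaining a foreign key to a durable users table. It uses Python with an in-memory SQLite database purely for illustration; the table and column names (`users`, `karma`, `chat_uid`, `voter_id`) are assumptions, not Yetibot's actual schema.

```python
import sqlite3

# Hypothetical schema: every table and column name here is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE users (
    id       INTEGER PRIMARY KEY,
    adapter  TEXT NOT NULL,   -- chat adapter the user was seen on
    chat_uid TEXT NOT NULL,   -- adapter-specific user ID
    username TEXT NOT NULL,
    UNIQUE (adapter, chat_uid)
);
CREATE TABLE karma (
    id       INTEGER PRIMARY KEY,
    user_id  INTEGER NOT NULL REFERENCES users(id),  -- who received karma
    voter_id INTEGER NOT NULL REFERENCES users(id),  -- who gave it
    points   INTEGER NOT NULL
);
""")


def add_user(conn, adapter, chat_uid, username):
    """Insert a user row and return its durable id."""
    cur = conn.execute(
        "INSERT INTO users (adapter, chat_uid, username) VALUES (?, ?, ?)",
        (adapter, chat_uid, username),
    )
    return cur.lastrowid


def karma_totals(conn):
    """Total karma per durable user, joined through the foreign key."""
    return conn.execute(
        "SELECT u.username, SUM(k.points) FROM karma k "
        "JOIN users u ON u.id = k.user_id "
        "GROUP BY u.id ORDER BY u.username"
    ).fetchall()


devth = add_user(conn, "slack", "U123123", "devth")
jcorrado = add_user(conn, "slack", "U456456", "jcorrado")
for _ in range(2):
    conn.execute(
        "INSERT INTO karma (user_id, voter_id, points) VALUES (?, ?, 1)",
        (devth, jcorrado),
    )
```

With foreign keys enabled, inserting karma for a `user_id` that does not exist in `users` raises `sqlite3.IntegrityError`, which is the kind of reliability the goal above is after.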

Multi adapter

DB entities to consider:

alias
channel
cron
history
karma
observe
status
user

Two extremes:

be truly multi-adapter, unifying users (i.e. if I'm logged in on both adapters, the two user entities should somehow be linked and represent a single identity) and all state across all connected adapters
vhost model: as if each adapter were its own Yetibot instance, with all state constrained to that adapter

Current state

Currently Yetibot supports multiple adapters but it's somewhere between the two extremes.

Considerations

Feature	Pro Share	Pro Isolate
alias	If an operator creates an alias on one adapter they'd probably expect it to be available on another adapter	Sharing aliases between adapters could possibly leak team specific information
channel		Channel configuration is already scoped to individual channels by default
cron		Cron configuration is already scoped to individual channels by default
history		History is scoped to the channel and adapter it originated from. Whether we expose it beyond that is a choice we can make/change later.
karma	If you are a user in multiple adapters, you might expect to see a unified karma list	It might be weird to see users from another adapter (that you may have never heard of) in the karma list, especially if you see yourself multiple times because users are not unified.
observe	Same logic as alias - you may expect to see observers working in all adapters that Yetibot is listening in. Observers have the option of only firing for specific channel patterns, but that means it could match multiple channels across multiple adapters	Shared observers could leak team information
status		Currently stores the adapter and channel that it was created in. Whether we expose it more broadly can be decided and changed later, since we have all the provenance data.
user	Wouldn't want to see a single identity represented by multiple users	It's hard to unite multiple representations of a single identity

Current use cases of multiple adapters

A single public Yetibot currently listens on both freenode and Slack. This is the shared case, where any state is expected to be exposed in all adapters.
We run a single Yetibot internally at eBay with multiple configured Slack Adapters - only because Slack does not yet support the concept of multi-workspace bots within a single org. This is shared and all state is expected to be shared across all adapters (including users, since there is technically a single org user). However: we confirmed that the underlying user ID is actually not the same. This appears to be a weird hack that Slack used when they launched Organizations (aka Enterprise Grid). I'm guessing in the future:
- these user IDs might be unified
- and they will enable org-level bots that can listen across all/any workspaces, therefore mitigating this entire problem

Concluding thoughts

If we capture provenance (which adapter and channel a record originated from) we can alter decisions like what state is exposed where down the road
If we can stitch users together we get much closer to an ideal shared-state approach (i.e. anti-vhost), especially for things like karma, where you would never want to see a single identity represented by multiple users. This could be hard, though. If we model it in a way that allows tying an identity to multiple representations, we could require users to manually specify all their IDs until we can automate it (e.g. my IRC ID is ~devth and my Slack ID is @U123123). This is maybe akin to connecting an identity on a website to disparate OAuth identity providers like Twitter, Facebook, GitHub, etc.
What if we do nothing except capture the provenance?
- We still have the capability to run multiple adapters
- Some entities are leaked by default, like aliases and observers
- After karma gains provenance we could opt to constrain leaderboards to adapter, or allow it to span all adapters
Regarding user stitching: we could unify users if their username is identical, thereby avoiding the problem with single-org multi-workspace users having different ids in each workspace. This could even be configurable (:attempt-to-stitch? true).
Why not just give up the multiple adapter functionality and simplify everything?
- It's currently a workaround for Slack's weird enterprise grid limitations where otherwise we'd have to run/operate/maintain multiple Yetibots
- In the case of constrained resources (public Yetibot runs on a 1 GB DigitalOcean Droplet), we couldn't run multiple instances even if we wanted to, since that requires multiple JVM runtimes, which are not cheap
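
The username-based stitching idea above could look roughly like the following. This is a sketch under stated assumptions, written in Python for illustration only: `ChatUser`, `stitch`, and the `attempt_to_stitch` parameter (mirroring the proposed `:attempt-to-stitch?` option) are hypothetical names, and matching is on exactly identical usernames as suggested.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ChatUser:
    """One user as seen by one adapter (hypothetical shape)."""
    adapter: str
    chat_uid: str
    username: str


def stitch(users, attempt_to_stitch=True):
    """Group adapter-level users into identities.

    With stitching on, users whose usernames are identical share an identity.
    With stitching off, every (adapter, chat_uid) pair is its own identity.
    """
    groups = defaultdict(list)
    for user in users:
        if attempt_to_stitch:
            key = user.username
        else:
            key = (user.adapter, user.chat_uid)
        groups[key].append(user)
    return list(groups.values())


seen = [
    ChatUser("slack-workspace-1", "U123123", "devth"),
    ChatUser("slack-workspace-2", "U999999", "devth"),  # same person, different ID
    ChatUser("irc", "~jcorrado", "jcorrado"),
]
identities = stitch(seen)
unstitched = stitch(seen, attempt_to_stitch=False)
```

Here `identities` holds two groups (both `devth` rows together, plus `jcorrado`), while `unstitched` holds three. One caveat worth keeping in mind: identical usernames on unrelated adapters may belong to different people, which is a reason to keep this behind a config flag.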

devth commented 5 years ago

From @jcorrado via Slack:

I'm still of the "all or nothing, do it cleanly or don't do it at all" mindset, but I understand the practical middle ground (in theory). For me the simple thought experiment goes: were we to simplify to one adapter, what would that simplified, clarified architecture open up? And is that worth more than what would be lost?

devth commented 5 years ago

were we to simplify to one adapter, what would that simplified, clarified architecture open up?

Corollary: What if we just design everything as if there were only one adapter (while still capturing provenance)? Rationale:

I'm probably the only person running multi adapter, so every operator of Yetibot aside from myself would still have clean single-adapter state
we wouldn't have to invest any time into stitching user accounts and making everything truly multi adapter

This is essentially what I've been doing all along and it hasn't hindered me. The remaining TODOs for Users would still be:

add provenance to all state (good idea, regardless)
model users, including provenance

I think of it as a stepping stone to the ideal "all or nothing" state.

This means both sets of users show up in karma. Observers, Status, and Aliases are global. I think all of these are fine. I've always tried to let concrete use cases drive the design of Yetibot, and I don't think I have a strong enough multi adapter case (Yetibot public runs on Slack and IRC but no one uses it on IRC currently) to help drive its design while at the same time having a practical need for multi adapter.

In the future if/when YB supports Gitter or other platforms it will likely listen on all, and I think we want most state to be global in that case. Same with Clojure community chats: if/when we get Yetibot into Clojurians, #clojure IRC, clojure/general on Gitter, etc. When that happens I think I'll either have the motivation to go all in or the clarity and confidence to make the call to go single adapter.

jcorrado commented 5 years ago

Actively mulling this over...

devth commented 5 years ago

tldr: agree with all or nothing but want to defer the decision / work.

devth commented 5 years ago

We discussed this today starting at https://yetibot.slack.com/archives/C66AJ2EFL/p1554821184926700

Summary

1. Our goal is "true" multi-adapter.
2. We seem to be in agreement on delivering fully-baked, user-proof features. Quality UI.
3. Even with #2 in mind, we're practical about how we build, and are willing to cut some internal corners to deliver those features when there's a compelling opportunity. In short: clean features, but fine to "fix up" internals later.
4. Clojurians Slack is something we'd like to service, ideally soon, to coincide with Clojure/North. We're OK leveraging its curtailed requirements ("Just Slack for now"), so long as features are clean. We can fix up later. This is #2 and #3 in action.

Short term

Require Slack-encoded users in the karma command. This lets us get around the fact that we don't have durable users modeled yet.
Disable karma in IRC unless we can easily support it (@jcorrado will investigate)
Consider removing or disabling users command until it's properly modeled
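
As a rough illustration of the first short-term item, the karma command could accept only arguments that are Slack-encoded user mentions. The sketch below is Python for illustration only and assumes the `<@U123123>` mention encoding (optionally with a `|name` suffix, and IDs starting with `U` or `W`); `slack_uid` is a hypothetical helper, not an existing Yetibot function.

```python
import re

# Assumed Slack mention encoding: <@U123123> or <@U123123|devth>.
SLACK_MENTION = re.compile(r"<@([UW][A-Z0-9]+)(?:\|[^>]*)?>")


def slack_uid(arg):
    """Return the Slack user ID if arg is an encoded mention, else None."""
    match = SLACK_MENTION.fullmatch(arg.strip())
    return match.group(1) if match else None


accepted = slack_uid("<@U123123>")   # an encoded mention
rejected = slack_uid("devth")        # a bare name is not durable, so reject it
```

Rejecting bare names means karma is only ever attributed to an ID that Slack itself vouches for, which sidesteps the missing user model for now.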

User modeling

create or update durable users when the bot observes their existence: startup, channel join/leave events, polling if necessary, or lazy create-on-demand e.g. when a user runs a command or mentions another user in a command (like karma)
check existence of user in the db when trying to attribute karma, and handle adapter-specific user encodings.
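
The lazy create-on-demand option could be sketched as a single "ensure" step that runs whenever the bot sees a user. This is Python with in-memory SQLite for illustration only; `ensure_user` and the `users` columns are assumed names, not Yetibot's actual code or schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical users table keyed by adapter plus adapter-specific ID.
conn.execute(
    "CREATE TABLE users ("
    " id INTEGER PRIMARY KEY,"
    " adapter TEXT NOT NULL,"
    " chat_uid TEXT NOT NULL,"
    " username TEXT NOT NULL,"
    " UNIQUE (adapter, chat_uid))"
)


def ensure_user(conn, adapter, chat_uid, username):
    """Create the user if unseen, otherwise refresh the username.

    Returns the durable row id either way, so callers such as karma
    can attribute state to it.
    """
    row = conn.execute(
        "SELECT id FROM users WHERE adapter = ? AND chat_uid = ?",
        (adapter, chat_uid),
    ).fetchone()
    if row is None:
        cur = conn.execute(
            "INSERT INTO users (adapter, chat_uid, username) VALUES (?, ?, ?)",
            (adapter, chat_uid, username),
        )
        return cur.lastrowid
    conn.execute("UPDATE users SET username = ? WHERE id = ?", (username, row[0]))
    return row[0]


first = ensure_user(conn, "slack", "U123123", "devth")
second = ensure_user(conn, "slack", "U123123", "devth_renamed")  # same person, new name
```

Both calls return the same id, so a rename on the chat platform does not orphan existing karma. The same function could be called from startup, join/leave events, or polling.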

Mid term

add provenance (chat adapter and channel) to all db state to give us flexibility on partitioning or uniting things later on
begin storing users "as seen by the bot" in database for both irc and slack (might require some discussion/planning on its own)
don't worry about leaking state across adapters for now
consider in the back of our minds that someday we might have the notion of a single identity spread across multiple adapters (possibly an identities table that joins 1 or more rows from users)
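
To show why capturing provenance keeps our options open, here is a sketch of a karma leaderboard that can either span all adapters or be constrained to one, decided at query time. It is Python with in-memory SQLite for illustration only; the `karma` columns and the `leaderboard` function are assumptions, and karma is keyed by username here only to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical karma table carrying provenance (adapter and channel).
conn.execute(
    "CREATE TABLE karma ("
    " username TEXT NOT NULL,"
    " adapter TEXT NOT NULL,"
    " channel TEXT NOT NULL,"
    " points INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO karma (username, adapter, channel, points) VALUES (?, ?, ?, ?)",
    [
        ("devth", "slack", "#general", 3),
        ("devth", "irc", "#yetibot", 1),
        ("jcorrado", "slack", "#general", 2),
    ],
)


def leaderboard(conn, adapter=None):
    """Karma totals, optionally constrained to a single adapter."""
    sql = "SELECT username, SUM(points) AS total FROM karma"
    params = ()
    if adapter is not None:
        sql += " WHERE adapter = ?"
        params = (adapter,)
    sql += " GROUP BY username ORDER BY total DESC, username"
    return conn.execute(sql, params).fetchall()


shared = leaderboard(conn)           # spans all adapters
irc_only = leaderboard(conn, "irc")  # vhost-style view
```

Because the provenance is stored with each row, switching between the shared and isolated views is a query change rather than a data migration.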

In terms of simplifying adapters, maybe the adapter just needs to provide a function from uid to fully resolved user.
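
One way to picture that adapter contract is shown below, in Python for illustration only. `ResolvedUser`, `Adapter`, `resolve_user`, and `FakeSlackAdapter` are all hypothetical names; a real adapter would call the chat platform's API instead of reading from a dictionary.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class ResolvedUser:
    """A fully resolved user, including provenance (hypothetical shape)."""
    adapter: str
    uid: str
    username: str


class Adapter(Protocol):
    """The only user-related thing an adapter must provide in this sketch."""
    name: str

    def resolve_user(self, uid: str) -> Optional[ResolvedUser]:
        ...


class FakeSlackAdapter:
    """Stand-in adapter backed by a dict instead of a real chat API."""
    name = "slack"

    def __init__(self, directory):
        self._directory = directory  # maps uid -> username

    def resolve_user(self, uid):
        username = self._directory.get(uid)
        if username is None:
            return None
        return ResolvedUser(adapter=self.name, uid=uid, username=username)


adapter = FakeSlackAdapter({"U123123": "devth"})
known = adapter.resolve_user("U123123")
unknown = adapter.resolve_user("U000000")
```

Everything else (persisting users, attributing karma, stitching identities) could then live outside the adapters and depend only on this one function.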
