API design - Githubissues

This looks really interesting!

I am an Elixir noob who wrote his first GenServer last weekend, so this feedback may be very naive and stupid; please ignore it if so.

Although the API is very much simpler and more streamlined than the traditional GenServer, it still has a few moving parts and I wonder if they're totally necessary. From my reading of the readme, it sounds like there are exactly 3 things that you might want to do when you interact with a component:

Update the state of the component
Get a reply from the component
Update the state and get a reply

With the current API, there are two things that the user needs to do to achieve these results:

Choose between one_way and two_way (2 options)
Choose whether to use set_state, set_state_and_return, or neither (3 options)

That gives you 6 possible combinations of things you could write, but only 4 of them will do what you want.

two_way -> OK (replies without updating the state)
two_way + set_state -> OK (updates the state and replies with the new state)
two_way + set_state_and_return -> OK (updates the state and replies with something else)
one_way -> OK (updates the state without replying)
one_way + set_state -> ?
one_way + set_state_and_return -> ?

Would it be possible/desirable to have an API that was more like this?

defmodule KV do

  use Component.Strategy.Dynamic,
      state_name:    :kv_store,
      initial_state: %{}

  handle add(kv_store, key, value) do
    %{state: Map.put(kv_store, key, value)}
  end

  handle get(kv_store, key) do
    %{reply: Map.get(kv_store, key)}
  end

  handle merge_and_tell_me_the_new_value(kv_store, key, new_value) do
    new_kv_store = Map.update(kv_store, key, new_value, fn old_value -> old_value + new_value end)
    merged_value = Map.get(new_kv_store, key)
    %{
      state: new_kv_store,
      reply: merged_value
    }
  end
end

That way, the user doesn't have to think about which type of function they're writing - they just need to specify what they want to get out of it - a new state, a reply, or both.

Interesting questions.

Actually there are only 3 possibilities:

one_way can only update state
two_way can return a value
two_way can return a value and update state

I deliberately chose to make it slightly less convenient to do #3, because I think it is normally a good idea for an API function to do just one thing. However, nothing is set in stone, and as we get more experience with this, I'm sure it will change.

As for the idea of always returning a map: again this is something I looked at. In the end I preferred the fact that in the current design the bodies of one_way and two_way look just like the regular functions you'd write in regular (nonserver) code.

But, as I say, this is just a first pass, where we all gather feedback.

This is mostly a dump of thoughts so pick it apart...

A common pattern in my 'simple' Applications/Components that are re-useable is a bit different from the above as well. I'll use my https://github.com/OvermindDL1/task_after library as a reference.

Global/Dynamic Components

Public Interface

First I will speak of the Global/Dynamic versions of this Component library So first of all is the public interface, it has a few callbacks, they all kind of take most of the same options so I'll just use one as an example:

TaskAfter.task_after/3, it's arguments are of the form TaskAfter.task_after(after_time, callback, opts), and the opts can be:
- name: name | pid: pid -> Specify a non-global task handler, if unspecified that the application :global_name must be specified
- id: id -> A unique id, if nil or unspecified then it is auto-generated
- call_timeout: timeout -> Override the timeout on calling to the TaskAfter.Worker
- no_return: true -> Do not return the id or error, just try to register and forget results otherwise
- send_result: pid -> Sends the result of the task to the specified pid
- send_result: :in_process -> Runs the task in the TaskAfter.Worker process to do internal work

Naming and access

Essentially, TaskAfter has these 'naming' capabilities:

It can be a Global or Named Local, or Unnamed Local interface, in other words the user of this library can:
- Set config :task_after, global_name: TaskAfter replacing TaskAfter with whatever global name they want. If and only if this configuration is set then on application load it starts up the gen_server and registers it under that name. This name can also be accessed like a Local Named call. This is the name used when a name or pid is not otherwise given.
- The user of this library can add the TaskAfter gen_server to their own supervision tree (or just raw call or whatever they want to do) and pass in name: SomeName to have it register itself under that name, then name: SomeName can be given to the opts of any call to the module and it will access that instance of it for the call.
- The user of this library can add the TaskAfter gen_server to their own supervision tree (or just raw call or whatever they want to do) with no name option and it registers itself as just an unnamed gen_server. The user can then pass the pid as pid: pid into the opt of any call to the module and it will use that instance for the call.

Call timeouts

Next the caller of the module can define the timeout that they want the module function call to take (when an internal call is used, not a cast), it defaults to the gen_server default of 5000 (5 seconds), but the user can override it. I would also prefer if a different default could be specified on the use Component... declaration as a 'global module default, as well as the ability to override the default on a per-function (one_way/two_way`) definition as well as sometimes a default of 5 seconds doesn't make sense (like when working on a remote server request that processes data and you want, say, 30 seconds to be a good default). It would suck to make the user of the module always have to override the timeout for a given call (or get many 'surprises').

Call/Cast options

Every call to the TaskAfter module is an internal call by default, however it can optionally be a cast by passing in no_return: true and able to state how to return the data, either by default returning it straight from the call (if no_return is not true) or it's able to return it to a pid. However I'm not a fan of this overly specialized to TaskAfter setup, what I'd prefer is something like this:

return: <return_option> -> Where <return_option> can be false/true/<callback>/{<id>, <pid>}/{<id>, <name>}/{<id>, <callback>}:
- false -> Just returns :ok immediately, this will perform a cast internally.
- true -> Returns the return value of the call upon completion, this will perform a call internally.
- {<id>, <pid>} -> Just returns :ok immediately, this will perform a cast internally but the return value of the function is send to the given pid as the tuple {<id>, <result>} where <result> is the result of the function called and <id> is any term() that the user passed in (good to distinguish the data and make receive's easier, plus it's just good form).
- {<id>, <name>} -> Just returns :ok immediately, this will perform a cast internally but the return value of the function is send to the given registered process name (where name is the usual atom, tuple like {:global, <atom>}, {:via, ..., ...}, etc..., the usual), but it otherwise the same as {<id>, <pid>}.
- <callback> -> A callback either in anonymous tuple form or in MFA tuple form ala {<module>, <function>, <args>} (which is called like apply(<module>, <function>, [<result> | <args>])), this allows the user to handle the result however they like very easily without waiting on a return, think standard monad'y pipelining or javascript callbacks.

This does get rid of the built-in option to both send to a pid and 'wait' for the call to complete that TaskAfter does, but that could be supported by accepting something like a sync_return: <return_option> in addition to just the return: <return_option> to force that as well. It is very useful to both handle the return some other way while making sure it actually succeeds or fails, not a common use pattern but very useful when it does happen.

It might even be useful to have an error handling version as well to return the error (like the exception structure itself or so) to a given wrapped pid or callback or so as well.

How to encode the options into the public interface without messing with the user API

The above capabilities changes the public interface to allow the caller to handle anything in any way that is best for the given task without being forced into whatever pattern the module author has chosen. Like in TaskAfter all forms are quite useful. This does however mean that the opt optional argument at the end of a call can take certain options, and thus how to resolve those with the user function call possibly-optional last argument can be interesting if it is not a list, however that can be worked around pretty easily by always having another optional list to handle 'just' the above options fine, so if a user defines a function that takes blah(a, opts \\ []) then the public interface would have blah(a, opts \\ [], gen_server_opts \\ []), which could be called as normal if none of the above options are needed as blah(42, a: :b, c: :d) or if one of the above options is wanted then can do blah(42, [a: :b, c: :d[, return: self()), which is a nice little escape hatch. At the default with no options needed then it would be called just as is-designed.

Error handling

Every call can fail, like if the gen_server is not running for example, and by default GenServer.call/2/3 raises, but that isn't always what the caller wants or needs, so they end up wrapping and so forth, and that still doesn't handle errors that happen inside the GenServer via cast, thus I propose that every public function has a postfix ! variant (so for the above a blah!(..) would also exist. The ! version acts as normal, it throws, the non-! version has an internal rescue/catch that just returns the error as a normal result tuple, essentially it generates this:

def blah(a, opts \\ [], gen_server_opts \\ []) do
  {:ok, blah!(a, opts, gen_server_opts)}
rescue
  e -> {:error, e} # Maybe add the stacktrace?
catch
  e -> {:error, e} # Maybe add the stacktrace?
end

Private Interface

State name?

Why is the state_name existing? Is it to allow arbitrary state positioning in the function call? Why not have an option to leave it out and just append it to the argument list as normal, or rather prepend it to the argument list since this is Elixir and Elixir likes to have function opts at the end with the main 'state' at the beginning of the function call?

Data handling

The one_way and two_way doesn't quite allow for the proper handling of the data flow. Right now one_way only returns the state and two_way can return a return value or a return value and state via an odd setup with the set_state with a block, when that seems needless.

There are a couple of issues here:

Instead what should be done is a value should always be returned, always, even if it's just a plain :ok just something should always be returned because in an expression language something is always returned.
It doesn't allow for setting any GenServer options just as hibernating or delaying a return past the existing call, etc...

Thus instead I propose a singular call (using whatever name you wish, I'd probably just use something like def! or def_something or whatever) that handles all cases. First its argument should be of the form def_something blah(state, to, rest, of, args, here) where rest, of, args, here is whatever arguments they want the function to take, however state should default to the front of the function definition (you could of course still override it by using the name specific in state_name: ... but it should default to the front if that name does not appear as an argument elsewhere in the argument list), and the to will be basically what the caller passed in to return(/return_sync), maybe massaged a bit, however it should be considered 'opaque' to the user of the Component library and only used to pass into a call later.

The usages of the function should be that it has to return something, always, and thus the last call of a function should be something like return(state, to, value \\ :"$no_return$", opts \\ []), where the new state is given first (easy piping), the to is second, and an optional value is third that defaults to :"$no_return$", which means that the :noreply genserver return tuple is used, which means that to needs to be handled some other way, like if they already return_to(to, value)'d or stored the to somewhere for later sending or so. What this return/4 function does is figure out 'how' the value should be returned (if at all) and construct the appropriate return tuple for the genserver. opts allows setting things like if it should hibernate: true or stop: true or continue: stuff is set.

Essentially if to matches this then it does this when return/4:

_ when :"$no_reply$" -> does a {:no_reply, ...} or so
false when not :"$no_reply$" -> does a `{:no_reply, ...} or so
{true, <pid/name/callback>} when not :"$no_reply$" -> return the value
{true, <id>, <pid/name/callback>} when not :"$no_reply$" -> return the value wrapped like {<id>, value}

And return_to/2 does these:

false -> no-op
{true, <pid/name>} -> calls GenServer.reply(<pid/name>, value)
{true, <id>, <pid/name>} -> calls GenServer.reply(<pid/name>, {<id>, value})
{true, <callback>} -> calls <callback>.(value)
{true, <id>, <callback>} -> calls <callback>.({<id>, value})
{true, {module, function, args}} -> calls apply(module, function, [value | args])
{true, <id>, {module, function, args}} -> calls apply(module, function, [{<id>, value} | args])

Summary

In essence this would keep the actual user interface very simple while still giving the full power of the cast/call system of gen_server while letting the caller of the gen_server dictate how, where, and if they want the result handled.

Such a usage of the above proposal would look like this (altered from the readme, using def_handle here for the function name, or whatever is come up with, I'm not happy with any names I'm thinking of):

defmodule FourOhFour do
  use Component.Strategy.Normal,
      state_name:    :history,
      initial_state: %{}

  def_handle record_404(history, to, user, url) do
    Map.update(history, user, [ url ], &[ url | &1 ])
    |> return(to)
  end

  def_handle for_user(history, to, user) do
    return(history, to, Map.get(history, user, []))
  end
end

And used like:

# Use global version if the config is set:
FourOhFour.record_404("bob", "https://google.com/404")
FourOhFour.for_user("bob")

# Make a little local one here
pid = FourOhFour.create()
FourOhFour.record_404("bob", "https://google.com/404", pid: pid)
FourOhFour.for_user("bob", pid: pid)

Some random other notes and thoughts.

How to set a minimum and maximum of the pool strategy?
Hungry should probably also accept a reducer callback function (anon or MFA) as well.
Hungry default_concurrency: ... should probably also accept a function so a concurrency can be set at runtime (such as if they want to specify, oh, half the scheduler count by default).
Because you declare the name to be used as the state variable, you can omit it as a parameter to one_way and two_way and the component library will add it in for you: Yeah this is bad... ^.^;
Why would I even countenance such an evil use of the dark arts? It's because I wanted to be able to write the one- and two-way functions to reflect the way they are called and not the way they're implemented. Good reason, but I'd prefer a definition API more like this then instead:
```
defmodule FourOhFour do
  use Component.Strategy.Normal,
      initial_state: %{}

  def_handle history, to, record_404(user, url) do
    Map.update(history, user, [ url ], &[ url | &1 ])
    |> return(to)
  end

  def_handle history, to, for_user(user) do
    return(history, to, Map.get(history, user, []))
  end
end
```
As I think this is far more clear, explicit, and the function arguments are still as they are called by the user

You can override this initial state when you create a component by passing a value to create(). This sounds really questionable, the state of a gen_server is an internal implementation and never should be exposed or given by the outside. The override is good but honestly I'd still prefer it to be a function style, like:

defmodule FourOhFour do
  use Component.Strategy.Normal

  def init(), do: %{} # Default state is `%{}`, maybe it could take a `nil` arg for `init(nil)`
  def init(m = %{}), do: m # But the user can override it if it is a map, probably you want to sanitize it here too for the right format

  def_handle history, to, record_404(user, url) do
    Map.update(history, user, [ url ], &[ url | &1 ])
    |> return(to)
  end

  def_handle history, to, for_user(user) do
    return(history, to, Map.get(history, user, []))
  end
end

Instead of using șțąțɇ as the state argument, should just replace the name inline, or use a binding in a different scope as then state in that scope wouldn't conflict with state in the user scope, so there is just no worry at all then.
If not run as a global config name the initialize and create methods need *_link versions for safety with putting into an existing OTP hierarchy.
All the extra module definitions and helper module if needed and so forth adds a lot of indirection and prevent the OTP from optimizing out in-module calls. What should probably be done is the entire gen_server handling and all functions should go in the main named module but the 'internal' calls thereof just have @doc false applied to them so they don't appear in autocompletion or documention, only the public interface itself would. So yes internally it's still the monolithic module but only the public interface is properly viewable and all calls work as expected and as efficiently as expected. You really don't want to add in remote gen_server implementation calls if possible as it could cause premature death of the gen_server in code_change states.
Hungry should probably also have an option to state the minimum number of record passed to a worker before a worker is used since if processing a single record is super fast you may want to set it to 10000 or so but if a single record takes 3 seconds (like an HTTP query) then you'd want to start with 1 per worker, this really only matters with low amounts (situation dependent low amount) anyway but could be quite a performance boon in those cases. Maybe even an option to allow a low amount that doesn't warrant the message transfer time to a worker to just run in-process of the caller themselves (default disable that option just in case the processing is 'dirty', so the user knows what they are getting into with that, like proper error handling).
Didn't see where it says what Hungry does in the face errors in the workers?

Also need a way to define the return information to gen_server from the init callback, like what if you want to hibernate immediately, or ignore, or stop, or continue, or etc... Maybe init should be a normal callback that calls return/4 or so?

@pragdave I saw that get added, it didn't exist when I originally started writing up my comment. :-)

I'm not a fan of the whole callback body style however, it seems a bit shoehorned in. I'd probably opt for keeping them base level, something like (def_interface now, I don't know what name is good for that...):

defmodule FourOhFour do
  use Component.Strategy.Normal

  def init(), do: %{} # Default state is `%{}`, maybe it could take a `nil` arg for `init(nil)`
  def init(m = %{}), do: m # But the user can override it if it is a map, probably you want to sanitize it here too for the right format

  def_interface history, to, record_404(user, url) do
    Map.update(history, user, [ url ], &[ url | &1 ])
    |> return(to)
  end

  def_interface history, to, for_user(user) do
    return(history, to, Map.get(history, user, []))
  end

  def handle_info(...), do: ...
end

Keep them top level, even allow handle_call/handle_cast for handling direct GenServer messages outside of the public interface (combine them appropriately so the def_interface version get matched on first of course).

They aren't base level for dynamic components, where the server is a separate module from the api.

where the server is a separate module from the api.

I touched on that above as well, the separation is a bit painful in a couple of ways but it can all be put into one module if the non-public 'public' functions are marked as @doc false so they don't auto-complete and so forth.

pragdave / component

API design #7