tiian / flom

FLoM: Easy to Use Distributed Lock Manager; a tool to synchronize the execution of commands, shell scripts and custom programs.
http://www.tiian.org/flom/
GNU General Public License v2.0

Simple lock state output from the daemon #29

Closed: damoclark closed this issue 8 months ago

damoclark commented 1 year ago

Hey Christian,

Is it possible to view a dashboard of the current lock state in the flom daemon?

For instance: which resources exist, which clients are in the queue for those resources, and their requested lock (and resource quantity). Being able to check, and even monitor, the lock state would be really helpful in identifying deadlock situations, or in identifying where a resource is tied up by a client.

Perhaps it could be implemented in the protocol as a request/response type event that sends JSON-encoded state information. Or, for security, it could be limited to Unix Domain Socket connections, so it can only be received on the same host as the daemon.

Or, even more simply, the daemon could respond to a SIGUSR1 signal by dumping a 'state' file, encoded as JSON, to a configured path.
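
A minimal Python sketch of the SIGUSR1 idea follows. It assumes a hypothetical state dictionary and dump path. The handler only records the request. The dump itself happens later in ordinary code: it is written to a temporary file and then renamed, so a reader never sees a half-written JSON document. This only illustrates the pattern. FLoM itself is a multithreaded C daemon, and signal handling there is considerably harder.

```python
import json
import os
import signal
import tempfile

# Set by the signal handler, consumed by the (hypothetical) daemon main loop.
_dump_requested = False


def on_sigusr1(signum, frame):
    """Signal handler: do nothing except record that a dump was requested."""
    global _dump_requested
    _dump_requested = True


def install_handler():
    """Register the handler; Python requires this to run in the main thread."""
    signal.signal(signal.SIGUSR1, on_sigusr1)


def dump_state_atomically(state, path):
    """Write `state` as JSON to a temp file next to `path`, then rename it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(state, tmp)
        os.replace(tmp_path, path)  # rename is atomic on POSIX filesystems
    except BaseException:
        os.unlink(tmp_path)
        raise


def dump_if_requested(state, path):
    """Call periodically from the main loop; return True if a dump was written."""
    global _dump_requested
    if not _dump_requested:
        return False
    _dump_requested = False  # reset first, so a signal arriving now is not lost
    dump_state_atomically(state, path)
    return True
```

After `install_handler()`, running `kill -USR1 <pid>` would produce the file on the next pass of the loop.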

What do you think?

tiian commented 1 year ago

Hi Damo, the request for "observability" is interesting because it implies FLoM is used for "things that matter", so I'm in favour of enhancing this aspect.

Before entering the technical aspects, it would be helpful for me to understand a bit of context.

Question 1: can you describe the scenario / use cases you are deploying FLoM in? Number of requests per unit of time, number of clients, number of resources, type of resources, etc... Any info will be useful to better understand your context.

Question 2: can you describe the observability scenario you would like to implement? Could be you want to grab metrics (Prometheus / Victoria Metrics, etc...), could be you want to parse well structured log files (Splunk / Elastic, etc...), could be you have a completely different idea. If you can share a diagram of your observability architecture, that would be great.

Starting from a real scenario, I can figure out how to make it useful for you and general enough for others. I can't commit to a timeline, but this could be something useful to add to FLoM.

Best Ch.

damoclark commented 1 year ago

Hey Tiian,

> Before entering the technical aspects, it would be helpful for me to understand a bit of context.

> Question 1: can you describe the scenario / use cases you are deploying FLoM in? Number of requests per unit of time, number of clients, number of resources, type of resources, etc... Any info will be useful to better understand your context.

I am using flom to provide distributed locks that co-ordinate shared access to a Restic backup repository. Restic is a client-based backup utility designed to support cloud service APIs. This means that it does not rely upon a server-side counterpart for operation, only on basic file or object-backed storage. The implication is that the locking semantics Restic uses to co-ordinate CR, CW and EX access to the repository do not support blocking/queuing.

Backups in my environment are scheduled on individual hosts using cron, all at the same time. I use flom to limit how many backup tasks can run locally on each individual host (as Restic is very memory hungry), and how many Restic clients can concurrently write to a repository, to prevent too much IO contention. Operations on the repository that require exclusive access are also co-ordinated through flom: each one acquires its lock and then releases it, allowing each host to perform its operations on the shared repository in turn, without polling.

I have 20 clients, ranging from physical hosts to virtual hosts and small IoT devices (Raspberry Pis). There are about a dozen resources, a mixture of numeric and hierarchical. There are about 60 requests per hour, but the locks are long running. From the start, all resources are freed after 4 or 5 hours. Or at least they should be. This is one of the things I'm trying to diagnose.

Here are some example commands:

server1 crontab

flom -r 'server1[1]' -- flom -r 'linux[5]' -- flom -r /restic/repo/linux -l CW -- resticprofile rootfs.backup

Perform a restic backup of rootfs with a concurrent write lock on '/restic/repo/linux', allow only five concurrent restic backups to write to the linux repository, and allow only one restic command to execute concurrently on server1.

backup-server crontab

flom -r 'backup-server[2]' -- flom -r 'linux[5]' -- flom -r /restic/repo/linux -l CW -- resticprofile boot.backup
flom -r 'backup-server[2]' -- flom -r 'linux[5]' -- flom -r /restic/repo/linux -l CW -- resticprofile rootfs.backup
flom -r 'backup-server[2]' -q 2 -- flom -r 'linux[5]' -q 5 -- flom -r /restic/repo/linux -l EX -- resticprofile linux-jobs.check

Perform restic backups of rootfs and boot using a concurrent write lock on the '/restic/repo/linux' repository, allow only five concurrent restic backups to write to the shared linux repository at a time, and allow only two restic commands to execute concurrently on backup-server.

Run linux repository checks with an exclusive lock on '/restic/repo/linux', with exclusive access to the shared linux repository ('linux[5]' with '-q 5'), and as the only restic command running on host backup-server ('backup-server[2]' with '-q 2').
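
This model makes the numeric-resource semantics concrete. A resource like `linux[5]` behaves like a counting semaphore with 5 units, and `-q 5` asks for all of them at once, which is why it yields exclusive access. The code below is a simplified Python model of that idea. The class name `NumericResource` is hypothetical, and this is not FLoM's implementation.

```python
import threading


class NumericResource:
    """Toy model of a numeric resource such as 'linux[5]' (hypothetical, not FLoM code)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.available = capacity
        self._cond = threading.Condition()

    def acquire(self, quantity=1, timeout=None):
        """Block until `quantity` units are free; return False if `timeout` expires."""
        if not 1 <= quantity <= self.capacity:
            raise ValueError("quantity must be between 1 and the capacity")
        with self._cond:
            ok = self._cond.wait_for(lambda: self.available >= quantity, timeout)
            if ok:
                self.available -= quantity
            return ok

    def release(self, quantity=1):
        """Return `quantity` units and wake every waiter to re-check."""
        with self._cond:
            self.available += quantity
            self._cond.notify_all()
```

With `linux = NumericResource(5)`, a backup calls `acquire(1)`. The check calls `acquire(5)` and therefore waits until no backup holds a unit. Unlike a lock manager that serves requests in arrival order, this toy model lets later one-unit requests overtake a waiting five-unit request, so the check could starve.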

> Question 2: can you describe the observability scenario you would like to implement? Could be you want to grab metrics (Prometheus / Victoria Metrics, etc...), could be you want to parse well structured log files (Splunk / Elastic, etc...), could be you have a completely different idea. If you can share a diagram of your observability architecture, that would be great.

A completely different idea, although metrics and structured logs would be useful.

Because some long running tasks hold locks for hours, it would be useful to be able to view a live dashboard of the internal state of the server, so I can tell what is happening. That is:

  1. what resources exist in server memory, and for each resource a queue showing:
    1. clients currently holding a lock on the resource:
      1. when they acquired the lock (how long they have held it)
      2. the lock type
    2. followed by the remaining clients blocked in the queue waiting to acquire a lock:
      1. when they connected and started waiting for the lock (how long they have been waiting)
      2. the lock type requested

In terms of visualisation, I am thinking of a table with one column per resource: the first rows show the clients currently holding locks, and the remaining rows show the clients waiting to acquire a lock (in order of service).
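
The table layout described above can be sketched as a small text renderer. The input shape (`holders`/`waiters` lists of dicts with `client`, `lock` and `since`) is hypothetical and is not a format FLoM produces. "H" marks a holder and "W" a waiter, and the age shows how long each has held or waited.

```python
def render_lock_table(state, now):
    """Render one column per resource: holders first, then waiters in service order.

    `state` maps a resource name to {"holders": [...], "waiters": [...]}; each entry
    is a dict with "client", "lock" and "since" (epoch seconds). This shape is a
    hypothetical example, not a format FLoM emits.
    """
    names = sorted(state)
    columns = []
    for name in names:
        cells = []
        for tag, key in (("H", "holders"), ("W", "waiters")):
            for e in state[name][key]:
                age = int(now - e["since"])  # held-for or waiting-for, in seconds
                cells.append(f"{tag} {e['client']} {e['lock']} {age}s")
        columns.append(cells)
    widths = [max([len(n)] + [len(c) for c in col]) for n, col in zip(names, columns)]
    depth = max((len(col) for col in columns), default=0)
    lines = [" | ".join(n.ljust(w) for n, w in zip(names, widths)).rstrip()]
    lines.append("-+-".join("-" * w for w in widths))
    for row in range(depth):
        cells = [(col[row] if row < len(col) else "").ljust(w)
                 for col, w in zip(columns, widths)]
        lines.append(" | ".join(cells).rstrip())
    return "\n".join(lines)
```

Calling it once per second from a loop would give a crude live dashboard.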

Hope this makes sense. Ask away if anything is unclear or missing.

Cheers, D.

tiian commented 1 year ago

Hi Damo, after some thought, one possible straightforward implementation could be as described below:

The above idea allows for several integrations, for example:

Let me know if it might fit your needs. Best Ch.

damoclark commented 1 year ago

Hi Christian,

Yes, this sounds good. A couple of probing questions though.

  1. Would it be possible to turn this logging on and off without requiring the flom daemon to restart?
    I can't recall if the flom daemon can re-read its config and maintain its state in response to a signal (e.g. SIGHUP or SIGUSR1) like many daemons do. But if there is a problem to diagnose, it won't help if the daemon must be terminated (and all locks lost) to enable the state logging.
  2. Can the JSON state information be stored somewhere configurable and separate from existing logging facilities like syslog and the tracefile? I recall that the tracefile is used for other debugging purposes, although I thought it was more for debugging flom itself, rather than its operation. For it to be useful, it would be important to be able to parse the output as standard JSONL, without any headers or other data mixed in.
  3. How hard would it be to complement the 'push' design model you propose with a 'pull' model as well? If signals cannot be used to change the config of a running flom daemon without losing its internal state, could the client API include requests that pull state information in JSONL from the server? Perhaps the API could request a stream of state at a given update frequency (e.g. 1 update per second), sent to the client until it disconnects or turns the output off.

Hope this helps.

D.

tiian commented 1 year ago

Hi Damo, there are some consequences in implementing "natural" behaviors.

About the first question:

> Would it be possible to turn this logging on and off without requiring the flom daemon to restart?

No, currently not, mainly because several config parameters cannot be changed without a restart. Anyway, at such a logging rate, why bother turning it on and off? You should just turn it on: I don't expect TBs of data, just one JSON document every X seconds.

About the second question:

> I recall that the tracefile is used for other debugging purposes, although I thought it was more for debugging flom itself, rather than its operation.

Not necessarily: the trace capability is quite flexible. I can dedicate a specific trace code to monitoring, so you could obtain exactly the internal messages you want. On top of that, the header would be unique and easy to catch with any grep-like command-line utility.

> For it to be useful, it would be important to be able to parse the output as standard JSONL, without any headers or other data mixed in

No other data mixed in: just a header that can easily be removed with a simple sed command, leaving the pure JSON documents, one every X seconds. I can provide documentation for the sed command, no worries.
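
The header-stripping step could be done as shown below. The exact trace header had not been defined at this point in the thread, so the `FLOM_STATUS` tag is purely hypothetical. The sketch assumes each monitoring line carries that tag followed by one JSON document.

```python
import json

# Hypothetical tag marking monitoring lines in the trace file; the real header
# format had not been defined at this point in the discussion.
MARKER = "FLOM_STATUS"


def extract_json_documents(trace_lines, marker=MARKER):
    """Yield the JSON document that follows `marker` on each monitoring line."""
    for line in trace_lines:
        _, found, payload = line.partition(marker)
        if found:  # only lines carrying the monitoring header
            yield json.loads(payload)  # json.loads ignores surrounding whitespace
```

Under the same assumption about the header, a shell equivalent would be `sed -n 's/^.*FLOM_STATUS //p' tracefile`.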

The third question opens deeper architectural topics:

> How hard would it be to complement the 'push' design model you propose with a 'pull' model as well?

Extending the FLoM client/server protocol to collect performance information is technically feasible and somewhat aligned with the architecture, but there's a drawback. FLoM can be configured either without security or with TLS mutual authentication. TLS mutual authentication (I don't know if you use it) requires several additional steps and the non-negligible burden of certificate management. In any case, FLoM does not provide any authorization capability: if you can access the daemon, with or without TLS mutual authentication, you can do "everything". Today "everything" means "ask for a lock". A malicious user can flood a FLoM daemon with a DDoS attack, but cannot extract information from it. (There's a system-level way, of course: directly accessing the memory mapping of the process, but that requires root privileges on the system.)

If the flom client could retrieve information from the flom daemon, it could be used as a source of information about running processes in the network. The more interesting the monitoring information provided by such a client, the more appealing it becomes for retrieving information about your distributed processes and network topology. Not nice, from a security perspective.

The complicated way to solve the issue would be to introduce an authorization capability that allows monitoring only when some conditions are met: you can imagine plenty of sloppy password-based implementations, with passwords no one wants to store anywhere. Requiring TLS mutual authentication for authorization would be a solid technical foundation, but forcing the use of TLS would be overwhelming. There are some easy-to-implement / easy-to-use authorization mechanisms, but in the end they are all quite ridiculous. The easiest way could be to add a configuration parameter, off by default, to enable client-side monitoring: if you turn it on, you are aware that you are exposing your internal flom daemon info to whoever is able to connect to it.
This last option is feasible, but I'm not happy about it: sooner or later someone will complain and ask for something more secure: "I want to turn it on, but only for a client that ....."

There's even another option to pull the info:

  1. adding to the flom client the capability to ask for a monitoring information dump (dump the JSON document with the internal state)
  2. adding to the flom daemon the capability to dump that information locally

The pattern has something in common with the signal-based way of doing it, but with some pros/cons:

A note on signals: signals and POSIX threads are not good friends, and some concurrency issues require complicated pieces of code. The FLoM architecture is based on POSIX threads and is deliberately "light and straightforward" to provide the lowest possible latency in lock management. To be frank on this point: introducing signals would introduce bugs that are difficult to stabilize.

To sum up: I don't see a real issue related to questions 1 and 2. For a first version, we could try the trace file, which is already designed for providing internal information; if real issues emerged, adding an extra dedicated log would not be a big deal.

Question 3 opens a definitely interesting topic, but the potential security issue makes me skeptical that the use case will be self-consistent and will not open a chain of "minor" topics. I'm not sure the investment will pay off.

Feel free to add your point, I'm appreciating the discussion.

damoclark commented 1 year ago

About the first question:

> Would it be possible to turn this logging on and off without requiring the flom daemon to restart?

> No, currently not, mainly because several config parameters cannot be changed without a restart.

I understand the complexity of having to merge existing configuration and running state with new configuration options. Apache, for example, does this, and does it well, for similar reasons to why I think flom needs to support this functionality too: configuration parameters that cannot be changed without a restart generate an error, and the reload is refused. But I am guessing it would add a substantial amount of work to analyse which parameters this applies to and to implement the logic. A big undertaking, I'm sure. So it's easy for me to suggest when I'm not doing the work. :)

> Anyway, at such a logging rate, why bother turning it on and off? You should just turn it on: I don't expect TBs of data, just one JSON document every X seconds.

I tend to be wary of making such broad judgements, as it can be difficult to know how people might be using your tools, and on what hardware. I use IoT devices. Some have only MB to spare. The use case I was highlighting was simply being able to debug/troubleshoot the running state of the locks when it appears things aren't progressing as you expect: either to detect which distributed task is the blocker, or which tasks are contributing to a deadlock. But it might also be useful to see progress or current activity on distributed tasks. At present, it's not possible to see inside the flom black box.

About the second question:

> I recall that the tracefile is used for other debugging purposes, although I thought it was more for debugging flom itself, rather than its operation.

> Not necessarily: the trace capability is quite flexible. I can dedicate a specific trace code to monitoring, so you could obtain exactly the internal messages you want. On top of that, the header would be unique and easy to catch with any grep-like command-line utility.

So when you say the header, is it on the same line as the log information it relates to (i.e. like syslog), or on the line above?

> For it to be useful, it would be important to be able to parse the output as standard JSONL, without any headers or other data mixed in

> No other data mixed in: just a header that can easily be removed with a simple sed command, leaving the pure JSON documents, one every X seconds. I can provide documentation for the sed command, no worries.

I'm quite familiar with sed. The reason I raised this is just to simplify access without having to filter and translate the logging output to extract the JSONL, especially when that is the sole purpose of that logging. Given it's specifically machine-readable output, it makes sense to me to log it separately from the human-readable output in the tracefile. I understand why you are seeking to reuse the logging framework you already have, but I don't think their purposes sensibly align. But I concede my opinion matters little when I'm not the one having to do the work. :)

> Extending the FLoM client/server protocol to collect performance information is technically feasible and somewhat aligned with the architecture, but there's a drawback. FLoM can be configured either without security or with TLS mutual authentication.

I understand the security issue here. I am using TLS mutual authentication. I also understand the issue with unix signals and posix threads. Concurrency in C is a bugger.

To address the security issue with client access, I was thinking of limiting pull requests for state information to the Unix domain socket on the host the server is running on. With appropriate socket file permissions, this means that only the effective user running the flom daemon can request the state information on that machine. The security is then similar to dumping state information in response to signals: only the effective user of the flom daemon has permission to send signals to it. But being part of the protocol means you don't have to deal with the signal concurrency issues. It might be a separate thread, and could even have its own dedicated UDS. In the future, this could be expanded into a means of implementing more administrative control over the running daemon, even a separate admin client program that sends commands to a running flom daemon and receives its responses. This is a very common pattern for daemons.
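
The owner-only Unix domain socket idea can be sketched as below. The function names are hypothetical, and `query_state` merely plays the role of the hypothetical `flom --query` mentioned later in the thread. On Linux, connecting to a Unix domain socket requires write permission on the socket file, so mode 0600 limits queries to the daemon's user. Some other systems ignore socket file permissions, so placing the socket in a directory only the owner can enter is the more portable safeguard.

```python
import json
import os
import socket
import stat


def open_admin_socket(path):
    """Bind a Unix domain socket that only the owning user can connect to."""
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(path)
    # Restrict the socket file before listen(): until listen() is called,
    # connection attempts are refused anyway, so there is no open window.
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)  # 0600
    srv.listen(1)
    return srv


def answer_one_query(srv, state):
    """Accept a single client and send it the state as one JSON line."""
    conn, _ = srv.accept()
    with conn:
        conn.sendall(json.dumps(state).encode() + b"\n")


def query_state(path):
    """Client side: read until the server closes the connection, then decode."""
    chunks = []
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as cli:
        cli.connect(path)
        while True:
            data = cli.recv(4096)
            if not data:
                break
            chunks.append(data)
    return json.loads(b"".join(chunks))
```

In a daemon, `answer_one_query` would run in a loop on its own thread, so lock management never blocks on a slow monitoring client.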

From my perspective, the lack of a way to reconfigure the flom daemon without losing its state (i.e. a server re-read) is quite a limitation in what is otherwise a very feature-rich technology. If you need to change something, even trivially, all your currently running jobs are lost.

Anyway, just further input and suggestions for you to consider.

D.

tiian commented 1 year ago

Hi Damo, I cannot promise anything, but after some research, an elegant way to retrieve the data from the daemon could be to implement a virtual filesystem (like /proc) using FUSE. This would allow retrieving the data from the flom daemon, guarantee (local) security, and avoid strange architecture patterns. The file system would be var/run/something, to be mounted in /usr/local or somewhere else. Initially the filesystem would contain a single file with the dump of the lock state in JSON format. Let me know your opinion about it.

Best Ch.

damoclark commented 1 year ago

Hey Christian,

That's a novel approach. JSON format sounds simple and sensible. And being FUSE-based, it shouldn't require root permissions. What happens when the flom daemon terminates? Does it automatically unmount the virtual filesystem? Building on the approach, are you thinking that config could be changed by writing into other virtual files in the file system, like /proc?

tiian commented 1 year ago

Hi Damo, I'm not an expert in FUSE and have never developed with it before, but as far as I understand:

That said, I will have to face at least a couple of technical issues:

  1. if I'm not wrong, I'll have to support a couple of versions of libfuse due to the wide range of Linux versions supported by FLoM
  2. using libfuse in a multithreaded process rules out the vanilla approach, so the low-level API must be used instead

Supposing I successfully overcome the above obstacles, and no special surprises appear, the approach also opens the door to read/write config properties, but initially I'll focus on providing an external, file-based representation of the internal state.

Depending on my limited spare time, I hope to start coding something in September: I'll keep you posted on git commits that are ready for some testing on your side.

Best Ch.

damoclark commented 1 year ago

Hey Ch,

Sounds good, and an interesting mini-project.

Happy to test from my end.

D.

tiian commented 1 year ago

This commit 32982aa34865b5ca14eb6730c82585979dcb7245 contains a technical example of incorporating FUSE in FLoM by means of the libfuse low-level APIs. There are no apparent conflicts between the two, even if the devil is in the details and I suspect some issue will appear sooner or later: libfuse seems to be mainly designed for creating a process that manages a filesystem, not for adding a filesystem to an arbitrary process. Anyway, I'll try to go on and obtain a prototype with something useful.

damoclark commented 1 year ago

I had a quick peek at the code. Haven't written C in many years.

It will be interesting to see how you go with a prototype and whether you only need to implement the bits of the fuse api that are needed to provide the state information. Let me know when you are ready and I can give your prototype a go.

D.

tiian commented 1 year ago

The first serious showstopper appeared. To keep the architecture as simple as possible, I tried to figure out a way for the real data, managed by different threads, to be accessed by the thread dedicated to serving the FUSE virtual file system. Initially it seemed quite straightforward, and a few semaphores appeared to be enough to guarantee that the reader (the VFS thread) reads stable data. Reading unstable data in C is strongly discouraged whenever the data is dynamic or involves pointers: if a thread reads strings (char *) that another thread is modifying, the chance of a memory exception (program crash) becomes very real. With dynamic arrays and similar structures, the situation gets even worse. Unfortunately, looking more closely at where the information is stored and how the code is designed, it turns out there are 2 equally bad alternatives:

The above is just a description of what I analyzed and, if I evaluated it correctly, it would prove to be an unpleasant journey.

The alternative, which I didn't like from the beginning, is to duplicate the information internally, creating a dedicated data structure that the VFS thread can read safely. Duplicating data means introducing data inconsistency: I expect that here and there some data will not be exactly as it should be (once again, bugs to fix), but I see some advantages: no impact on software stability, in terms of either undesired crashes or undesired hangs due to deadlocks and equivalent synchronization issues.
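
A minimal Python sketch of the "dedicated duplicate data structure" idea follows, using a snapshot-publishing pattern. This is an illustrative swap-in, not FLoM's code, and `SnapshotPublisher` is a hypothetical name. Writers mutate private state under a lock and then publish a fresh deep copy. The VFS reader only ever receives a completed copy, so it can never observe a half-updated structure.

```python
import copy
import threading


class SnapshotPublisher:
    """Writers mutate private state; readers only ever get published copies."""

    def __init__(self):
        self._lock = threading.Lock()
        self._state = {"holders": {}, "waitings": {}}
        self._snapshot = copy.deepcopy(self._state)

    def update(self, mutate):
        """Apply `mutate(state)` under the lock, then publish a fresh deep copy."""
        with self._lock:
            mutate(self._state)
            self._snapshot = copy.deepcopy(self._state)

    def snapshot(self):
        """Return the latest published copy; callers must treat it as read-only."""
        with self._lock:
            return self._snapshot
```

The cost is one copy per update. In a real daemon, keeping the duplicate in step with the primary structures is exactly where the inconsistencies mentioned above could creep in.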

Lesson learned: observability should be a day-1 requirement in software design. I'll keep you posted on the status.

damoclark commented 1 year ago

Hey @tiian

I understand the quandary.

Have you considered using message passing rather than locks for synchronising state data access? Dedicate a thread to maintaining the duplicate state information, and have other threads send their updates through a socket that is shared internally between the threads. The thread maintaining the state information can select on VFS and the message passing sockets and is made responsible for maintaining consistency and synchronising read/write accesses. The buffering on the socket could substantially reduce locking latency as well.
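
The message-passing design in this comment can be sketched in Python. In this sketch, `queue.Queue` stands in for the internal socket, and the message names (`wait`, `grant`, `release`, `query`, `stop`) are hypothetical. A single thread owns the state mirror, and every other thread, including the VFS or UDS reader, talks to it only through messages.

```python
import queue
import threading


class StateKeeper(threading.Thread):
    """The only thread that touches the lock-state mirror; others post messages.

    queue.Queue stands in for the internal socket; message names are hypothetical.
    """

    def __init__(self):
        super().__init__(daemon=True)
        self.inbox = queue.Queue()
        self._state = {}  # resource -> list of (client, "waiting" | "holding")

    def run(self):
        while True:
            kind, *args = self.inbox.get()
            if kind == "stop":
                return
            if kind == "query":
                (reply,) = args
                # Entries are immutable tuples, so copying the lists is enough.
                reply.put({res: list(entries) for res, entries in self._state.items()})
                continue
            resource, client = args
            entries = self._state.setdefault(resource, [])
            if kind == "wait":
                entries.append((client, "waiting"))
            elif kind == "grant":
                entries[:] = [(c, "holding" if c == client else r) for c, r in entries]
            elif kind == "release":
                entries[:] = [(c, r) for c, r in entries if c != client]
```

Because the keeper handles messages strictly in arrival order, a `query` reflects every update posted before it. As the comment notes, the mirror can still drift from the real lock tables if an update message is ever missed.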

If you did it this way, you could also use a UDS rather than VFS for querying the state, with a command line switch. e.g. flom --query connects to the separate daemon UDS to output json encoded internal state. This might simplify things over VFS.

Although the approach still doesn't prevent the duplicate data from getting out of sync.

Something a little more radical would be to port Flom to a more modern programming language that makes concurrent programming less error-prone. I think it's clear that C is going to decline in usage, so this would offer more longevity to Flom as well. The two primary candidates I would suggest are Rust and Go. Neither would be onerous to port from C. And if you re-engineer Flom to have a documented line protocol for its socket interface (if it doesn't already), then clients can be created natively for any language, which avoids the sometimes tricky task of binding some languages to the Flom C libraries. You could even use protobuf, which supports many languages.

Not at all a trivial task. But if it scratches a new itch for you, something to consider.

D.

tiian commented 1 year ago

In commit 444664e66f1bd962054a71c43dfeb587c542b24e there is a first demo version. It requires FUSE version 2 to build (old systems); I'll adapt it to FUSE version 3 afterward. It's not something that can be used in production! Only the absence of memory leaks has been checked. The man page is updated:

   -m, --mount-point-vfs=DIRNAME
       DIRNAME of an existing directory that must be used as the mount
       point for a Virtual File System (VFS) based on FUSE (Filesystem in
       USErspace); the VFS is used to provide information about the
       internal state of the FLoM daemon like for example the active
       lockers (it works like /proc and /sys VFS). When the FLoM daemon
       exits, the VFS is automatically unmounted, but in case it's not,
       you can unmount it manually with "fusermount -u DIRNAME" or with
       "sudo umount -l DIRNAME"

It does not create a single JSON file, but a full UNIX-ish "everything is a file" structure:

tiian@ubuntu1404-64:/tmp$ find flom-vfs/
flom-vfs/
flom-vfs/status
flom-vfs/status/lockers
flom-vfs/status/lockers/1
flom-vfs/status/lockers/1/resource_name
flom-vfs/status/lockers/1/resource_type
flom-vfs/status/lockers/2
flom-vfs/status/lockers/2/resource_name
flom-vfs/status/lockers/2/resource_type
tiian@ubuntu1404-64:/tmp$ ls -la flom-vfs/status/lockers/
total 0
dr-xr-x--- 2 tiian tiian 0 ott 22 17:51 .
dr-xr-x--- 2 tiian tiian 0 ott 22 17:51 ..
dr-xr-x--- 2 tiian tiian 0 ott 22 17:51 1
dr-xr-x--- 2 tiian tiian 0 ott 22 17:51 2
tiian@ubuntu1404-64:/tmp$ cat flom-vfs/status/lockers/2/resource_name 
R2
tiian@ubuntu1404-64:/tmp$ cat flom-vfs/status/lockers/2/resource_type 
simple
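
Because the VFS follows the "everything is a file" layout shown above, a few lines of Python can turn it back into a single JSON-like document for scripts or dashboards. This is a generic directory walker (hypothetical helper, not part of FLoM). Directories become dicts, and each file becomes its stripped text content.

```python
import os


def read_vfs_tree(root):
    """Turn a directory tree of small text files into nested dicts.

    Directories become dicts; each file becomes its stripped text content.
    Works on any directory laid out like the flom-vfs listing above.
    """
    tree = {}
    for entry in sorted(os.listdir(root)):
        path = os.path.join(root, entry)
        if os.path.isdir(path):
            tree[entry] = read_vfs_tree(path)
        else:
            with open(path) as f:
                tree[entry] = f.read().strip()
    return tree
```

For example, `json.dumps(read_vfs_tree("/tmp/flom-vfs"), indent=2)` would print the whole tree as one JSON document.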

The information currently provided is far from useful, but this is intended as a demo version: step by step, more accurate info will be exposed in the filesystem. The filesystem is also intended to convey information through the "creation time" and "modification time" of its files: that behavior will come.

tiian commented 1 year ago

Porting to FUSE 3 was not that hard, at least so it seems. Commit 553958a2e7a7c5092ab992e5bd7c522b88090e19 contains a version that also builds with FUSE 3; if both FUSE 2 and FUSE 3 are available, configure chooses automatically, giving precedence to FUSE 3. No features added, it's still a demo version, but technically compatible with both FUSE 2 and FUSE 3.

damoclark commented 1 year ago

Hey @tiian

Will check this out on the weekend and give it a go. I'll let you know how it goes.

tiian commented 1 year ago

Commit a2c01f39250c2779d6ffeec61c7dab30ff20a724 implements the basics of the lifecycle: connections are moved from the "waitings" dir to the "holders" dir. The only info available right now is "peer_name", the IP address/port of the client. Timestamps of the directories in the Virtual File System are updated according to file creation/movement.
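
Since directory timestamps track creation and movement, the "how long held / how long waiting" columns from the original request could be derived from them. The layout below is an assumption: a directory containing `holders/` and `waitings/`, each holding one directory per connection with a `peer_name` file. The exact paths were not stated here, and `connection_ages` is a hypothetical helper.

```python
import os
import time


def connection_ages(resource_dir, now=None):
    """List (role, peer_name, seconds) for connections under holders/ and waitings/.

    Assumes each connection is a directory containing a 'peer_name' file, and uses
    the connection directory's modification time as the moment it entered its
    current state (an assumed layout, not a documented FLoM format).
    """
    now = time.time() if now is None else now
    rows = []
    for role in ("holders", "waitings"):
        role_dir = os.path.join(resource_dir, role)
        if not os.path.isdir(role_dir):
            continue
        for conn in sorted(os.listdir(role_dir)):
            conn_dir = os.path.join(role_dir, conn)
            with open(os.path.join(conn_dir, "peer_name")) as f:
                peer = f.read().strip()
            rows.append((role, peer, int(now - os.stat(conn_dir).st_mtime)))
    return rows
```

On a regular filesystem, renaming a directory does not necessarily update its mtime, so this relies on the VFS behaviour described in the commit note.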

tiian commented 10 months ago

Version 1.7.0, commit 3479d4b9415177b065008fb1884c8223f41f4b21, contains the basics of observability. Adding more information is possible, but "add everything" would be a very time-consuming task.

damoclark commented 8 months ago

Hi @tiian

Finally getting back to this now. But I need to deploy on 32bit arm systems, and have been unable to compile the 1.7.1 version as per #32.

Can I mix versions of flom between the clients and the server? In other words, can I deploy v1.6.1 on the 32bit arm hardware, and v1.7.1 on everything else? The flom server is running on x86_64, where I will be monitoring state information through the fuse interface.

D.

tiian commented 8 months ago

Here is the reply: https://github.com/tiian/flom/issues/32#issuecomment-1940931858

tiian commented 8 months ago

I would consider it closed with release 1.7.2