Does istatd flush data to disk when it's stopped?

rca commented 11 years ago

Hello,

It looks like istatd does not flush in-memory data before shutting down. It would be great if it did not lose this data.

I noticed this while tinkering with istatd and a small relay client that will persist metrics locally if the istatd server is not accessible. To make sure that it does the right thing when istatd is offline, I stop istatd with service istatd stop. Upon starting istatd back up I noticed there would be a gap of missing data, but it was data that istatd had already received.

The images below illustrate the issue with the following steps:

1. Stop istatd and wait about 10 seconds before starting it again.
2. Note the gap of missing data.
3. Let the system run for a few minutes.
4. Stop istatd and start it up again after a few seconds.
5. Once istatd is back up, notice that data that was in the previous graph is now gone.

[Screenshot, 2013-05-27 at 10:20:44 pm]

[Screenshot, 2013-05-27 at 10:23:25 pm]

I can semi-confirm the desired behavior by connecting to the admin port and running the flush command manually, then stopping the service. Ideally, upon shutdown, istatd would disconnect any clients to stop receiving data, then flush to disk, and finally quit.
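The shutdown order proposed above (stop receiving, flush, quit) can be sketched with a toy model. This is not istatd code: the `MetricsDaemon` class and all of its methods are hypothetical names made up for the illustration, and "disk" is just a Python list.

```python
class MetricsDaemon:
    """Toy stand-in for a metrics daemon (hypothetical, not istatd's code)."""

    def __init__(self):
        self.accepting = True
        self.in_memory = []  # samples received but not yet persisted
        self.on_disk = []    # samples that have been persisted
        self.log = []        # order in which the shutdown steps ran

    def receive(self, sample):
        # Once the daemon has stopped accepting, refuse the sample so the
        # sender knows it still has to keep (or retry) it.
        if not self.accepting:
            return False
        self.in_memory.append(sample)
        return True

    def flush(self):
        # Move everything held in memory to "disk".
        self.on_disk.extend(self.in_memory)
        self.in_memory.clear()

    def shutdown(self):
        # Step 1: stop receiving, so nothing new arrives during the flush.
        self.accepting = False
        self.log.append("stop-accepting")
        # Step 2: persist whatever was already accepted.
        self.flush()
        self.log.append("flush")
        # Step 3: quit (a real daemon would exit the process here).
        self.log.append("exit")


daemon = MetricsDaemon()
daemon.receive(("cpu.idle", 97))
daemon.shutdown()
accepted_late = daemon.receive(("cpu.idle", 95))
```

Because accepting stops before the flush, every sample the daemon acknowledged ends up persisted, and a sample sent afterwards is refused rather than silently dropped.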

Thanks!

jwatte commented 11 years ago

Thanks for your suggestion!

Flushing is an interesting question.

istatd will rely on the kernel to flush data that has already been committed to file. However, data that is still in RAM is still being collated -- we don't know whether all providers of data have gotten their data in or not. Flushing all the counters at that point would mean that we flush data that possibly not all providers have contributed to, so we might flush false (or at least inaccurate) data.
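A toy model of that concern, under loud assumptions: the `Collator` class, the 10-second bucket width, and the (sum, count) record are all invented for illustration and do not describe istatd's actual collation code or file format. It only shows why writing a bucket before every provider has reported yields a different record than waiting.

```python
from collections import defaultdict

BUCKET_SECONDS = 10  # assumed bucket width, for illustration only


class Collator:
    """Hypothetical in-memory collation of samples into time buckets."""

    def __init__(self):
        self.buckets = defaultdict(list)

    def add(self, timestamp, value):
        # Round the timestamp down to the start of its bucket.
        start = timestamp - timestamp % BUCKET_SECONDS
        self.buckets[start].append(value)

    def flush(self):
        # Write out every bucket as (sum, count), complete or not.
        written = {s: (sum(v), len(v)) for s, v in self.buckets.items()}
        self.buckets.clear()
        return written


# Case 1: both providers report before the flush.
complete = Collator()
complete.add(100, 5)  # provider A
complete.add(103, 7)  # provider B
full_record = complete.flush()

# Case 2: the flush happens after A but before B.
partial = Collator()
partial.add(100, 5)   # provider A
early_record = partial.flush()
partial.add(103, 7)   # provider B arrives too late for that record
late_record = partial.flush()
```

In the second case the record written for bucket 100 reflects one provider instead of two, which is the kind of inaccuracy an early flush can introduce.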

When we couple this with the significant additional time needed to flush all pending data in a real production environment (500,000 counter files), we choose by default not to do anything extra when stopping the daemon. Then again, we don't really stop the daemon in production.

As you suggest, if you want this flushing behavior, you can emit the "flush" command on the admin port, which will flush the entire store of collected data. Perhaps you can modify the stop scripts to have this behavior?
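A minimal sketch of such a stop wrapper, with several assumptions stated up front: only the "flush" command and `service istatd stop` come from this thread. The function names, the `\r\n` line terminator, and the idea that the admin port sends some reply and can be read until it closes or goes quiet are guesses, so check them against your istatd build. The admin host and port are left as parameters because they depend on how istatd was started.

```python
import socket
import subprocess


def flush_istatd(host, port, timeout=30.0):
    """Send "flush" to the admin port and return any text that comes back.

    Assumption: the admin port speaks a line-based text protocol.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(b"flush\r\n")       # assumed line terminator
        conn.shutdown(socket.SHUT_WR)    # signal that nothing more is coming
        try:
            while True:
                data = conn.recv(4096)
                if not data:             # server closed the connection
                    break
                chunks.append(data)
        except socket.timeout:
            pass                         # server went quiet; use what we have
    return b"".join(chunks).decode("utf-8", "replace")


def flush_then_stop(host, port, stop_cmd=("service", "istatd", "stop")):
    """Flush first, then run the stop command; returns the flush reply."""
    reply = flush_istatd(host, port)
    subprocess.run(list(stop_cmd), check=True)
    return reply
```

A stop script could call `flush_then_stop` with the admin host and port in place of a bare `service istatd stop`. Since the daemon keeps accepting data between the flush and the stop, samples arriving in that interval are still only in memory when the process exits.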

rca commented 11 years ago

Thanks for the reply; I have a couple of questions.

It sounds like calling the flush command on the admin port would introduce the same false / inaccurate data. Is this correct?

I agree there is no need to regularly stop the daemon in production, but how do you go about performing maintenance tasks that require a reboot, such as a kernel upgrade? Do you simply lose any in-memory data?

My motivation is to not lose data that has been accepted by the daemon. Running the flush command "out of band" still leaves a window between flush and shutdown where data could be lost. I understand wrong data may be worse than no data at all, but I'm not familiar enough with istatd's internals to understand how the inconsistencies are generated.

Thank you!

jwatte commented 11 years ago

When we need to update the kernel, we lose a minute or two of data. We do this very seldom :-)

The right way to fix this is to support master/slave replication, which we have designed but have not had the opportunity to implement. It should be about a week of work for someone who knows C++ well, so let me know if you have that available!

Sincerely,

Jon Watte

"I pledge allegiance to the flag of the United States of America, and to the republic for which it stands, one nation indivisible, with liberty and justice for all." ~ Adopted by U.S. Congress, June 22, 1942

In reply to Roberto Aguilar's comment of Wed, Jun 26, 2013 at 5:44 PM.

imvu-open / istatd

Does istatd flush data to disk when it's stopped? #10