joepie91 opened this issue 9 years ago
setItem already returns a promise, why can't you use that? Wait for each promise to resolve, then call the next one.
Or you can use setItemSync()... no wait, you wanted concurrency.
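For a single call site, the promise-chaining suggestion would look roughly like this (just a sketch; the key and function name are made up):

import storage from 'node-persist';

async function writeSequentially() {
  await storage.init();
  await storage.setItem('counter', 1);
  // The second write only starts once the first one has fully resolved.
  await storage.setItem('counter', 2);
}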
I'd be interested to see a test case when you have a chance (maybe paste a version of it in JavaScript :) as some of us can't easily read coffee)
Waiting on the returned promise is not practical - the setItem calls originate from different parts of the code (e.g. an Express application processing many requests at the same time), and it's unreasonable to expect every part of the code to be aware of the state of every other part of the code. That defeats the entire concept of modularity, and this is why ensuring write safety should be a concern of the node-persist module.
I'll post a testcase (in JS) later - I mostly just dumped my Coffeescript code here because that's what my own codebase is written in, and it's better than nothing.
@joepie91 no offense, but a major point of nodejs is for code execution to be asynchronous or non-blocking, since, by design, nodejs is single-threaded. So, using the term "concurrent" is not very accurate within a single thread (unless you're talking about clusters over a multi-core system).
However, I think you indirectly bring up a good point: "node-persist" does not support "atomic transactions" - which is totally expected, since node-persist runs in the same thread as the application using it (whereas any other DB system has its own background service, e.g. mysql, redis, postgres, mongo, etc.).
That does not mean it's not possible; I think we could implement a node service to do that, or maybe just a queue system (which would be less efficient), but it's possible.
Waiting on the returned promise is not practical
@joepie91 It sounds like you may want to build a queue system, and after adding to the queue, return a promise. Fulfill the promise when that item in the queue is processed. Of course, something like that could also be built into the core of node-persist, no?
In other words, the calls to setItem could be asynchronous, but the database writes could be synchronous.
Oh wait, a queue system is what @akhoury said. Oops.
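For what it's worth, a bare-bones sketch of that idea (queuedSetItem is a wrapper made up for illustration, not an actual node-persist API):

import storage from 'node-persist';

// All writes go through a single promise chain; each caller gets back a
// promise that resolves once its own write has actually been performed.
let queue: Promise<unknown> = Promise.resolve();

export function queuedSetItem(key: string, value: unknown): Promise<unknown> {
  const write = queue.then(() => storage.setItem(key, value));
  queue = write.catch(() => undefined); // keep the chain alive if a write fails
  return write;
}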
no offense, but a major point of nodejs is for code execution to be asynchronous or non-blocking, since, by design, nodejs is single-threaded. So, using the term "concurrent" is not very accurate within a single thread (unless you're talking about clusters over a multi-core system).
I am referring to concurrent writes, which is something Node.js absolutely does: even on a single thread, multiple asynchronous writes can be in flight at the same time.
However, I think you indirectly bring up a good point: "node-persist" does not support "atomic transactions" - which is totally expected, since node-persist runs in the same thread as the application using it (whereas any other DB system has its own background service, e.g. mysql, redis, postgres, mongo, etc.).
This is a separate concern - regardless of whether it supports atomic transactions, it should ensure that it does not corrupt the file (and currently, it corrupts the file on concurrent writes).
That does not mean it's not possible; I think we could implement a node service to do that, or maybe just a queue system (which would be less efficient), but it's possible.
A batched queue would possibly be a solution; i.e. on every set, it schedules a batch write for the next tick (but not to execute before a possible previous write has completed). That way, you can still handle all sets that occur within the same tick in a single write, which mostly resolves the efficiency issue.
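Roughly what I have in mind, as a sketch (batchedSet and its internals are hypothetical, not existing node-persist code; error handling omitted for brevity):

import storage from 'node-persist';

const pending = new Map<string, unknown>();
let flushQueued = false;
let flushChain: Promise<void> = Promise.resolve();

export function batchedSet(key: string, value: unknown): Promise<void> {
  // A later set of the same key within the same tick simply overwrites the earlier one.
  pending.set(key, value);
  if (!flushQueued) {
    flushQueued = true;
    // Wait for the previous flush to complete, then write out everything
    // collected up to the next tick in one go.
    flushChain = flushChain.then(async () => {
      await new Promise<void>((resolve) => process.nextTick(resolve));
      flushQueued = false;
      const batch = new Map(pending);
      pending.clear();
      for (const [k, v] of batch) {
        await storage.setItem(k, v);
      }
    });
  }
  return flushChain;
}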
(and currently, it corrupts the file on concurrent writes)
when it shouldn't, fair enough.
@joepie91 I like the batched-queue idea. I am currently on vacation for another month; if no one beats me to it, I might take a stab at it when I get back.
As a general rule, files are hard to deal with.
This is frustrating! Using getItemSync is the dirty way, which blocks the entire application! I think the least that could be done is creating an internal queue inside node-persist for non-synced write operations that guarantees write operations on a particular key are sequential. It would handle this situation internally, and this way calls from all contexts would be safe.
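Something like a per-key promise chain is what I mean (just a sketch; setItemQueued is a made-up name, not a node-persist API):

import storage from 'node-persist';

// Writes to the same key are chained one after another; writes to
// different keys can still run independently.
const chains = new Map<string, Promise<unknown>>();

export function setItemQueued(key: string, value: unknown): Promise<unknown> {
  const previous = chains.get(key) ?? Promise.resolve();
  const next = previous
    .catch(() => undefined) // a failed write should not block later ones
    .then(() => storage.setItem(key, value));
  chains.set(key, next);
  return next;
}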
At this point, I am not so sure a queue is enough; we might need some job-processing mechanism that also spawns child processes, writes a batch and then closes them? But that seems like overkill. Also, batch writing means that if the app crashes, some write requests may get lost.
Also, that doesn't solve the issue with different apps using the same storage dir.
The best solution is to run as a separate service with the same node api to talk to, but now we're implementing a database engine, which I really don't want to do.
Really, node-persist is not designed for this; it's more like the browser's localStorage. If you want a database, I think you should use a database.
But going back to the original issue filed, a smart queue might solve the OP's issue ONLY.
@akhoury I don't think we need anything more than this issue. Everyone knows what node-persist is and I don't think anybody expects it to be write-safe regarding external writes. But being write-safe within the same app, given node's asynchronous nature, looks like a necessary feature. For my own simple API server, I was forced to use sync methods to prevent a disaster! In fact, when writing code I thought node-persist would handle this scenario internally! Then I accidentally found this issue while I was looking up something else and was forced to change some lines to sync (I mean I expected this issue to be handled as a basic internal feature).
I'm also having a problem with this. I chose node-persist for a small project, thinking it would be the fastest way to get up and running. Ran into a corrupted file while running tests.
If we're undecided on how to address it, perhaps we could at least add a warning to the docs?
Edit: It's important to stay on task here. All other concerns aside (e.g. collisions with other apps), node-persist should be able to ensure that a basic application doesn't step on its own toes when it calls setItem().
I wrote a solution FYI at #93
I've implemented serializing get and set operations per key. It is in TypeScript and requires Node 10 because it uses asynchronous generators, but you can transpile it for older Node versions.
import storage from 'node-persist';

// Serializes all get/set operations for one key through a single async
// generator, so no two operations on that key can ever interleave.
export function createStorage<T>(name: string) {
  interface ISetOperation {
    type: 'set';
    data: T;
  }
  interface IGetOperation {
    type: 'get';
  }
  type Union = ISetOperation | IGetOperation;

  const executor = createExecutor();
  // Prime the generator so it is parked at the first `yield`, waiting for work.
  executor.next();

  async function get() {
    return (await executor.next({ type: 'get' })).value as T;
  }

  async function set(data: T) {
    await executor.next({ type: 'set', data });
  }

  // The async generator processes one request per `next()` call; calls are
  // queued internally, so operations run strictly one at a time, in order.
  async function* createExecutor(): AsyncIterableIterator<T | undefined> {
    let result;
    for (;;) {
      const item: Union = yield result;
      switch (item.type) {
        case 'get':
          result = await storage.getItem(name);
          break;
        case 'set':
          await storage.setItem(name, item.data);
          result = undefined;
          break;
      }
    }
  }

  return { get, set };
}
Example of use:
import { IHttpLog } from '~/types';
import { createStorage } from '~/storage';

const httpLogsStorage = createStorage<IHttpLog[]>('httpLogs');

export async function loadHttpLogs() {
  return await httpLogsStorage.get();
}

export async function saveHttpLogs(logs: IHttpLog[]) {
  await httpLogsStorage.set(logs);
}
Hitting the same issue. Maybe node-persist could make use of bottleneck internally?
https://www.npmjs.com/package/bottleneck
Also encountered this issue. I was also using node-persist as a sort of queuing mechanism, and I got a corrupted JSON file: {"key":"downstream.updates.zendesk-ticket-fields.hubspot-company-changed","value":{"##########":{"zip":"*****"}}}*****"}}}
Looks like a previous update had changed the address, a later update changed the zip, and node-persist didn't fully replace the file.
I don't have time right now to create a PR, but in my case I apparently fixed the issue by using Bottleneck. All you have to do is wrap the functions with a limiter. Something like this:
import Bottleneck from 'bottleneck';
import storage from 'node-persist';

// Allow only one storage operation to run at a time.
const limiter = new Bottleneck({
  minTime: 1,
  maxConcurrent: 1,
});

export async function createCache(cachePath: string) {
  await storage.init({
    dir: cachePath,
    ttl: 1000 * 60 * 60 * 24, // 24hrs
  });
}

// CacheOptions is defined elsewhere in my codebase (not shown here).
export const put = limiter.wrap(
  async (key: string, value: any, opts: CacheOptions = {}) => {
    return await storage.setItem(key, value, opts as any);
  }
);

export const get = limiter.wrap(async (key: string) => {
  const value = await storage.getItem(key);
  return value;
});
When doing heavy concurrent writes/persists that progressively decrease in size, files can sometimes become corrupted as newer data only partially overwrites older data (as there is no locking mechanism in place).
I can write a testcase later if needed, as I'm currently rather busy, but I figured I'd file an issue before I would forget.
I did write a very hacky monkeypatch (in Coffeescript) that at least ensures write queueing:
It's very inefficient - it doesn't attempt to 'deduplicate' queued writes and it always persists the last known version at the time of persisting, so under heavy concurrency it'll likely end up writing identical data many times... but at least it doesn't corrupt the files on disk :)
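Roughly, the idea boils down to something like this (a simplified TypeScript illustration of the queueing approach that wraps the public setItem; it is not the actual patch):

import storage from 'node-persist';

// Monkeypatch setItem so every write waits for the previous one to finish
// before touching the files on disk.
const originalSetItem = storage.setItem.bind(storage);
let writeQueue: Promise<unknown> = Promise.resolve();

(storage as any).setItem = (key: string, value: unknown, ...rest: any[]) => {
  writeQueue = writeQueue
    .catch(() => undefined) // a failed write shouldn't stall the queue
    .then(() => originalSetItem(key, value, ...rest));
  return writeQueue;
};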
EDIT: For clarification, these monkeypatched methods I wrote are what caused me to discover the issue: rapidly calling removeListItem several times (e.g. as would happen in a queue) would result in the aforementioned writes that progressively decrease in size.