il3ven opened this issue 1 year ago · Status: Open
It sounds like the fundamental issue you're describing is a race condition between threads. Both threads update the same value and the last write wins, which is not what you want in this case. Instead you would like the result to include both values.
You have proposed `sqlite` as a solution. Would the idea be to model the owners array as a many-to-one relationship in which each track has many owners? If so, then this is solving the issue by changing the data model.
I'd like to share some alternatives, but have run out of time. Looking forward to more discussion on this later.
> It sounds like the fundamental issue you're describing is a race condition between threads. Both threads update the same value and the last write wins, which is not what you want in this case. Instead you would like the result to include both values.
Exactly.
> You have proposed `sqlite` as a solution. Would the idea be to model the owners array as a many-to-one relationship in which each track has many owners? If so, then this is solving the issue by changing the data model.
Yes, I do plan to implement a many-to-one relationship.
> I'd like to share some alternatives, but have run out of time. Looking forward to more discussion on this later.
Alternatives are most welcome.
> Alternatives are most welcome.
Nice. Thanks!
The most apparent alternative would be to stick with LevelDB, but change the data model to avoid the race condition. In this case that would mean creating a new key for tracking owners. Something like `some-key-referring-to-an-nft/owners/0xabc`, where the first part can be the key you are currently using and the last part is the owner address. The value at that key could be blank or include details like chain id and block number, which are likely represented in the key already. There may be trade-offs affecting usability on read that would need to be considered in more detail.
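To make that concrete, here is a rough sketch with the `level` package (the key layout and value shape are illustrative assumptions, not the crawler's actual schema). Each owner gets its own key, so two concurrent threads never write to the same key, and reading owners becomes a prefix scan:

```ts
import { Level } from "level";

// Illustrative only: the crawler's real key structure lives in database/index.ts.
const db = new Level<string, string>("./neume-db", { valueEncoding: "utf8" });

const trackKey = "some-key-referring-to-an-nft";

// Each thread writes its owner under a distinct key, so there is no shared
// value to race on. The value can stay blank or carry extra details.
async function addOwner(owner: string, blockNumber: number): Promise<void> {
  await db.put(`${trackKey}/owners/${owner}`, JSON.stringify({ blockNumber }));
}

// The read-side trade-off: owners have to be collected with a prefix scan.
async function getOwners(): Promise<string[]> {
  const owners: string[] = [];
  for await (const key of db.keys({
    gte: `${trackKey}/owners/`,
    lt: `${trackKey}/owners/\xff`,
  })) {
    owners.push(key.split("/").pop()!);
  }
  return owners;
}
```

Writes become race-free, but reads need a merge step like `getOwners` above, which is the usability trade-off mentioned.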
Also, is the key structure documented somewhere? Just now realizing I am not entirely up to speed on that.
Another option would be to consider the use of a CRDT with the desired semantics. I am not familiar enough with the use of CRDTs to suggest how it would apply in this case. https://crdt.tech/implementations
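For a rough sense of the semantics a CRDT would need to provide here: the owners collection only ever grows, which is what a grow-only set (G-Set) models. Its merge is a set union, so concurrent additions are never lost. A minimal, library-agnostic sketch:

```ts
// Grow-only set (G-Set): elements can only be added, and merging two replicas
// is a set union, so concurrent additions from different writers all survive.
class GSet<T> {
  private items = new Set<T>();

  add(item: T): void {
    this.items.add(item);
  }

  merge(other: GSet<T>): void {
    for (const item of other.items) this.items.add(item);
  }

  values(): T[] {
    return [...this.items];
  }
}

// Two writers each add an owner to their own replica...
const replicaA = new GSet<string>();
const replicaB = new GSet<string>();
replicaA.add("0xabc");
replicaB.add("0xdef");

// ...and merging keeps both, instead of last-write-wins on a single value.
replicaA.merge(replicaB);
console.log(replicaA.values()); // ["0xabc", "0xdef"]
```

Note that a G-Set never removes elements, so if owners can also be removed, a richer CRDT would be needed.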
> Also, is the key structure documented somewhere? Just now realizing I am not entirely up to speed on that.
It isn't documented, but you can find it here: https://github.com/neume-network/crawler/blob/7a2a215e8b6c8f7fbc179734b0b098b8e8ac9b27/database/index.ts#L30
> Something like `some-key-referring-to-an-nft/owners/0xabc`, where the first part can be the key you are currently using and the last part is the owner address. The value at that key could be blank or include details like chain id and block number, which are likely represented in the key already.
Yes, I have also thought about this and it is a valid solution. However, we will have to write code to merge the owners on read. We can do it for now, but if the schema changes in the future we will have to do a rewrite. Also, if we introduce new many-to-many or one-to-many relationships then we will have to write more custom code.
I will have a look at CRDTs too, but if sqlite doesn't impact our performance then we should use it instead of implementing everything ourselves. I believe the network calls will be the bottleneck while crawling, not our DB.
To add some spice to this conversation - I have been thinking about the possibility of going down the route of piggy-backing on the progress of Strapi and having Neume be a fork of their project (that we keep up to date by merging new releases / bug fixes).
The reason this idea is intriguing to me is that we'd get a lot of functionality for free (`neume crawl` / `neume daemon`, etc). But there are some concerns / blockers with this approach, e.g. around `owners` and `transactions`. But I do think if we went down the route of utilizing their one-to-many / many-to-many relationships, we could limit writes and therefore get better write performance.
Would love your input / ideas regarding this @il3ven @neatonk
(This is probably more of a neume 3.0 discussion, but intrigued to hear what you think.)
> To add some spice to this conversation - I have been thinking about the possibility of going down the route of piggy-backing on the progress of Strapi and having Neume be a fork of their project (that we keep up to date by merging new releases / bug fixes).
Interesting idea and spicy, as advertised.
I'd argue against this for neume mostly because I think it would be detrimental to other use cases of neume that wouldn't need any of that. That said, I think it would be reasonable to structure neume as a library that can be embedded into other apps with minimal friction. Could be a good thought exercise to ask what would need to change about neume for that to be feasible.
Our current roadmap for neume is to support decent and lens, and to make the crawler more generic. Below are a few technical changes I propose for this roadmap.
### Save Tracks instead of NFTs
Our schema currently represents an NFT. However, multiple NFTs can represent the same song (track). This leads to duplication of data, and the consumer of neume has to merge NFTs into tracks.
We stuck with NFTs because it was simpler and LevelDB isn't suitable for tracks.
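For illustration, assuming (purely hypothetically) that NFTs pointing at the same metadata URI represent the same track, the merge a consumer has to do today looks roughly like this:

```ts
// Hypothetical shapes; the crawler's real schema is defined in database/index.ts.
type NFT = {
  contract: string;
  tokenId: string;
  metadataUri: string;
  owner: string;
};

type Track = {
  metadataUri: string;
  nfts: { contract: string; tokenId: string }[];
  owners: string[];
};

// Group NFTs that share a metadata URI into one track record.
function mergeIntoTracks(nfts: NFT[]): Track[] {
  const byUri = new Map<string, Track>();
  for (const nft of nfts) {
    const track =
      byUri.get(nft.metadataUri) ??
      { metadataUri: nft.metadataUri, nfts: [], owners: [] };
    track.nfts.push({ contract: nft.contract, tokenId: nft.tokenId });
    if (!track.owners.includes(nft.owner)) track.owners.push(nft.owner);
    byUri.set(nft.metadataUri, track);
  }
  return [...byUri.values()];
}
```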
#### Pros of moving to Tracks
#### Problem with saving tracks in LevelDB
LevelDB is a key-value database. Imagine we have the following track in our database.
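(The exact fields don't matter for the argument; this is just an illustrative shape, not the crawler's actual schema.)

```ts
// Hypothetical track value stored as a whole under a single LevelDB key.
const track = {
  title: "Example Track",
  owners: ["0xabc"],
};
```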
`owners` is the list of owners for this track. If two threads simultaneously update the `owners` field they will have to overwrite everything. Let's suppose thread 2 finishes last. We have the following value in our database.
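Continuing the illustrative value from above, say thread 1 appended `0xdef` and thread 2 appended `0x456` (both addresses made up). Each thread read the original value, added its owner, and wrote the whole object back, so the write that lands last erases the other:

```ts
// What the key holds after both writes: thread 2's whole-value write wins,
// and thread 1's owner ("0xdef") is silently lost.
const trackAfterBothWrites = {
  title: "Example Track",
  owners: ["0xabc", "0x456"],
};
```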
Databases like MongoDB allow inserting values into a nested field, but unfortunately LevelDB doesn't. We can write code to add this functionality on top of LevelDB, but it won't be flexible. If we have another field like `owners` in the future we will have to write more code. Not ideal.
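For comparison, this is roughly what the nested update looks like with the official MongoDB Node.js driver (database, collection, and field names are illustrative): `$addToSet` appends to the array inside the database, so concurrent updates don't overwrite each other.

```ts
import { MongoClient } from "mongodb";

type TrackDoc = { _id: string; owners: string[] };

const client = new MongoClient("mongodb://localhost:27017");
const tracks = client.db("neume").collection<TrackDoc>("tracks");

// The merge happens server-side: two threads adding different owners
// both succeed, with no read-modify-write of the whole document.
await tracks.updateOne(
  { _id: "some-track-id" },
  { $addToSet: { owners: "0xdef" } }
);
```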
#### Using sqlite to solve the above LevelDB problem
I propose to give sqlite a try. To save effort we can use an ORM such as Sequelize.
We dismissed sqlite before because it was pointed out that it has slow write speed. I argue that write speed isn't our top priority, and I doubt sqlite would be slow enough to matter.
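A minimal sketch of the many-to-one model with Sequelize on sqlite (model and field names are assumptions, not a worked-out schema; requires the `sequelize` and `sqlite3` packages). Each owner is its own row, so two threads inserting owners for the same track never touch the same record:

```ts
import { Sequelize, DataTypes } from "sequelize";

const sequelize = new Sequelize({ dialect: "sqlite", storage: "neume.sqlite" });

// One track has many owners: the many-to-one relationship discussed above.
const Track = sequelize.define("Track", {
  id: { type: DataTypes.STRING, primaryKey: true },
  title: DataTypes.STRING,
});
const Owner = sequelize.define("Owner", {
  address: { type: DataTypes.STRING, allowNull: false },
});
Track.hasMany(Owner);
Owner.belongsTo(Track);

async function demo(): Promise<void> {
  await sequelize.sync();
  await Track.create({ id: "track-1", title: "Example Track" });

  // Concurrent writers insert separate rows instead of rewriting one value,
  // so neither insert can clobber the other.
  await Owner.create({ address: "0xabc", TrackId: "track-1" });
  await Owner.create({ address: "0x456", TrackId: "track-1" });

  const owners = await Owner.findAll({ where: { TrackId: "track-1" } });
  console.log(owners.map((o) => o.get("address"))); // ["0xabc", "0x456"]
}

demo();
```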
### Make strategies more generic
To be written...