Open artyomxx opened 8 years ago
Hmmm... I mean, every time you call r.init it will actually try to create the database and tables and then just handle the error if they already exist. But there should never be any duplicate tables/databases since they're all supposed to be unique.
Can you post more details about this? Maybe post the results of running r.db('rethinkdb').table('table_config') and r.db('rethinkdb').table('db_config') to see the duplicate ones? Do you have multiple nodes running?
If this is actually a problem that's somehow caused by rethinkdb-init, then a possible solution might be to first check the existence of a db/table before creating it, but really, the database should not allow this to happen in the first place.
database should not allow this to happen in the first place
I thought so. But here it is. So assuming that I may not know something, I decided to first start an issue here, and then maybe go to rethinkdb issues.
Yes, my environment consists of at least three nodes, each connected through its own rethinkdb proxy to the main rethinkdb server. Two of them use a shared database (for sessions). Also, sometimes my dev team launches their own nodes, connecting to the same server and using the same databases for debugging.
a possible solution might be to first check the existence of a db/table before creating
IMHO, this looks better from the engineering perspective. ;)
Can you post the results of r.db('rethinkdb').table('table_config') and r.db('rethinkdb').table('db_config') filtered by the table/db name to see the duplicate ones? Also, what happens when you pick a database that has two entries? Which one does it pick? Is it random?
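To make the request above concrete, here is a minimal sketch of spotting duplicates once the rows from rethinkdb.table_config have been fetched. The helper name findDuplicateTables and the sample rows are hypothetical; the only assumption about RethinkDB is that each table_config row carries id, db and name fields.

```javascript
// Rows would come from a query such as:
//   r.db('rethinkdb').table('table_config').filter({ db: 'app', name: 'sessions' })
// (the db/table names here are made up for illustration).

// Group table_config rows by "db.name" and report every name that maps to
// more than one table id.
function findDuplicateTables(rows) {
  const idsByName = new Map();
  for (const row of rows) {
    const key = `${row.db}.${row.name}`;
    if (!idsByName.has(key)) idsByName.set(key, []);
    idsByName.get(key).push(row.id);
  }
  return [...idsByName]
    .filter(([, ids]) => ids.length > 1)
    .map(([name, ids]) => ({ name, ids }));
}
```

The same grouping should work for rethinkdb.db_config rows if the key is reduced to row.name alone.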
IMHO, this looks better from the engineering perspective. ;)
The problem with that is the following: if it was able to create the database twice, who says it's going to know about that database in the first place? Why would dbList/tableList know about a database while dbCreate/tableCreate would not? Even if, for whatever reason, dbList/tableList had accurate information, you still have a race condition where a database/table could be inserted between the list query and the create query (even if it's in the same query!). Granted, this second point is still pretty minor, but something to consider.
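The race described above can be illustrated with a toy model. This is only a sketch: FakeCatalog is a hypothetical in-memory stand-in, not the RethinkDB driver, and it is assumed to accept duplicate names the way the server in this thread apparently did.

```javascript
// Hypothetical in-memory catalog that does not reject duplicate names.
class FakeCatalog {
  constructor() {
    this.tables = [];
  }
  async list() {
    return this.tables.map(t => t.name);
  }
  async create(name) {
    this.tables.push({ id: this.tables.length + 1, name });
  }
}

// Check-then-create: list first, create only if the name is missing.
async function ensureTable(catalog, name) {
  const names = await catalog.list(); // step 1: check
  if (!names.includes(name)) {
    await catalog.create(name);       // step 2: create, based on a stale check
  }
}

// Two clients start at the same time; both list before either one creates.
async function demo() {
  const catalog = new FakeCatalog();
  await Promise.all([
    ensureTable(catalog, 'sessions'),
    ensureTable(catalog, 'sessions')
  ]);
  return catalog.tables.filter(t => t.name === 'sessions').length;
}
```

In this model demo() resolves to 2: each caller saw an empty list, so each one created the table. The existence check narrows the window but does not close it.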
I'm going to try to reproduce this and see if I can get it to happen too.
More questions for trying to reproduce this:
Anything else that could help in reproducing this?
Can you post the results of r.db('rethinkdb').table('table_config') and r.db('rethinkdb').table('db_config') filtered by the table/db name to see the duplicate ones?
Sorry, the last time it happened I fixed it as soon as I found the problem (by renaming and then deleting the clones from the RDB web console), because it prevents new nodes from working properly: the RethinkDB driver would not connect, displaying messages like ambiguous table/database name ...
The problem with that is the following, if it was able to create the database twice, who says it's going to know about that database in the first place?
Yeah, I understand.
Ok, a couple more questions:
id in rethinkdb.db_config/rethinkdb.table_config tables?
Yes, all proxies run locally and connect to the RethinkDB server running on the same virtual machine. When someone on my dev team runs a local node, they use a connection established through an ssh-tunnel (dev os > node > rdb-proxy > ssh-tunnel (with port 28015) > rdb server). So, it works like a server's local node.
- So this happens locally too? Even when you have only 1 non-proxy RethinkDB node in the cluster? If so, what are you using for your virtual machine (not sure it matters that much, but just checking)?
I think this might be enough to at least try to replicate it.
ambiguous table/db name.
rethinkdb proxy process is connecting to the RethinkDB on the remote server. (I mistyped the port: it is 29015, the default cluster port, of course.)
At the same time, there are a few production-ready nodes running on this server, working with their own rethinkdb proxies and using the same RethinkDB server, which is local for them.
Oh. Hope it's possible to understand all this. :)
But I guess that all these things with ssh-tunnels and local-node-remote-rdb are not important, because, if I understand correctly, this happened a few times when no one was using them. So, there were two nodes using one shared database, each through its own rethinkdb proxy connecting to localhost:29015.
Hey @lolwhoami, @tjmehta was able to repro this and get these two guys, so this is definitely a thing. Seems to be a RethinkDB error for sure though (TJ doesn't use this module), but I'd be interested in finding a good way to repro it and seeing if anything can be done about it.
Well, if I understand correctly, this happens when rethinkdb proxy is started and node is connected, but the proxy is not yet fully connected to the server, so it doesn't know that some tables or databases exist. So could we somehow check whether the proxy is fully connected? Maybe requesting the dbs/tables list will do the trick?
Just to let you know and maybe close the issue.
I've been running some tests with creation of dbs and tables through rethinkdb-proxy and found out that checking db(...).config() and table(...).status() before creating dbs and tables doesn't help at all. It looks like at some point the rethinkdb driver thinks that there are no such tables, so it's ok to create them. Also, sometimes status() returns non-existing errors, but the following tableCreate() fails with an already exists error.
Just to mention, it looks like database creation is ok: I couldn't get dbCreate() to create a duplicate database, and it looks like db(...).config() never fails to find an existing database. So, the problem is with tables only.
Also, r.db(...).wait() and r.db(...).table(...).wait() don't help either; they just resolve immediately before duplicate tables get created.
So, looks like the only solution here is to use some separate file or db to store information about actual db structure, just as they stated in the issue you've mentioned. But it's so weird! Oh...
I've built my own version of rethinkdb-init with db state saved in json file. It's here: https://github.com/lolwhoami/rdb-init
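For readers who want the general idea without opening the repository, here is a minimal sketch of tracking created tables in a JSON file. It is not the rdb-init implementation: the file layout ({ tables: [...] }) and the helper names loadState, planCreates and recordCreated are assumptions made for illustration.

```javascript
const fs = require('fs');

// Read the saved state; a missing file means nothing has been created yet.
function loadState(file) {
  try {
    return JSON.parse(fs.readFileSync(file, 'utf8'));
  } catch (err) {
    if (err.code === 'ENOENT') return { tables: [] };
    throw err;
  }
}

// Decide which of the wanted tables still need a tableCreate() call,
// trusting the state file instead of asking the server.
function planCreates(wanted, state) {
  const known = new Set(state.tables);
  return wanted.filter(name => !known.has(name));
}

// Persist the names that were just created and return the new state.
function recordCreated(file, state, created) {
  const next = { tables: [...state.tables, ...created] };
  fs.writeFileSync(file, JSON.stringify(next, null, 2));
  return next;
}
```

A caller would run planCreates, issue tableCreate() for each returned name, then call recordCreated. Note that this only helps if every process that runs the init step reads and writes the same file.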
Hey, I've created my own handler for initializing new databases, similar to this project, for an application I'm working on. Unfortunately there isn't much of a defense against the race conditions, but I did write this little gem which disgustingly solves the problem:
_RemoveRethinkCollisions() {
    // Ask RethinkDB's system table for reported table-name collisions.
    return this._Database.RethinkDB.db('rethinkdb').table('current_issues').filter({
        type: 'table_name_collision'
    }).run().then(Issues => {
        if (!Issues.length)
            return Logger.Info('No issues found');
        // Keep the first id of every collision; collect the rest for removal.
        const TablesToRemove = Issues
            .map(Issue => Issue.info.ids.slice(1))
            .reduce((All, Tables) => All.concat(Tables), []);
        // Deleting a row from table_config drops the corresponding table.
        return this._Database.RethinkDB.db('rethinkdb').table('table_config')
            .getAll(...TablesToRemove).delete().run().then(Result => {
                console.log(Result);
            });
    });
}
This might not be appropriate to your use case and this issue is over 7 months old but I wanted to provide some insight in case it is helpful.
Cheers.
Sometimes something strange happens and I find that there are duplicate tables or even databases in our RethinkDB server. For some reason I think this could happen when the connection to RDB is interrupted due to an RDB restart or some other reason. Could your module try to create a database or a table in this situation?