neumino / rethinkdbdash

An advanced Node.js driver for RethinkDB with a connection pool, support for streams etc.

Delete millions of rows #348

Closed becerriljc closed 7 years ago

becerriljc commented 7 years ago

How can I delete millions of rows? I'm using a secondary index, but the amount of data is very large, 70GB ... thanks!

Extarys commented 7 years ago

If it's everything, r.table('name').delete() should be pretty quick.

If not, get a selection first.

You can get a selection multiple ways:

- .get('0000-0000-0000') by primary index
- .getAll() by secondary index
- .filter() to filter on multiple values (slower)

By secondary index:

r.table('tablename').getAll('Value to look for', {index: 'secondary_index_name'}).delete()

r.table('tablename').getAll('Multiple', 'value', 'to', 'look', 'for', {index: 'secondary_index_name'}).delete()

RethinkDB: get_all and filter
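
For reference, a minimal sketch of how such a selection-and-delete might look with rethinkdbdash itself (the table name, index name, and lookup value are just the placeholders from the examples above, and a locally reachable cluster is assumed):

```js
// Minimal rethinkdbdash sketch. 'tablename', 'secondary_index_name' and the
// lookup value are placeholders carried over from the examples above.
const r = require('rethinkdbdash')(); // creates the connection pool

async function deleteMatching() {
  // Delete every document matching one secondary-index value.
  const result = await r.table('tablename')
    .getAll('Value to look for', { index: 'secondary_index_name' })
    .delete()
    .run();
  console.log(result.deleted + ' documents deleted');
}

deleteMatching().catch(console.error);
```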

Hope it helped a little

EDIT: Try running the command from the admin page, so you can open another tab and check how many records per second are being deleted, to give you an ETA.

becerriljc commented 7 years ago

Thanks! I tried this, but it's not working with a large amount of data. Does anyone have another idea? ... I am using between(date1, date2, {index: 'date'})

Extarys commented 7 years ago

Maybe .limit(10000)? This would select in chunks and may be easier on the CPU.
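
A rough sketch of that chunked approach, assuming rethinkdbdash and the 'date' secondary index mentioned above; the table name, date bounds, and batch size are placeholders, not values from the thread:

```js
// Rough sketch of chunked deletes: repeatedly delete a limited slice of the
// range until nothing is left. 'tableName', date1/date2 and the batch size
// of 10000 are placeholders.
const r = require('rethinkdbdash')();

async function deleteInChunks(date1, date2) {
  let deleted;
  do {
    const result = await r.table('tableName')
      .between(date1, date2, { index: 'date' })
      .limit(10000)
      .delete()
      .run();
    deleted = result.deleted;
    console.log('deleted ' + deleted + ' rows in this batch');
  } while (deleted > 0);
}
```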

BUT it should work even with a large amount of data. Are you getting some sort of error? What happens when you try it?

becerriljc commented 7 years ago

My query is: r.db('dbName').table('tableName').between(1503291600000, 1503334799000, {index: 'date'}).delete().run()

I tried: r.db('dbName').table('tableName').between(1503291600000, 1503334799000, {index: 'date'}).limit(100000).delete().run(), but it is very, very slow with 70GB...

Extarys commented 7 years ago

I guess it depends on the number of servers handling the load and the disk I/O available to actually delete those documents. If this is most of your DB, maybe create a new database, transfer the stuff you want to keep, and delete the whole thing - if not, then the only thing I can suggest is patience.

@neumino What do you think? (Sorry, you are the only person I know here lol)

neumino commented 7 years ago

I think your query is fine. If you don't want to wait for the query to finish, you can use durability: 'soft'.

If you are dropping the whole table, tableDrop is faster.
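
A hedged sketch of both suggestions, reusing the dbName/tableName and timestamps from the query above (note that the durability option in RethinkDB takes 'soft' or 'hard'); the two calls are alternatives, not meant to be run together:

```js
const r = require('rethinkdbdash')();

// Option 1 - soft durability: the server acknowledges the delete once it is
// in memory, before it is flushed to disk, so large deletes return sooner.
r.db('dbName').table('tableName')
  .between(1503291600000, 1503334799000, { index: 'date' })
  .delete({ durability: 'soft' })
  .run();

// Option 2 - if the whole table is going away anyway, dropping it is much
// faster than deleting every row.
r.db('dbName').tableDrop('tableName').run();
```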

Extarys commented 7 years ago

Didn't think of durability, good to remember. Thanks for dropping by.