heavyai / heavydb

HeavyDB (formerly OmniSciDB)
https://heavy.ai

Hangs on first query on "large" table #415

Open baretomas opened 5 years ago

baretomas commented 5 years ago

I've installed OmniSci on an old gaming computer, for testing purposes, with: Intel(R) Xeon(R) CPU W3670 @ 3.20GHz, GeForce GTX 1050, 12 GB RAM, 256 GB SSD.

The OmniSci data folder is located on the SSD.

I have a table whose data files total around 120 GB.

When I try any query on it, OmniSci runs for over an hour before I give up and stop it. I've tried restarting the server and retrying the query, with the same result. If I then try another query, or even just a \t on the database, it will "hang" unless I restart the server.

I checked iotop and top, and found that OmniSci loads from disk at 120 MB/s for 10+ minutes after I run the query against the table, before it suddenly drops to 20 MB/s. top says omnisci_server is using around 23% of the available RAM at that point. I can see that most of the RAM on the machine is dedicated to buff/cache, which is OmniSci's doing, as it was not like that before running the query.

So, to my questions: is it feasible that OmniSci can handle that table size, and larger, with the current hardware? If so, what do I have to do? Will these issues go away if I only use simple types like INT, BIGINT, FLOAT, etc., and avoid the TIMESTAMP(0) types and so on, or will the same problems occur once the table has grown to the same file size?

Is there any point in testing OmniSci on this rig at all? Or should I just revert to Postgres and live in the CPU world for a while longer?

randyzwitch commented 5 years ago

Hi @baretomas -

Could you let me know your CREATE TABLE statement (\d or \o from omnisql), how many rows are in the table, and the query you are running? This system should be fine for small tests; I use OmniSci all the time on my laptop, which has similar specs to what you have posted.
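
If omnisql is tied up, you can also check the row count from a small Java program over JDBC. A minimal sketch, assuming the OmniSci JDBC driver jar is on the classpath and the stock driver class, port, and credentials (com.omnisci.jdbc.OmniSciDriver, 6274, admin/HyperInteractive); adjust those for your install:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class TradeLogCount {
    public static void main(String[] args) throws Exception {
        // Driver class, port, and credentials below are the stock OmniSci
        // defaults, not values confirmed for this particular install.
        Class.forName("com.omnisci.jdbc.OmniSciDriver");
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:omnisci:localhost:6274:omnisci",
                 "admin", "HyperInteractive");
             Statement stmt = conn.createStatement();
             // May be slow on first touch if the table is cold on disk.
             ResultSet rs = stmt.executeQuery(
                 "SELECT COUNT(*) FROM trade_log")) {
            if (rs.next()) {
                System.out.println("rows: " + rs.getLong(1));
            }
        }
    }
}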

baretomas commented 5 years ago

Hey! Thanks for the quick response!

Actually, \o makes it hang too. I can see from the logs that it's doing a query on the table then, so it's probably that. But \d works, thankfully:

CREATE TABLE trade_log (
  run_id INTEGER,
  cycle_id BIGINT,
  timestampInt BIGINT,
  key_pair TEXT ENCODING DICT(16),
  time_registered TIMESTAMP(0),
  amount DOUBLE,
  price DOUBLE,
  trade_type TEXT ENCODING DICT(16),
  balance DOUBLE,
  action TEXT ENCODING DICT(16),
  trade_id INTEGER,
  series_id INTEGER,
  config_id INTEGER,
  config_series_id INTEGER
)

I don't know how many rows it has, as any attempt to query the table makes it run for hours. I can try overnight if necessary?

I can run any query against it, and it's the same result: select cycle_id from trade_log limit 1;

The strange thing is that it worked fine earlier today. I was running some queries from SQuirreL SQL, and suddenly it stopped responding. I thought it was SQuirreL SQL, so I restarted the client. I couldn't log on then; it just timed out. I then restarted the server and tried to log on with omnisql, which works fine, but it does the same hoopla every time I query this table. Other tables work fine.

alexbaden commented 5 years ago

Hi @baretomas

How did you insert data into the table? How many rows do you expect the table to have?

baretomas commented 5 years ago

I'm using the SQLImporter from a Java command-line application. I load the jar and run the main method with sqli.main(Commandline.translateCommandline(args)); I build the args at runtime.
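
Roughly, the wiring looks like this (a trimmed sketch, not my exact code; buildArgs() is a stand-in for the runtime argument building, and I've left the actual SQLImporter flags out):

import com.mapd.utility.SQLImporter;
import org.apache.tools.ant.types.Commandline;

public class ImportRunner {
    public static void main(String[] argv) {
        // Assembled at runtime in the real application; left empty here as
        // a placeholder rather than guessing at SQLImporter's options.
        String args = buildArgs();
        // translateCommandline splits the string into a String[] the way a
        // shell would; SQLImporter.main then runs the import with it.
        SQLImporter.main(Commandline.translateCommandline(args));
    }

    // Hypothetical stand-in for the runtime arg construction.
    private static String buildArgs() {
        return "";
    }
}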

I expect it to grow by millions to tens of millions of rows per day. I would love to know what counts as "a lot" for my rig. I might decide to dump data sets onto the OmniSci DB, do my calculations on parts of the data instead of the full set, and then discard them. The size of that data would be up to 200-300 times smaller than the full sets.