mjuric / adass-2015-paper

Source for the ADASS 2015 (Sydney) paper to accompany the invited talk

More details on the database? #23

Closed boutigny closed 8 years ago

boutigny commented 8 years ago

While the LSST database is a major component of the DM system, I have the impression that it is only marginally mentioned in the paper, and it is not clear that it is something much more sophisticated than what has been done before in the astronomy field. I would suggest adding a paragraph describing the major components of the DB and giving a hint of the projected performance. Similarly, it may be good to mention the database tests that were run in 2013 and in 2015 and the fact that they demonstrated that the projected performance and scalability are achievable.

mjuric commented 8 years ago

Hi Dominique, I agree, I largely punted on the database by referring to the design docs; in retrospect, that's unfair.

@jbecla , could you add a paragraph on the database, along the lines that Dominique is proposing? The challenge: I'd need it by ~tomorrow night (need to submit by 5am on Friday).

I also recall you had a Supercomputing paper on qserv -- what's the reference for it?

timj commented 8 years ago

We already cite the SC paper. I added that a while back.

mjuric commented 8 years ago

Ah, missed that, thanks!

jbecla commented 8 years ago

I'll look into that tomorrow

mjuric commented 8 years ago

@jbecla Actually, having re-read the whole paper, I think the best thing to do would be to just add a note that Qserv has been tested in data challenges, just like the rest of the stack. The reason for that is that we don't actually dwell much on design details or performance promises for other components either.

How does this look: https://github.com/mjuric/adass-2015-paper/blob/qserv/O3-1.pdf . In the last paragraph on page 7, I added:

Advanced prototypes of the distributed, shared-nothing database being written for LSST – Qserv – have been tested on a 150-node cluster using 55 billion rows and 30 terabytes of simulated data (Wang et al. 2011).

I took these from Daniel's paper -- what would be the updated numbers?
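For readers unfamiliar with the "shared-nothing" design mentioned in the quoted paragraph, the core idea is a scatter-gather pattern: the sky catalog is partitioned into spatial chunks, each chunk is owned by one worker node, a query is dispatched to every worker in parallel, and the per-chunk results are merged. A minimal sketch of that pattern (all names and data here are hypothetical illustrations, not Qserv's actual API — the real system is built on MySQL worker nodes with a distributing front end):

```python
# Toy scatter-gather over shared-nothing partitions.
# Each chunk id maps to data that would live on exactly one worker node.
from concurrent.futures import ThreadPoolExecutor

CHUNKS = {
    0: [{"objectId": 1, "ra": 10.2, "mag": 21.5},
        {"objectId": 2, "ra": 11.0, "mag": 19.8}],
    1: [{"objectId": 3, "ra": 95.4, "mag": 22.1}],
    2: [{"objectId": 4, "ra": 200.1, "mag": 18.3},
        {"objectId": 5, "ra": 201.7, "mag": 23.0}],
}

def worker_query(chunk_id, predicate):
    """Run the query fragment against one worker's local chunk only."""
    return [row for row in CHUNKS[chunk_id] if predicate(row)]

def scatter_gather(predicate):
    """Dispatch the same query to every chunk in parallel, merge results."""
    with ThreadPoolExecutor(max_workers=len(CHUNKS)) as pool:
        partials = pool.map(lambda c: worker_query(c, predicate), CHUNKS)
    return [row for part in partials for row in part]

# Example: find objects brighter than magnitude 21 across all chunks.
bright = scatter_gather(lambda row: row["mag"] < 21.0)
print(sorted(r["objectId"] for r in bright))  # → [2, 4]
```

Because no worker touches another worker's data, adding nodes scales both storage and query throughput roughly linearly, which is what the cited 150-node tests were probing.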

jbecla commented 8 years ago

Great, I got distracted earlier, so I am glad to hear I don't need to write it :). The numbers you are quoting capture well the scale we tested with. Independently, this summer we tested with a relatively comparable data set (32TB, 5B objects, 35B sources, 172B forcedSources); the most prominent result this time around was that we got Qserv to run reliably for days under heavy load (100+ concurrent queries consisting of a mix of easy and hard ones) and with 100x lower latency than the 2011 tests.

mjuric commented 8 years ago

Thanks, Jacek -- I also added a link to the S15 tests page on Confluence.