Closed boutigny closed 8 years ago
Hi Dominique, I agree, I largely punted on the database by referring to the design docs; in retrospect, that's unfair.
@jbecla , could you add a paragraph on the database, along the lines that Dominique is proposing? The challenge: I'd need it by ~tomorrow night (need to submit by 5am on Friday).
I also recall you had a Supercomputing paper on qserv -- what's the reference for it?
We already cite the SC paper. I added that a while back.
Ah, missed that, thanks!
I'll look into that tomorrow
@jbecla Actually, having re-read the whole paper, I think the best thing to do would be to just add a note that Qserv has been tested in data challenges, just like the rest of the stack. The reason is that we don't actually dwell much on design details or performance promises for other components either.
How does this look: https://github.com/mjuric/adass-2015-paper/blob/qserv/O3-1.pdf . In the last paragraph on page 7, I added:
Advanced prototypes of the distributed, shared-nothing, database being written for LSST – Qserv – have been tested on a 150-node cluster using 55 billion rows and 30 terabytes of simulated data (Wang et al. 2011).
I took these from Daniel's paper -- what would be the updated numbers?
Great, I got distracted earlier, so I am glad to hear I don't need to write it :). The numbers you are quoting capture well the scale we tested with. Independently, this summer we tested with a relatively comparable data set (32TB, 5B objects, 35B sources, 172B forcedSources). The most prominent thing this time around was that we got Qserv to run reliably for days under heavy load (100+ concurrent queries consisting of a mix of easy and hard ones) and with 100x lower latency than the 2011 tests.
Thanks, Jacek -- I also added a link to S15 tests page on confluence.
While the LSST database is a major component of the DM, I have the impression that it is only marginally mentioned in the paper, and it is not clear that it is something much more sophisticated than what has been done before in the astronomy field. I would suggest adding a paragraph to describe the major components of the DB and to give a hint of the projected performance. Similarly, it may be good to mention the database tests which were run in 2013 and in 2015 and the fact that they have demonstrated that the projected performance and scalability are achievable.