neurodata / ndpaper

NeuroData Paper Landing page
http://ndpaper.neurodata.io
Apache License 2.0

ingest speeds #44

Closed jovo closed 8 years ago

jovo commented 8 years ago

@kunallillaney eric says davi's microscope goes 200 MBytes/s; bobby says jeff's new microscope goes 10 tb/hr = 2,700 MB/s. note that bobby isn't sure whether it is bits or bytes.

these are good references for us to know, when generating the ingest benchmark.

@MrAE

MrAE commented 8 years ago

If it's terabits (Tb), then that would be > 300 MB/s. If it's terabytes (TB), then that would be > 2.5 GB/s.
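The two readings of "10 tb/hr" can be checked with a few lines of arithmetic. This is a minimal sketch, assuming decimal (SI) prefixes (1 TB = 10^12 bytes, 1 MB = 10^6 bytes); the function name `per_hour_to_mb_per_s` is made up for illustration.

```python
# Convert a per-hour data rate to MB/s under both readings of "10 tb/hr".
# Assumption: decimal (SI) prefixes, i.e. 1 TB = 10**12 bytes, 1 MB = 10**6 bytes.

SECONDS_PER_HOUR = 3600
BYTES_PER_MB = 10**6


def per_hour_to_mb_per_s(amount_tera: float, unit_is_bits: bool) -> float:
    """Return MB/s for `amount_tera` terabits (or terabytes) per hour."""
    total = amount_tera * 10**12  # bits or bytes moved in one hour
    total_bytes = total / 8 if unit_is_bits else total
    return total_bytes / SECONDS_PER_HOUR / BYTES_PER_MB


terabit_rate = per_hour_to_mb_per_s(10, unit_is_bits=True)
terabyte_rate = per_hour_to_mb_per_s(10, unit_is_bits=False)

print(f"10 Tb/hr is about {terabit_rate:.0f} MB/s")
print(f"10 TB/hr is about {terabyte_rate:.0f} MB/s")
```

Under these assumptions the results are roughly 347 MB/s and 2,778 MB/s, consistent with the "> 300 MB/s" and "> 2.5 GB/s" bounds above.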

jovo commented 8 years ago

i'm glad somebody can add. i've always sucked at arithmetic!

MrAE commented 8 years ago

Don't look at me, I used WolframAlpha for that! Haha

jovo commented 8 years ago

i used R and still got it wrong :)

On Wed, Jun 1, 2016 at 8:39 PM, Jesse Leigh Patsolic wrote:

Don't look at me, I used WolframAlpha for that! Haha


the glass is all full: half water, half air. neurodata.io, jovo calendar https://calendar.google.com/calendar/embed?src=joshuav%40gmail.com&ctz=America/New_York

kunallillaney commented 8 years ago

@jovo We can do 20 MB/s ingest to a MySQL database. This includes transferring data from S3 scratch space, slicing it into smaller cuboids, and inserting it into MySQL. Note that S3 is effectively infinite, so we can buffer, and that this figure is single-threaded. We do not have a multi-threaded version of this yet; that will take about 2-4 weeks.
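To put the 20 MB/s figure next to the 200 MB/s microscope rate quoted earlier in the thread, here is a rough sketch of how quickly a buffer would fill and how many ingest workers would be needed to keep up. It assumes constant, sustained rates and ideal linear scaling across workers, neither of which the thread confirms; the function names are hypothetical.

```python
import math

# Rates quoted in the thread (MB/s). Treating them as constant is an assumption.
ACQUIRE_MB_S = 200.0  # davi's microscope, per the first comment
INGEST_MB_S = 20.0    # single-threaded ingest into MySQL


def backlog_growth_mb_per_s(acquire_mb_s: float, ingest_mb_s: float) -> float:
    """Net rate at which un-ingested data piles up in the buffer (never negative)."""
    return max(acquire_mb_s - ingest_mb_s, 0.0)


def workers_to_keep_up(acquire_mb_s: float, ingest_mb_s: float) -> int:
    """Workers needed to match acquisition, ASSUMING ideal linear scaling."""
    return math.ceil(acquire_mb_s / ingest_mb_s)


growth = backlog_growth_mb_per_s(ACQUIRE_MB_S, INGEST_MB_S)
backlog_tb_per_hour = growth * 3600 / 10**6  # decimal TB accumulated per hour
workers = workers_to_keep_up(ACQUIRE_MB_S, INGEST_MB_S)

print(f"backlog grows at {growth:.0f} MB/s ({backlog_tb_per_hour:.3f} TB/hr)")
print(f"workers needed under ideal scaling: {workers}")
```

Real parallel ingest would likely scale worse than linearly (database contention, network limits), so the worker count is a lower bound at best.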

jovo commented 8 years ago

great! send me & @mrae the csv file with the ingest timings when you have it. by the time we get the revisions back, we'll replace it with the new parallel fig :)

On Fri, Jun 3, 2016 at 5:40 AM, Kunal Lillaney wrote:

@jovo We can do 20 MB/s ingest to a MySQL database. This includes transferring data from S3 scratch space, slicing it into smaller cuboids, and inserting it into MySQL. Note that S3 is effectively infinite, so we can buffer, and that this figure is single-threaded. We do not have a multi-threaded version of this yet; that will take about 2-4 weeks.


kunallillaney commented 8 years ago

@jovo There is no csv file. It is only a single value of time. I divided the total data by time and got that value.

jovo commented 8 years ago

ok, what was the total data size? can you do that for 4 other datasets with different data sizes, and then give jesse all 5 (name,size,time) triples?

On Fri, Jun 3, 2016 at 8:21 AM, Kunal Lillaney wrote:

@jovo There is no csv file. It is only a single value of time. I divided the total data by time and got that value.


jovo commented 8 years ago

@MrAE i think kunal made the new csv files, can you make the new figure?

MrAE commented 8 years ago

I'll let you know when I have something.

wrgr commented 8 years ago

@kunallillaney - please add a md description so that we can better grok what's happening?

wrgr commented 8 years ago

@kunallillaney @MrAE

This is one of the only major outstanding issues for the paper ahead of next Monday.

kunallillaney commented 8 years ago

I left him instructions here. He can ask me specific questions if something is missing.

MrAE commented 8 years ago

Thanks for the script/instructions. Now, can you please provide a description of the data, like @willgray mentioned? Thanks!

kunallillaney commented 8 years ago

@MrAE The data is size (in GB) + 5 iterations of time (in seconds). We benchmarked ingest time over 3 different input data sizes (2.7, 12., 115) for 5 iterations. You should probably generate a graph of MB/sec on the Y-axis over data sizes on the X-axis.
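The layout described above (one size in GB plus five timings in seconds) can be turned into the suggested MB/sec values along these lines. This is only a sketch: the timings below are made-up placeholders, not the measured results, and 1 GB = 1000 MB is an assumption. Only the 2.7 GB and 115 GB sizes are taken from the comment.

```python
from statistics import mean

MB_PER_GB = 1000  # assumption: decimal gigabytes

# size in GB -> five ingest times in seconds.
# HYPOTHETICAL placeholder timings for illustration only, NOT measured data.
benchmark = {
    2.7: [130.0, 135.0, 140.0, 135.0, 135.0],
    115.0: [5700.0, 5750.0, 5800.0, 5750.0, 5750.0],
}


def throughput_mb_per_s(size_gb: float, seconds: float) -> float:
    """Ingest rate for one run: total data divided by elapsed time."""
    return size_gb * MB_PER_GB / seconds


def mean_throughput(size_gb: float, times: list) -> float:
    """Average the per-run rates over all iterations for one data size."""
    return mean(throughput_mb_per_s(size_gb, t) for t in times)


# (size_gb, mean MB/s) pairs, ready to plot with size on X and MB/s on Y.
results = [(size, mean_throughput(size, times)) for size, times in benchmark.items()]

for size_gb, rate in results:
    print(f"{size_gb:>6} GB: {rate:.1f} MB/s")
```

Averaging the per-run rates, rather than dividing size by the mean time, is one choice among several; plotting the five individual rates per size would also show the spread.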

MrAE commented 8 years ago

The figure should be updated everywhere. 👇 Feedback welcome. @jovo @willgray @kunallillaney

wrgr commented 8 years ago

Woot! Thanks guys!! Comments on this go in another issue if needed. This is complete.