neurodata / sfn2016


data to bring #5

Closed alexbaden closed 7 years ago

alexbaden commented 7 years ago

I believe we can also set up the NUCs to connect to the data in S3 and prefetch a bunch of data into the local MySQL DB. Fetching a supercuboid over 3 Mbps will be painful, but not impossible. So any of the cloudified data we just need to set up as a project and test. Is this true @kunallillaney?
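To put a rough number on "painful": a back-of-envelope only, assuming supercuboids in the single-digit-MB range (sizes discussed later in this thread) and ignoring request overhead.

```python
def transfer_seconds(size_mb, link_mbps):
    """Idealized transfer time: megabytes -> megabits, divided by link speed."""
    return size_mb * 8 / link_mbps

# Hypothetical supercuboid sizes over a 3 Mbps link.
for size in (4, 10):
    print(f"{size} MB over 3 Mbps: {transfer_seconds(size, 3):.1f} s")
# 4 MB over 3 Mbps: 10.7 s
# 10 MB over 3 Mbps: 26.7 s
```

So even a single supercuboid miss costs on the order of ten seconds or more at that speed.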

@jovo @perlman @gkiar

kunallillaney commented 7 years ago

@alexbaden The data that currently exists in the cloud is:

  • kasthuri11: res2 to res7 (that is all I can fit on my machine for now, but this will change with SSH tunneling).
  • res0 for at least 5 CLARITY brains (I do not plan to upload the remaining resolutions but will build them using the lambda propagate service once it is functional). Another reason is that Kwame has reported that some of the res levels have broken or missing data, hence my reluctance to upload other res data from the DSPs.

There are some issues with what you stated:

  • Yes, you can prefetch data into Redis and MySQL from S3, but we wrote the MySQL support for prototype purposes and do not recommend it. Redis is the key-value store we currently support for caching from the cloud. Also, the cache indexing for MySQL lives in MySQL itself, which is quite slow compared to indexing in Redis. Lastly, we have a cache manager that works for Redis but not MySQL, so if you run out of disk space that will lead to issues.
  • To summarize: if you want to use the S3 model, I would use Redis locally and not MySQL. Redis is also used by all the cloud instances that currently deploy the microns branch.
  • An issue I foresee with the NUCs and Redis is that the memory on the NUCs is about 16-32 GB (I might be wrong here), which is not a large enough cache for Redis (it should usually be around 100 GB). Ideally this would not be a problem, because with high-speed internet the cache manager will evict stale data and fetch new supercuboids. At 3 Mbps this is going to be painful, since each supercuboid is about 4-10 MB. This means a lag of 2-3 seconds, if not more, for a supercuboid miss, and misses will happen more frequently with low memory. In short: low available memory + low internet speeds + Redis will not make the users happy.
  • Since only res0 data is in S3 for now, you can still use the zoomOut capability to visualize these datasets, but that will be slower than reading at the native res level.

jovo commented 7 years ago

great, thanks kunal, let's discuss at 1pm tomorrow.
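The recommended setup in this thread (a local Redis cache fed from S3, with a cache manager evicting stale supercuboids) could be sketched roughly as follows. Everything here is hypothetical: the key format, the S3 fetch stub, and the OrderedDict standing in for Redis are illustrative, not the actual ndstore interfaces.

```python
from collections import OrderedDict

CACHE_CAPACITY = 4  # number of supercuboids; a real Redis cache is sized in GB

cache = OrderedDict()  # stands in for Redis: key -> supercuboid bytes

def fetch_from_s3(key):
    """Stub for downloading one supercuboid from S3 (the slow 3 Mbps path)."""
    return b"supercuboid-data-for-" + key.encode()

def get_supercuboid(dataset, res, x, y, z):
    # Hypothetical key scheme: dataset, resolution, and cuboid coordinates.
    key = f"{dataset}&{res}&{x}&{y}&{z}"
    if key in cache:
        cache.move_to_end(key)       # cache hit: mark as recently used
        return cache[key]
    data = fetch_from_s3(key)        # cache miss: pay the network cost
    cache[key] = data
    if len(cache) > CACHE_CAPACITY:  # cache manager: evict least recently used
        cache.popitem(last=False)
    return data
```

With a small cache and a slow link, most reads are misses, which is exactly the "low memory + low bandwidth" pain described above.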
