workshopper / levelmeup

Level Me Up Scotty! An intro to Node.js databases via a set of self-guided workshops.

Exercises #1

Closed rvagg closed 11 years ago

rvagg commented 11 years ago

Including a bunch of you in this private repo as you're either involved in LevelUP, NodeConf.eu or might have some useful input to this tool: @raynos @dominictarr @maxogden @ralphtheninja @kesla @juliangruber @hij1nx @No9 @mcollina @pgte @mikeal @substack @brycebaril (have I missed anyone that might have good input?)

Please keep this private for now; it's intended for use initially at NodeConf.eu, and beyond that you can go wild. It ought to be a good teaching tool for any workshoppy situation or just a good tool to point newbies at. I'm working under the assumption that nobody else is preparing for the NodeBase workshop section at NodeConf.eu. I was told that @hij1nx suggested a workshop, but I don't believe anyone else has anything planned thus far (correct me if I'm wrong!).

A descendant of the great stream-adventure from NodeConf by @maxogden and @substack, this now builds on workshopper, which is an evolution of the guts of stream-adventure designed for use with learnyounode. So while it can still deal with the original stream-adventure exercises, it can do a bit more too, and more functionality is being added to workshopper for this project.

Exercises can be similar to those found in stream-adventure or learnyounode. The easiest exercises are stand-alone programs that print stuff to stdout, or can be made to print to stdout indirectly, as this is how stream-adventure was designed. It's also possible to have a validation step after a solution is run to verify criteria that can't be munged nicely into a stdout print. Don't be constrained by what's currently possible when coming up with exercises though.

Initially what I need help with is an outline of the exercises, structured for the most optimal learning experience around foundational "NodeBase" concepts. So, even if you can't contribute code or documentation or anything else, a clear eye over the lesson progression would be greatly appreciated.

So far, this is what I have in mind:

Basics, really simple exercises just to cover the very basic operations in LevelUP:

  1. Basics: GET
  2. Basics: PUT
  3. Basics: BATCH
  4. Basics: READSTREAM

The first 2 are implemented, I'm considering how best to do the third without it just being a duplicate of the second (there are a couple of exercises in learnyounode that deal with sync vs async methods and it monkeypatches fs to watch the methods that the solution uses and fails you if you use the wrong one(s), I'm not sure how suitable that approach is here tho). ReadStream will just be a "here's a db with a bunch of random keys & values, print them in order to stdout" I think.

After that, I think the next step is to deal with range queries, so perhaps a couple of exercises that deal with existing dbs and structured keys that asks you to print out just a subset of the keys according to some criteria to be supplied as command line arguments and you have to create a ReadStream that'll pick those up. I'm also wondering if an exercise on bytewise would be good here. Some creative thinking required!
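To make the range-query idea concrete, here is a minimal sketch of what such an exercise boils down to. It uses an in-memory stand-in rather than LevelUP itself; `rangeQuery`, the date-style keys and the values are hypothetical and only illustrate how structured keys plus string ordering let you select a subset with a start and an end key.

```javascript
// In-memory stand-in for a sorted key/value store; this is NOT LevelUP.
// Keys are compared as plain strings, similar in spirit to the
// lexicographic ordering a LevelDB-backed store uses by default.
function rangeQuery(entries, start, end) {
  return Object.keys(entries)
    .sort() // default sort orders strings by UTF-16 code units
    .filter(function (key) { return key >= start && key <= end; })
    .map(function (key) { return { key: key, value: entries[key] }; });
}

// Hypothetical data: ISO-style dates sort chronologically as strings.
var entries = {
  '2013-08-30': 'c',
  '2013-07-01': 'a',
  '2013-09-12': 'd',
  '2013-08-02': 'b'
};

// In an exercise, the bounds would come from command line arguments.
var august = rangeQuery(entries, '2013-08-01', '2013-08-31');

august.forEach(function (entry) {
  console.log(entry.key + '=' + entry.value);
});
```

In a real solution the filtering would be done by the ReadStream's range options rather than by scanning every key, which is the point the exercise is meant to teach.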

Then we can have some more advanced stuff, somehow pulling in sublevel, maybe even hooks and I also thought that coming up with something that uses multilevel would be fun. But I need some help thinking up exercises that lend themselves to being tested/verified.

mcollina commented 11 years ago

@rvagg as for the issue with exercise 3, I think you want to include DEL there and then show how the operations in a batch can be PUT or DEL.
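A rough sketch of what that could look like: the operation objects below follow the array form that LevelUP's batch() accepts, but `applyBatch` and the Map it writes to are a hypothetical in-memory stand-in, not LevelUP.

```javascript
// In-memory stand-in, NOT LevelUP: applies a list of operations to a Map.
// Each operation is { type: 'put', key, value } or { type: 'del', key }.
function applyBatch(store, ops) {
  // Validate everything first so a bad operation leaves the store untouched,
  // mimicking the all-or-nothing behaviour expected of a batch.
  ops.forEach(function (op) {
    if (op.type !== 'put' && op.type !== 'del') {
      throw new Error('unknown batch operation: ' + op.type);
    }
  });
  ops.forEach(function (op) {
    if (op.type === 'put') store.set(op.key, op.value);
    else store.delete(op.key);
  });
  return store;
}

var store = new Map([['old-key', 'stale'], ['keep', 'me']]);

// One batch mixing a delete and a write.
applyBatch(store, [
  { type: 'del', key: 'old-key' },
  { type: 'put', key: 'new-key', value: 'fresh' }
]);

console.log(Array.from(store.keys()).sort().join(','));
```

Mixing both operation types in one call is what distinguishes the exercise from a plain PUT exercise.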

Depending on time, I'd leave bytewise out as it's HARD to debug. However :+1: for sublevel, hooks and multilevel.

mikeal commented 11 years ago

I've been messing with a lot of bytewise and non-bytewise stuff lately. Half of what I tend to do with bytewise is also accomplished with sublevel, and when using sublevel it is also browser compatible.

What I think is more generally useful/applicable is level-mutex. A lot of databases need read-before-write semantics to guarantee consistency so that might be a better tutorial than bytewise. The main thing bytewise makes easy is building indexes but I feel like secondary indexing is more of an advanced topic and you definitely need read-before-write semantics to write a consistent secondary index.
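As a sketch of why a secondary index needs the read first: the helper below builds the batch operations for storing a record while keeping an index on one field. `indexedPutOps`, the `user!` and `email!` prefixes and the sample records are all hypothetical, and the code does not use level-mutex; in a real database the read and the batch write would also have to be serialized against other writers.

```javascript
// Builds the batch operations needed to store a record and keep a secondary
// index on its `email` field consistent. `existing` is whatever a prior read
// of the same id returned (or undefined). Without that read, the stale index
// entry for the old email could not be removed.
function indexedPutOps(existing, id, record) {
  var ops = [];
  if (existing && existing.email !== record.email) {
    ops.push({ type: 'del', key: 'email!' + existing.email });
  }
  ops.push({ type: 'put', key: 'email!' + record.email, value: id });
  ops.push({ type: 'put', key: 'user!' + id, value: record });
  return ops;
}

var before = { name: 'Ada', email: 'ada@old.example' };
var after = { name: 'Ada', email: 'ada@new.example' };

var ops = indexedPutOps(before, '42', after);

console.log(ops.map(function (op) { return op.type + ' ' + op.key; }).join('\n'));
```

If another write to the same record slipped in between the read and the batch, the delete could target the wrong index entry, which is the consistency problem read-before-write locking is meant to prevent.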

rvagg commented 11 years ago

added @luk-, sorry for the oversight but I imagine you'd have some good insight too.

heapwolf commented 11 years ago

This is great! I really enjoyed the stream-adventure workshop.

Ideas for structuring the workshop: In the past, I've had good results by pairing students. Specifically, for the first 10 minutes of the workshop I'd have students talk about what they know already. People appreciate it when you establish some individual rapport with them. After that, I'd take more experienced students and pair them with less experienced students. This keeps the smart ones busy and makes the instructor more available. For the introverted, it promotes conversation. Near the end of the class, I'd have people who solved an interesting problem discuss how they solved it. People LOVE getting a minute in the spotlight. Anyway, the whole reason we physically attend events is to have some interesting social interactions, so I think some of this is important.

luk- commented 11 years ago

@rvagg I think it would be really helpful for people if you spend some time on how to model the data. I don't necessarily mean even formal models, which might actually confuse people. Taking a set of structured data that a lot of people are used to seeing represented with a schema in SQL, and demonstrating how it can be stored while taking advantage of sublevel would be an awesome exercise. Taking it a step further, storing a large amount of data using sublevel, then using range queries to demonstrate how quickly you can retrieve sets would be an awesome coup de grace high dive finale or whatever you guys call it in the Outback.

ralphtheninja commented 11 years ago

It would be good to cover different strategies for creating keys to structure your data, and which separators you should and should not use.

mcollina commented 11 years ago

@ralphtheninja definitely +1 for separators. It took me a while to figure that out.
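A small sketch of the kind of thing that trips people up, using plain string comparison on an in-memory list instead of a database. `keysBetween` and the key names are hypothetical; the `!` separator and `\xff` upper bound are one common convention, not the only one.

```javascript
// Returns the sorted keys that fall within [start, end], compared as plain strings.
function keysBetween(keys, start, end) {
  return keys.slice().sort().filter(function (key) {
    return key >= start && key <= end;
  });
}

var keys = ['user!alice', 'user!bob', 'users!count', 'userlog!0001'];

// No separator in the bounds: unrelated key spaces leak into the result.
var leaky = keysBetween(keys, 'user', 'user\xff');

// '!' (0x21) sorts below letters and digits, so 'user!' up to 'user!\xff'
// brackets only the keys that belong to the 'user' space.
var tight = keysBetween(keys, 'user!', 'user!\xff');

console.log(leaky.length, tight.join(','));
```

The separator also has to be a character that cannot occur inside a key segment; if a user-supplied name may contain `!`, the segments become ambiguous and ranges start matching the wrong keys.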

dominictarr commented 11 years ago

@luk- good point. This is probably the most rewarding bit. Stuff like understanding atomicity and durability is important but more scary.

So, I always try to pace educational talks like @substack's 2012 lxjs talk "harnessing the power of streams"

  1. Start with a simple easy to digest example that clearly works best with your model (parse a JSON too big to parse sync)
  2. expand on this, showing how easy it is to work with (loud-stream that uppercases input)
  3. now, you can show the first hard bit (race conditions, and why you need to use pausing) this is where you could put separators.
  4. then, show them something cool again, as a reward for getting past the hard scary bit. once they get to here, you have quite a "wow" effect but people would not have been able to understand this bit without the lead up. (in substack's talk, this bit is scuttlebutt)

Of course, this is not a talk, but the same pacing style would be applicable I think.

rvagg commented 11 years ago

So far I have:

That's enough for introducing the concept of range queries I think, I need to move on to key structure and hierarchies. If anyone has good idea for exercises in this area then please let me know!

I'm told the workshops should go for ~1h, I reckon it would only take one or two more exercises to fill up that time for complete newbies of varying skill level. But for people with some experience with sorted key/value stores it's going to take more, plus I'd really like to introduce sublevel, multilevel and some other key modules if possible; this needs to be something they can take away, incomplete, and continue in their own time.

Any help would be appreciated, I'm finding that 1/2 of the work here is coming up with the ideas for the exercises in the first place so I'd really like ideas.

timoxley commented 11 years ago

The point being to get used to NotFound errors but I've had feedback that the open-ended nature of the problem is a bit of a concern.

The concern I had was that you're not very likely to implement a search like that in real life (probably?), and the lack of an upper bound just makes someone who hasn't seen the code that's generating the data unnecessarily anxious about things like Number.MAX_VALUE.

My suggestion would be to perhaps do something like "there are 10 values set between gibberish0 and gibberish256, find the values that are set". That would make the point about NotFound just as clearly, but without making me nervous.
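A bounded version of the exercise could be solved along these lines. The sketch uses a fake database object so it runs on its own; `createFakeDb` and `findSetKeys` are hypothetical, and the `notFound` flag on the error is assumed to mirror how LevelUP marks a missing key.

```javascript
// Stand-in for a database handle, NOT LevelUP: get() calls back synchronously
// to keep the sketch short, whereas real LevelUP calls are asynchronous.
function createFakeDb(data) {
  return {
    get: function (key, callback) {
      if (Object.prototype.hasOwnProperty.call(data, key)) {
        return callback(null, data[key]);
      }
      var err = new Error('Key not found in database [' + key + ']');
      err.notFound = true;
      callback(err);
    }
  };
}

// Probes prefix0..prefix<max> and collects the keys that are actually set.
function findSetKeys(db, prefix, max, done) {
  var results = [];
  var pending = max + 1;
  var failed = false;

  function probe(index) {
    var key = prefix + index;
    db.get(key, function (err, value) {
      if (failed) return;
      if (err && !err.notFound) {
        // A real failure, as opposed to an expected missing key.
        failed = true;
        return done(err);
      }
      if (!err) results[index] = { key: key, value: value };
      if (--pending === 0) {
        // Drop the holes left by missing keys, keeping numeric order.
        done(null, results.filter(Boolean));
      }
    });
  }

  for (var i = 0; i <= max; i++) probe(i);
}

var db = createFakeDb({ gibberish3: 'foo', gibberish200: 'bar' });
var output = [];

findSetKeys(db, 'gibberish', 256, function (err, entries) {
  if (err) throw err;
  entries.forEach(function (entry) {
    output.push(entry.key + '=' + entry.value);
  });
});

console.log(output.join('\n'));
```

The part worth teaching is the distinction between a not-found error, which is expected here, and any other error, which should abort the search.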

rvagg commented 11 years ago

"there are 10 values set between gibberish0 and gibberish256, find the values that are set"

I like it! Thanks.

eugeneware commented 11 years ago

Yeah. I found the GET example a bit confusing in the wording too. It wasn't entirely clear in the wording (or maybe it's my lack of sleep) - that the numbers were actually consecutive.

Though a brute-force search for 10 random values between gibberish0 and gibberish256 ain't too efficient either. I guess you can make a note in the problem.txt file that you'll learn a more efficient way to do this via a streaming range query in the next problem...

Perhaps just a simpler exercise upfront where you pass them a key, or a list of keys from the command line and they have to return the values for it. That way they can focus on one simple concept first (retrieval), which is a very common use case, before they have to tackle the idea of dealing with "not found" errors and iterating over a range.

rvagg commented 11 years ago

@eugeneware: I had the same suggestion from @substack and @dominictarr, start even simpler and just get the basics of the environment set up to do a single operation. So I'll add that now as exercise #2.

rvagg commented 11 years ago

I've added an initial getting-started-with-level exercise that's just a get and print, it's in spot 2

eugeneware commented 11 years ago

Here are my thoughts for some more exercises that can build on each other to build a more complex app, that will use sublevel and bytewise and possibly multilevel.

Basically teach people how to build a simple twitter app.

I've got some example code that does some of this in my level-microblog repo that I was messing around with. Feel free to steal/lift code from there.

Have a sample database that has the following sublevels:

Sublevel Basic Exercise

Bytewise Basic Exercise - Generate a twitter feed for a tweet

{ key: ['rvagg', 1234], value: { id: 1234, handle: 'eugeneware', message: 'hey' } }
{ key: ['dominictarr', 1234], value: { id: 1234, handle: 'eugeneware', message: 'hey' } }

Get a stream of the latest tweets for user
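The fan-out and "latest tweets" steps could be sketched roughly as follows. This runs against a plain array, not a database; `compareKeys`, `fanOut` and `latestFor` are hypothetical helpers, and the element-by-element comparison is only a simplified stand-in for the ordering bytewise gives array keys.

```javascript
// Compares two [handle, id] keys element by element. This is a simplified
// stand-in for the ordering bytewise gives array keys, not bytewise itself.
function compareKeys(a, b) {
  if (a[0] !== b[0]) return a[0] < b[0] ? -1 : 1;
  return a[1] - b[1];
}

// One feed entry per follower, keyed by [follower, tweet id].
function fanOut(tweet, followers) {
  return followers.map(function (follower) {
    return { type: 'put', key: [follower, tweet.id], value: tweet };
  });
}

// Latest-first entries for one user's feed, capped at `limit`.
function latestFor(entries, handle, limit) {
  return entries
    .filter(function (entry) { return entry.key[0] === handle; })
    .sort(function (a, b) { return compareKeys(b.key, a.key); })
    .slice(0, limit);
}

var followers = ['rvagg', 'dominictarr'];
var feed = []
  .concat(fanOut({ id: 1234, handle: 'eugeneware', message: 'hey' }, followers))
  .concat(fanOut({ id: 1235, handle: 'eugeneware', message: 'again' }, followers));

var latest = latestFor(feed, 'rvagg', 1);

console.log(latest[0].value.message);
```

With a real store, the filter and sort would be replaced by a reversed, limited range read over the keys that start with the user's handle.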

In order to get bytewise and sublevel playing better together we need to use a yet unreleased version of bytewise which has the hex encoding builtin for strings, so that it works with the current string-only version of sublevel.

I've published a simple module called bytewise-hex that returns the encodings to levelup as hex strings (copied straight out of bytewise). It would be OK for now.

Connect it to the web

Connect it to multilevel

Extra Credit - Sharding and hash-rings

Anyway, that's my $0.02! Hope it helps!

rvagg commented 11 years ago

this is all good, I'll try and use at least some of it, unfortunately I'm starting to run out of time to get it all done!

one of the trickiest things is coming up with exercises that are small enough to fit the problem statement + hints into a reasonable number of lines yet be challenging enough to not be boring, I'm finding that if I have to write too much in problem.txt then it needs to be split into multiple problems, or discarded altogether.

eugeneware commented 11 years ago

FYI @dominictarr got the latest bytewise library pushed out which has built-in hex encoding.

So you can now do:

var level = require('level');
var bytewise = require('bytewise/hex');
var sublevel = require('level-sublevel');
var db = sublevel(level('/tmp/db', { keyEncoding: bytewise }));

I.e. sublevels + bytewise work together nicely.