Exercises - Githubissues

rvagg commented 11 years ago

Including a bunch of you in this private repo as you're either involved in LevelUP, NodeConf.eu or might have some useful input to this tool: @raynos @dominictarr @maxogden @ralphtheninja @kesla @juliangruber @hij1nx @No9 @mcollina @pgte @mikeal @substack @brycebaril (have I missed anyone that might have good input?)

Please keep this private for now, it's intended for use initially at NodeConf.eu, beyond that you can go wild. It ought to be a good teaching tool for any workshoppy situation or just a good tool to point newbies at. I'm working under the assumption that nobody else is preparing for the NodeBase workshop section at NodeConf.eu, I was told that @hij1nx suggested a workshop but I don't believe anyone else has anything planned thus far (correct me if I'm wrong!).

A descendant of the great stream-adventure from NodeConf by @maxogden and @substack, this now builds on workshopper, which is an evolution of the guts of stream-adventure designed for use with learnyounode. So while it can still deal with the original stream-adventure exercises it can do a bit more too, and more functionality is being added to workshopper for this project.

Exercises can be similar to those found in stream-adventure or learnyounode. The easiest exercises are stand-alone programs that print stuff to stdout, or can be made to print to stdout indirectly, as this is how stream-adventure was designed. It's also possible to have a validation step after a solution is run to verify criteria that can't be munged nicely into a stdout print. Don't be constrained by what's currently possible when coming up with exercises though.

Initially what I need help with is an outline of the exercises, structured for the best possible learning experience around foundational "NodeBase" concepts. So, even if you can't contribute code or documentation or anything else, a clear eye over the lesson progression would be greatly appreciated.

So far, this is what I have in mind:

Basics, really simple exercises just to cover the very basic operations in LevelUP:

Basics: GET
Basics: PUT
Basics: BATCH
Basics: READSTREAM

The first 2 are implemented, I'm considering how best to do the third without it just being a duplicate of the second (there are a couple of exercises in learnyounode that deal with sync vs async methods and it monkeypatches fs to watch the methods that the solution uses and fails you if you use the wrong one(s), I'm not sure how suitable that approach is here tho). ReadStream will just be a "here's a db with a bunch of random keys & values, print them in order to stdout" I think.

After that, I think the next step is to deal with range queries, so perhaps a couple of exercises that deal with existing dbs and structured keys, and that ask you to print out just a subset of the keys according to some criteria supplied as command line arguments, so you have to create a ReadStream that'll pick those up. I'm also wondering if an exercise on bytewise would be good here. Some creative thinking required!

Then we can have some more advanced stuff, somehow pulling in sublevel, maybe even hooks and I also thought that coming up with something that uses multilevel would be fun. But I need some help thinking up exercises that lend themselves to being tested/verified.

mcollina commented 11 years ago

@rvagg as for the issue with exercise 3, I think you want to include DEL there and then show how batches can be PUT or DEL.
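
To illustrate the PUT/DEL mix: the array form of LevelUP's batch() takes operations shaped like { type: 'put', key, value } and { type: 'del', key }. The sketch below is only an illustration under assumptions: applyBatch is a made-up helper, not a LevelUP function, and a plain Map stands in for the database.

```javascript
// Hypothetical stand-in: apply LevelUP-style batch operations to a Map.
// A real db.batch(ops, callback) is asynchronous and applies the whole
// array atomically; this helper does neither.
function applyBatch(store, ops) {
  ops.forEach(function (op) {
    if (op.type === 'put') {
      store.set(op.key, op.value);
    } else if (op.type === 'del') {
      store.delete(op.key);
    } else {
      throw new Error('Unknown batch operation type: ' + op.type);
    }
  });
  return store;
}

// Sample data, invented for this sketch.
var store = new Map([['old', 'stale value']]);
var ops = [
  { type: 'del', key: 'old' },
  { type: 'put', key: 'colour', value: 'green' },
  { type: 'put', key: 'shape', value: 'round' }
];

applyBatch(store, ops);
console.log(Array.from(store.keys())); // [ 'colour', 'shape' ]
```

An exercise built on this shape can hand the learner a list of mixed operations and check the final state of the db, which is different enough from the plain PUT exercise.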

Depending on time, I'll leave bytewise out as it's HARD to debug. However :+1: for sublevel, hooks and multilevel.

mikeal commented 11 years ago

I've been messing with a lot of bytewise and not bytewise stuff lately. Half of what I tend to do with bytewise is also accomplished with sublevel, and when using sublevel it is also browser compatible.

What I think is more generally useful/applicable is level-mutex. A lot of databases need read-before-write semantics to guarantee consistency so that might be a better tutorial than bytewise. The main thing bytewise makes easy is building indexes but I feel like secondary indexing is more of an advanced topic and you definitely need read-before-write semantics to write a consistent secondary index.

rvagg commented 11 years ago

added @luk-, sorry for the oversight but I imagine you'd have some good insight too.

heapwolf commented 11 years ago

This is great! I really enjoyed the stream-adventure workshop.

Ideas for structuring the workshop: In the past, I've had good results by pairing students. Specifically, for the first 10 minutes of the workshop I'd have students talk about what they know already. People appreciate it when you establish some individual rapport with them. After that, I'd take more experienced students and pair them with less experienced students. This keeps the smart ones busy and makes the instructor more available. For the introverted, it promotes conversation. Near the end of the class, I'd have people who solved an interesting problem discuss how they solved it. People LOVE getting a minute in the spotlight. Anyway, the whole reason we physically attend events is to have some interesting social interactions, so I think some of this is important.

luk- commented 11 years ago

@rvagg I think it would be really helpful for people if you spend some time on how to model the data. I don't necessarily mean even formal models, which might actually confuse people. Taking a set of structured data that a lot of people are used to seeing represented with a schema in SQL, and demonstrating how it can be stored while taking advantage of sublevel would be an awesome exercise. Taking it a step further, storing a large amount of data using sublevel, then using range queries to demonstrate how quickly you can retrieve sets would be an awesome coup de grace high dive finale or whatever you guys call it in the Outback.

ralphtheninja commented 11 years ago

Different strategies for creating keys to structure your data, and which separators you should and should not use.
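
A small sketch of why the choice of separator matters, assuming keys are compared as plain ASCII strings (joinKey, childKeys and the sample names are made up for illustration). A separator that sorts below every character used in the key parts keeps composite keys in the same order as their first component; a separator that sorts above them does not:

```javascript
// Hypothetical helper: build a composite key from its parts.
function joinKey(parts, separator) {
  return parts.join(separator);
}

// Two parent names where one is a prefix of the other.
var parents = ['a', 'ab'];

// Build one child key per parent and sort the result.
function childKeys(separator) {
  return parents.map(function (parent) {
    return joinKey([parent, 'x'], separator);
  }).sort(); // default sort compares code units, same as byte order for ASCII
}

var low = childKeys('!');  // '!' is 0x21, below letters and digits
var high = childKeys('~'); // '~' is 0x7e, above letters and digits

console.log(low);  // [ 'a!x', 'ab!x' ]
console.log(high); // [ 'ab~x', 'a~x' ]
```

With '~' the children of 'ab' sort before the children of 'a', even though 'a' sorts before 'ab'. Whatever separator is chosen, it also must not appear inside a key part, otherwise two different sets of parts can produce the same key.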

mcollina commented 11 years ago

@ralphtheninja definitely +1 for separators. It took me a while to figure that out.

dominictarr commented 11 years ago

@luk- good point. this is probably the most rewarding bit. Stuff like understanding atomicness and durability is important but more scary.

So, I always try to pace educational talks like @substack's 2012 lxjs talk "harnessing the power of streams"

Start with a simple easy to digest example that clearly works best with your model (parse a JSON too big to parse sync)
expand on this, showing how easy it is to work with (loud-stream that uppercases input)
now, you can show the first hard bit (race conditions, and why you need to use pausing) this is where you could put separators.
then, show them something cool again, as a reward for getting past the hard scary bit. once they get to here, you have quite a "wow" effect but people would not have been able to understand this bit without the lead up. (in substack's talk, this bit is scuttlebutt)

Of course, this is not a talk, but the same pacing style would be applicable I think.

rvagg commented 11 years ago

So far I have:

ALL YOUR BASE: a simple "hello world" with nothing to do with databases, as per @dominictarr's suggestion to have an introductory exercise to get them used to the format.
Basics: GET: practice with db.get(), the problem statement tells you that you have an open-ended number of entries of the form "gibberishX" where X starts at 0 and ends at some number >0 but you don't know where. The point being to get used to NotFound errors but I've had feedback that the open-ended nature of the problem is a bit of a concern.
Basics: PUT: given a JSON object on the commandline, parse it and put all the key/value pairs into the database using db.put()
Basics: BATCH: read a file that contains a series of simple CSV (that you can .split(',')) that are prefixed with either 'PUT' or 'DEL', given an existing db, perform a .batch() with these values. I also think I'll make it fail you if you don't use .batch() for this one (by monkeypatching level).
Streaming: you don't know what the keys are but you need to print the whole db to the console, so you have to use a ReadStream.
@horse_js_counts: an introduction to range queries, you have to use a 'start' option to make this one work, the keys are all ISO format dates and you need to start at a particular month, so there's a partial-match thing here too. Simply involves performing a count of the number of entries matched. I need to finish this by keeping track of how many entries the solution processed from the db to make sure they actually used a 'start' correctly.
@horse_js_tweets: continues the previous exercise but you now need to use an 'end' option to bound your query. I'm also going to complete this by verifying that they didn't fetch more entries than they needed, so they must use 'end' properly.
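
For reference, the partial-match behaviour of 'start' and 'end' on ISO date keys can be sketched without a database. This is only an illustration under assumptions: rangeKeys is a made-up stand-in for a ReadStream over sorted keys, both bounds are treated as inclusive, and the sample keys are invented.

```javascript
// Hypothetical stand-in for a ReadStream over sorted keys: return the keys
// between start and end (both inclusive), compared as plain strings.
function rangeKeys(keys, start, end) {
  return keys.slice().sort().filter(function (key) {
    return key >= start && (end === undefined || key <= end);
  });
}

var keys = [
  '2013-05-30T21:10:00.000Z',
  '2013-06-01T08:00:00.000Z',
  '2013-06-17T12:30:00.000Z',
  '2013-07-02T09:45:00.000Z'
];

// 'start' alone: everything from June onwards. '2013-06' is only a prefix,
// but it sorts before every full key that begins with it.
var fromJune = rangeKeys(keys, '2013-06');

// 'start' and 'end': June only. '\xff' sorts after any character that can
// follow the '2013-06' prefix in these keys.
var juneOnly = rangeKeys(keys, '2013-06', '2013-06\xff');

console.log(fromJune.length); // 3
console.log(juneOnly.length); // 2
```

The first query mirrors the counting exercise and the second mirrors the bounded one: without an 'end' the July entry is fetched as well.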

That's enough for introducing the concept of range queries I think, I need to move on to key structure and hierarchies. If anyone has good ideas for exercises in this area then please let me know!

I'm told the workshops should go for ~1h, I reckon it would only take one or two more exercises to fill up that time for complete newbies of varying skill level. But for people with some experience with sorted key/value stores it's going to take more, plus I'd really like to introduce sublevel, multilevel and some other key modules if possible; this needs to be something they can take away, incomplete, and continue in their own time.

Any help would be appreciated, I'm finding that 1/2 of the work here is coming up with the ideas for the exercises in the first place so I'd really like ideas.

timoxley commented 11 years ago

The point being to get used to NotFound errors but I've had feedback that the open-ended nature of the problem is a bit of a concern.

The concern I had was that you're not very likely to implement a search like that in real life (probably?), and the lack of an upper bound just makes someone who hasn't seen the code that's generating the data unnecessarily anxious about things like Number.MAX_VALUE.

My suggestion would be to perhaps do something like "there are 10 values set between gibberish0 and gibberish256, find the values that are set". That would make the point about NotFound just as clearly, but without making me nervous.
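
A rough sketch of what a solution to that bounded version could look like. Everything here is an assumption for illustration: makeFakeDb is a made-up stand-in for the real db, it reports a missing key through an error with a notFound flag (the way LevelUP marks its NotFound errors), and it calls back synchronously to keep the example short, whereas a real store calls back later.

```javascript
// Hypothetical stand-in for a LevelUP-style db backed by a plain object.
function makeFakeDb(data) {
  return {
    get: function (key, callback) {
      if (Object.prototype.hasOwnProperty.call(data, key)) {
        return callback(null, data[key]);
      }
      var err = new Error('Key not found in database [' + key + ']');
      err.notFound = true;
      callback(err);
    }
  };
}

// Probe gibberish0 .. gibberish<max> and report the entries that exist.
function findSet(db, max, done) {
  var found = [];
  var pending = max + 1; // number of get() calls still outstanding
  var failed = false;

  function probe(i) {
    var key = 'gibberish' + i;
    db.get(key, function (err, value) {
      if (failed) return;
      if (err && !err.notFound) { // a real error, not just a missing key
        failed = true;
        return done(err);
      }
      if (!err) found.push({ index: i, text: key + '=' + value });
      if (--pending === 0) {
        // Callbacks may arrive in any order, so sort by index at the end.
        found.sort(function (a, b) { return a.index - b.index; });
        done(null, found.map(function (entry) { return entry.text; }));
      }
    });
  }

  for (var i = 0; i <= max; i++) probe(i);
}

var fakeDb = makeFakeDb({ gibberish3: 'c', gibberish200: 'x' });
var result;
findSet(fakeDb, 256, function (err, lines) { result = lines; });
console.log(result); // [ 'gibberish3=c', 'gibberish200=x' ]
```

The point of the exercise survives: a NotFound error is an expected outcome to skip over, while any other error is passed on.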

rvagg commented 11 years ago

"there are 10 values set between gibberish0 and gibberish256, find the values that are set"

I like it! Thanks.

eugeneware commented 11 years ago

Yeah. I found the GET example a bit confusing in the wording too. It wasn't entirely clear in the wording (or maybe it's my lack of sleep) - that the numbers were actually consecutive.

Though, when searching for 10 random values between gibberish0 and gibberish256, a brute force search ain't too efficient either. I guess you can make a note in the problem.txt file that you'll learn a more efficient way to do this via a streaming range query in the next problem...

Perhaps just a simpler exercise upfront where you pass them a key, or a list of keys from the command line and they have to return the values for it. That way they can focus on one simple concept first (retrieval), which is a very common use case, before they have to tackle the idea of dealing with "not found" errors and iterating over a range.

rvagg commented 11 years ago

@eugeneware: I had the same suggestion from @substack and @dominictarr, start even simpler and just get the basics of the environment set up to do a single operation. So I'll add that now as exercise #2.

rvagg commented 11 years ago

I've added an initial getting-started-with-level exercise that's just a get and print, it's in spot 2

eugeneware commented 11 years ago

Here are my thoughts for some more exercises that can build on each other to build a more complex app, that will use sublevel and bytewise and possibly multilevel.

Basically teach people how to build a simple twitter app.

I've got some example code that does some of this in my level-microblog repo that I was messing around with. Feel free to steal/lift code from there.

Have a sample database that has the following sublevels:

users (indexed by twitter handle)
- The user has a handle, name and an array of followers
tweets (indexed by ID)

Sublevel Basic Exercise

Get and set some values from the sub levels
Possibly get a stream within a sublevel

Bytewise Basic Exercise - Generate a twitter feed for a tweet

Create a new sublevel called "Feed" that is indexed using bytewise by [handle, tweet ID]
Write a function that takes three arguments: the db of sublevels, a handle, and a message
The function must take the handle, lookup the user, find the followers, and populate the feed sublevel for all the followers.
Thus if 'eugeneware' has 'rvagg' and 'dominictarr' as followers, sending out a tweet with the message 'hey', will end up with the following entries in the "Feed" sublevel:

{ key: ['rvagg', 1234], value: { id: 1234, handle: 'eugeneware', message: 'hey' } }
{ key: ['dominictarr', 1234], value: { id: 1234, handle: 'eugeneware', message: 'hey' } }
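
The fan-out step can be sketched as a pure function that turns one tweet into one Feed entry per follower. This is a sketch under assumptions: feedOps is a made-up name, the user lookup and the actual write to the "Feed" sublevel are left out, and the entries are shaped as batch 'put' operations so they could be written in one go.

```javascript
// Build one Feed entry per follower, keyed by [follower handle, tweet ID].
function feedOps(user, tweet) {
  return user.followers.map(function (follower) {
    return { type: 'put', key: [follower, tweet.id], value: tweet };
  });
}

// Sample data matching the example above; the name is a placeholder.
var user = {
  handle: 'eugeneware',
  name: 'placeholder name',
  followers: ['rvagg', 'dominictarr']
};
var tweet = { id: 1234, handle: user.handle, message: 'hey' };

var entries = feedOps(user, tweet);
console.log(entries.map(function (op) { return op.key; }));
// [ [ 'rvagg', 1234 ], [ 'dominictarr', 1234 ] ]
```

Keeping this step free of db calls also makes it easy to verify in an exercise: the expected entries follow directly from the followers array.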

Get a stream of the latest tweets for user

Get a list of tweets given a user's handle and the ID of the last message that they read
The user has to use createReadStream with { start: [userID, lastId], end: [userID, Infinity] } to get their latest tweets.
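
How such a range behaves on array keys can be sketched in memory. This is a simplified stand-in, not bytewise itself: compareKeys only handles [string, number] keys, both bounds are assumed to be inclusive, and the sample feed is invented.

```javascript
// Simplified stand-in for bytewise ordering, handling only [string, number]
// keys: compare the handle first, then the tweet ID.
function compareKeys(a, b) {
  if (a[0] !== b[0]) return a[0] < b[0] ? -1 : 1;
  if (a[1] !== b[1]) return a[1] < b[1] ? -1 : 1;
  return 0;
}

// Return the entries whose keys fall between start and end, both inclusive.
function feedRange(entries, start, end) {
  return entries
    .filter(function (entry) {
      return compareKeys(entry.key, start) >= 0 &&
             compareKeys(entry.key, end) <= 0;
    })
    .sort(function (x, y) { return compareKeys(x.key, y.key); });
}

var feed = [
  { key: ['rvagg', 1233], value: 'older tweet' },
  { key: ['rvagg', 1234], value: 'hey' },
  { key: ['rvagg', 1240], value: 'newer tweet' },
  { key: ['dominictarr', 1234], value: 'hey' }
];

// Everything for 'rvagg' from tweet 1234 onwards.
var latest = feedRange(feed, ['rvagg', 1234], ['rvagg', Infinity]);
console.log(latest.map(function (entry) { return entry.key[1]; })); // [ 1234, 1240 ]
```

With inclusive bounds the last-read tweet itself comes back first, so a solution that wants only unread tweets has to skip it or start just past it.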

In order to get bytewise and sublevel playing better together we need to use a yet unreleased version of bytewise which has the hex encoding builtin for strings, so that it works with the current string-only version of sublevel.

I've published a simple module called bytewise-hex that returns the encodings to levelup as hex strings (copied straight out of bytewise). Would be OK for now.

Connect it to the web

Get them to turn their twitter feed stream into a streaming REST interface.
You could use level-livestream or level-hooks or something to do live updates of tweets or something.

Connect it to multilevel

Repeat the previous exercise but get them to use a multilevel instance?

Extra Credit - Sharding and hash-rings

Provide 3 multilevel instances.
Use node-hashring or a simple hashing algorithm to distribute users' twitter feeds across the 3 servers.
Do a bunch of tweets.
Check that the right tweets end up on the right shards.
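
The "simple hashing algorithm" option could look roughly like the sketch below. It is an assumption-laden illustration: shardFor and groupByShard are made-up names, this is plain modulo hashing rather than node-hashring or a consistent-hashing ring, and the shards are just arrays standing in for the 3 multilevel instances.

```javascript
// Simple string hash: the same handle always maps to the same shard index.
function shardFor(handle, shardCount) {
  var hash = 0;
  for (var i = 0; i < handle.length; i++) {
    // Multiply-and-add, truncated to an unsigned 32-bit integer.
    hash = (hash * 31 + handle.charCodeAt(i)) >>> 0;
  }
  return hash % shardCount;
}

// Group handles into shardCount buckets, one bucket per server.
function groupByShard(handles, shardCount) {
  var shards = [];
  for (var i = 0; i < shardCount; i++) shards.push([]);
  handles.forEach(function (handle) {
    shards[shardFor(handle, shardCount)].push(handle);
  });
  return shards;
}

var handles = ['eugeneware', 'rvagg', 'dominictarr'];
var shards = groupByShard(handles, 3);

// The verification step: every handle sits on the shard its hash points to.
var allInPlace = handles.every(function (handle) {
  return shards[shardFor(handle, 3)].indexOf(handle) !== -1;
});
console.log(allInPlace); // true
```

Modulo hashing moves most keys when the number of servers changes, which is the problem a hash ring is meant to reduce, so it only suits a fixed set of 3 instances.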

Anyway, that's my $0.02! Hope it helps!

rvagg commented 11 years ago

this is all good, I'll try and use at least some of it, unfortunately I'm starting to run out of time to get it all done!

one of the trickiest things is coming up with exercises that are small enough to fit the problem statement + hints into a reasonable number of lines yet be challenging enough to not be boring, I'm finding that if I have to write too much in problem.txt then it needs to be split into multiple problems, or discarded altogether.

eugeneware commented 11 years ago

FYI @dominictarr got the latest bytewise library pushed out which has built-in hex encoding.

So you can now do:

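var level = require('level');
var bytewise = require('bytewise/hex');
var sublevel = require('level-sublevel');
// Open the db with bytewise's hex key encoding, then wrap it with sublevel.
var db = sublevel(level('/tmp/db', { keyEncoding: bytewise }));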

I.e. sublevels + bytewise work together nicely.

workshopper / levelmeup

Exercises #1
