kite-sdk / kite

Kite SDK
http://kitesdk.org/docs/current/
Apache License 2.0

CDK-1011: Support a configurable number of writers per partition when… #377

Closed mkwhitacre closed 9 years ago

mkwhitacre commented 9 years ago

… writing to a dataset, along with copying and compaction.

I didn't see any current tests for the CrunchDatasets.partition(...) method or CompactionCommand/Task. I can add some if necessary.

rdblue commented 9 years ago

Thanks @mkwhitacre! This looks pretty close. I would appreciate you adding those tests, if you don't mind. That was definitely an oversight.

mkwhitacre commented 9 years ago

If I promise to write tests but don't get them done by the 1.1.0 code freeze, can this still make it in? :smile:

I'll see if I can get some out tomorrow.

mkwhitacre commented 9 years ago

Ok, added some tests. I'm not completely sold on them b/c I was getting some funky results with regard to the number of files being produced. I think that is just test setup (not enough data to be properly distributed across the hashes or the number of writers).

I might not be able to get back to this till the end of the week unfortunately but wanted to put this out in case someone else has the cycles.
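
To illustrate the file-count point above, here is a minimal, self-contained sketch of why a small test dataset can produce fewer files than the configured number of writers. It does not use Kite or Crunch; the class `WriterBucketSketch` and the helpers `bucketFor` and `nonEmptyBuckets` are hypothetical names, and the hash-modulo assignment is only an assumption about how records might be spread across writers, not the actual implementation in this PR.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class WriterBucketSketch {

    // Hypothetical helper: assign a record key to one of numWriters buckets
    // by hashing. floorMod keeps the result non-negative for negative hashes.
    static int bucketFor(String key, int numWriters) {
        return Math.floorMod(key.hashCode(), numWriters);
    }

    // Count how many buckets receive at least one record. In this sketch,
    // each non-empty bucket stands in for one output file in a partition.
    static int nonEmptyBuckets(List<String> keys, int numWriters) {
        Set<Integer> used = new HashSet<>();
        for (String key : keys) {
            used.add(bucketFor(key, numWriters));
        }
        return used.size();
    }

    public static void main(String[] args) {
        // Only three records, but ten writers requested: at most three
        // buckets can ever be non-empty, so at most three files appear.
        List<String> fewKeys = Arrays.asList("a", "b", "c");
        int numWriters = 10;
        System.out.println(nonEmptyBuckets(fewKeys, numWriters)
                + " of " + numWriters + " buckets used");
    }
}
```

Under these assumptions, a test that expects exactly `numWriters` files per partition needs enough distinct records per partition that every bucket is likely to be hit; otherwise asserting an upper bound on the file count is the safer check.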

rdblue commented 9 years ago

Thanks @mkwhitacre! This looks great. I'm working on making the tests work and I've already found a bug exposed by them.

rdblue commented 9 years ago

Merged as https://github.com/kite-sdk/kite/commit/f8456578

Thanks @mkwhitacre!