kite-sdk / kite

Kite SDK
http://kitesdk.org/docs/current/
Apache License 2.0
394 stars 263 forks source link

CDK-1000: Kite/SparkSQL integration #366

Closed dgreco closed 9 years ago

tomwhite commented 9 years ago

Thanks for the update @dgreco! I added some more review comments. Have you looked at why the Travis tests are failing?

It might be good to get more eyes on this - @joey wrote the original Spark integration so it would be good to have his take on this change.

You don't need to open new PRs for each change BTW. It's best to have one so that the discussion is in one place.

dgreco commented 9 years ago

The spark specific tests are passing now, it seems now that some of the tests are failing because of lacking of resources

tomwhite commented 9 years ago

One of the test failures is due to multiple Spark contexts running at once. Can you try setting spark.driver.allowMultipleContexts to true?

dgreco commented 9 years ago

Tom, I committed other stuff, commits: 8d79a53 and 53aa87e I'm still learning how to use PRs, did you check what I did? I changed the apis, I tried to reuse the concept of dataset descriptor to be passed to the save method, instead of passing multiple parameters. I also added a method for appending/overwriting to existing datasets. It looks nicer to me, clean apis. And less code.

tomwhite commented 9 years ago

Thanks for the latest changes @dgreco. It looks like the Spark tests are failing with NPEs:

https://travis-ci.org/kite-sdk/kite/jobs/63672318

Also Flume, possibly due to changes in the library versions:

https://travis-ci.org/kite-sdk/kite/jobs/63672316

Do these pass locally?

dgreco commented 9 years ago

Hi Tom, yes they are passing locally, I'm trying with the same commands that travis is using. It looks like that travis is exhausting resources.

dgreco commented 9 years ago

Hi Tom, finally I made the tests running on Travis, the problem was related to the spark context management when there are multiple spark tests. I refactored the tests to be sure that only one spark context is running at same moment.

dgreco commented 9 years ago

I close this PR since I'm restructuring my git repo