whamcloud / scheduling-connector-for-hadoop

HPC Adapter for MapReduce/YARN (HAM)
MIT License

Looking for more details. #1

Open mickours opened 7 years ago

mickours commented 7 years ago

Hi,

I'm currently working on this subject as a PhD student, and I'm designing a similar solution. I'd like to have more details on how it works and in which cases it applies, so that I can include this project in my state-of-the-art review.

First, I'd like to know if there is more material that explains the design of this solution; a scientific paper would be great!

Another thing that is not clear to me: does it work with any YARN application, like Spark or Flink, or only with Hadoop MapReduce?

If so, how do you manage HPC allocations, which require a walltime?

Thanks, Michael

devaraj-kavali commented 7 years ago

Thanks @mickours for reaching out to us.

Some details are provided in the README and in the links it references. We are in the process of publishing the paper publicly.

Another thing that is not clear to me: does it work with any YARN application, like Spark or Flink, or only with Hadoop MapReduce?

It is designed to support any YARN application; we have verified this with the latest releases of MapReduce and some older versions of Spark. We haven't verified Flink yet, but we expect few or no changes would be needed to support it.
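
For context, here is a minimal sketch (independent of this project) of the generic YARN client submission path that frameworks such as MapReduce and Spark build on, which is presumably why a scheduler-side connector can stay framework-agnostic. The application name, command, and resource sizes are placeholders:

```java
import java.util.Collections;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

public class SubmitDemoApp {
    public static void main(String[] args) throws Exception {
        YarnClient client = YarnClient.createYarnClient();
        client.init(new YarnConfiguration());
        client.start();

        YarnClientApplication app = client.createApplication();
        ApplicationSubmissionContext ctx = app.getApplicationSubmissionContext();
        ctx.setApplicationName("yarn-api-demo"); // placeholder name

        // A real framework (MapReduce, Spark, Flink) would launch its own
        // ApplicationMaster here; a plain shell command keeps the sketch short.
        ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
        amContainer.setCommands(Collections.singletonList("sleep 60"));
        ctx.setAMContainerSpec(amContainer);
        ctx.setResource(Resource.newInstance(512, 1)); // 512 MB, 1 vcore

        ApplicationId appId = client.submitApplication(ctx);
        System.out.println("Submitted " + appId);
        client.stop();
    }
}
```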

If so, how do you manage HPC allocations, which require a walltime?

We do not rely on the walltime, since job completion cannot be predicted and depends entirely on the input data size. If users want to terminate a job that has exceeded its estimated completion time, they can achieve this themselves with the killApplication API from the YARN client API.
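
As an illustration, a minimal sketch (not part of this project) of how a user could enforce their own walltime with the standard YARN client API; the application ID and timeout arguments are placeholders, and `ApplicationId.fromString` assumes Hadoop 2.8 or later:

```java
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class WalltimeEnforcer {
    public static void main(String[] args) throws Exception {
        // args[0]: application ID such as application_1500000000000_0001
        // args[1]: walltime budget in seconds (both are placeholders)
        ApplicationId appId = ApplicationId.fromString(args[0]);
        long walltimeMillis = Long.parseLong(args[1]) * 1000L;

        YarnClient client = YarnClient.createYarnClient();
        client.init(new YarnConfiguration());
        client.start();
        try {
            ApplicationReport report = client.getApplicationReport(appId);
            YarnApplicationState state = report.getYarnApplicationState();
            boolean finished = state == YarnApplicationState.FINISHED
                    || state == YarnApplicationState.FAILED
                    || state == YarnApplicationState.KILLED;
            long elapsed = System.currentTimeMillis() - report.getStartTime();
            if (!finished && elapsed > walltimeMillis) {
                // Same effect as `yarn application -kill <appId>`
                client.killApplication(appId);
            }
        } finally {
            client.stop();
        }
    }
}
```

Run periodically (for example from cron) against each submitted application ID, this reproduces the walltime behaviour of an HPC scheduler without the connector having to predict job completion times.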

Please feel free to ask if you have any further questions.

Thanks, Devaraj