gridkit / nanocloud

NanoCloud - distributed computing toolkit

Option to detach remote process and leave running. #24

Open · thehutgroup-adamryan opened this issue 5 years ago

thehutgroup-adamryan commented 5 years ago

Hi, is it possible to run a process on a remote node and then detach from the node without the process being terminated?

aragozin commented 5 years ago

A process that remains after the master process has terminated is considered a zombie, and that is a big problem from NanoCloud's point of view (hunting zombies across five dozen servers is no fun). Could you elaborate on your use case?

thehutgroup-adamryan commented 5 years ago

I want to start a long-running process. This would be equivalent to running a shell command over SSH, then backgrounding and detaching the process so that it doesn't stop when the SSH session ends. For example, if I had a Raspberry Pi on my home network, I could deploy a small web server to it straight from my IDE, without maintaining a separate script to deploy and run a JAR.
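For comparison, the SSH-style detach described above can be reproduced from Java by wrapping the command in a POSIX shell one-liner. The following is a minimal sketch. It assumes the target host has `/bin/sh` and `nohup`. `NohupCommand`, `detached` and the log-file argument are hypothetical names and are not part of NanoCloud.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper (not a NanoCloud API): builds an `sh -c` command that
// starts a program under nohup, detached from the launching process.
public class NohupCommand {

    // Quotes one argument for POSIX sh: wrap it in single quotes and
    // replace each embedded ' with '\''.
    static String shQuote(String arg) {
        return "'" + arg.replace("'", "'\\''") + "'";
    }

    // Runs `command` under nohup, appends its output to `logFile`, detaches
    // stdin, backgrounds it, and prints the background PID ($!) so the
    // caller can record it. The shell itself exits immediately.
    static List<String> detached(List<String> command, String logFile) {
        String joined = command.stream()
                .map(NohupCommand::shQuote)
                .collect(Collectors.joining(" "));
        String script = "nohup " + joined + " >> " + shQuote(logFile)
                + " 2>&1 < /dev/null & echo $!";
        return List.of("/bin/sh", "-c", script);
    }
}
```

To use it, start the returned list with `new ProcessBuilder(...).start()` and read the first line of its stdout to get the PID. Once `sh` exits, the background program is orphaned rather than attached to the launcher. Whether it survives NanoCloud tearing down the remote node has not been verified here.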

aragozin commented 5 years ago

But you would also need a way to stop it, right?

I can suggest forking a new process from the slave node and cloning its classpath (System.getProperty("java.class.path")). You would need a main class for it, though.
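The suggestion above amounts to building a `java` command line from the slave JVM's own `java.home` and `java.class.path`. The following is a minimal sketch of that idea. `ClasspathFork` is a hypothetical helper name, and the main class and log file are whatever you choose.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: starts a fresh JVM that reuses the current JVM's
// java binary and classpath.
public class ClasspathFork {

    // Command line for the child JVM; any class visible to the current
    // classpath can serve as the entry point.
    static List<String> command(String mainClass, String... args) {
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        cmd.add("-cp");
        cmd.add(System.getProperty("java.class.path"));
        cmd.add(mainClass);
        cmd.addAll(Arrays.asList(args));
        return cmd;
    }

    // Starts the child with stdout/stderr appended to `log`, so it never
    // blocks writing to a pipe that nobody reads.
    static Process fork(File log, String mainClass, String... args) throws IOException {
        return new ProcessBuilder(command(mainClass, args))
                .redirectErrorStream(true)
                .redirectOutput(ProcessBuilder.Redirect.appendTo(log))
                .start();
    }
}
```

As the thread notes, everything the child needs must then travel as command-line arguments (`args` here), which gets awkward for anything beyond a few simple parameters.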

thehutgroup-adamryan commented 5 years ago

The new process could return a PID before detaching, which a later session could use to kill it. That wasn't an immediate requirement, though. The application may simply be long-running, and I don't want to have to keep the local instance running. Is there room in the code to detach the process, as in the SSH example above (nohup, disown), rather than spawning a new one remotely? Your suggestion doesn't sound bad, though, and I'll give it a try.
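The PID round-trip described above (record the PID at launch, then stop the process from a later session) can be sketched with the JDK's process API. This sketch assumes Java 9+, for `Process.pid()` and `ProcessHandle`. `PidFile` and the file location are hypothetical. Operating systems can reuse PIDs, so a stale PID file may point at an unrelated process.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: persists a child's PID and later checks or stops it.
public class PidFile {

    // Records a PID, e.g. process.pid() right after ProcessBuilder.start().
    static void write(Path file, long pid) throws IOException {
        Files.write(file, Long.toString(pid).getBytes(StandardCharsets.US_ASCII));
    }

    // Reads back a PID written by write().
    static long read(Path file) throws IOException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.US_ASCII);
        return Long.parseLong(text.trim());
    }

    // True if a process with this PID exists and is alive.
    static boolean isRunning(long pid) {
        return ProcessHandle.of(pid).map(ProcessHandle::isAlive).orElse(false);
    }

    // Requests normal termination; false if there is no such process
    // or the request could not be made.
    static boolean stop(long pid) {
        return ProcessHandle.of(pid).map(ProcessHandle::destroy).orElse(false);
    }
}
```

For example, the remote side would call `PidFile.write(path, process.pid())` right after starting the child. A later session would then call `PidFile.stop(PidFile.read(path))`.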

aragozin commented 5 years ago

Process management is a can of worms: heartbeats, remote objects, console redirection, etc. A detachable process is definitely doable, but it would take some effort. Spawning a process is clean and simple (though passing parameters through command-line arguments could get messy). For "birth control" of detached processes, I have used something like https://github.com/aragozin/spring-petclinic/blob/demo/src/test/java/info/ragozin/demo/ProcessWatchDog.java. The process associates itself with a file and terminates if that file is gone (plus a socket to check process liveness).
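The linked ProcessWatchDog is not reproduced here. The following is a separate minimal sketch of the file-based half of the idea only, without the liveness socket. `FileWatchDog`, the marker file and the polling period are hypothetical. The caller supplies the action to take when the file disappears, for example `() -> System.exit(0)`.

```java
import java.io.File;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical file-based watchdog: the process keeps running only while
// its marker file exists.
public class FileWatchDog {

    // Polls `marker` every `periodMs`; once the file is gone, runs
    // `onMissing` once and stops polling.
    static ScheduledExecutorService watch(File marker, long periodMs, Runnable onMissing) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "file-watchdog");
            t.setDaemon(true); // the watchdog alone must not keep the JVM alive
            return t;
        });
        exec.scheduleAtFixedRate(() -> {
            if (!marker.exists()) {
                onMissing.run();
                exec.shutdown(); // cancels the periodic check
            }
        }, periodMs, periodMs, TimeUnit.MILLISECONDS);
        return exec;
    }
}
```

Deleting the marker file (locally, or over SSH with `rm`) then acts as a stop switch that does not depend on knowing the PID.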

thehutgroup-adamryan commented 5 years ago

For reference, here's what I got working:

import org.gridkit.nanocloud.Cloud;
import org.gridkit.nanocloud.CloudFactory;
import org.gridkit.nanocloud.RemoteNode;

import java.io.*;
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;
import java.net.InetAddress;
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class Main
{
    public static class RemoteMain
    {
        public static void main( String[] args ) throws IOException
        {
            try(
                FileWriter fileWriter = new FileWriter( new File("remote-main.log") )
            ){
                while( true ){
                    fileWriter.write( LocalTime.now().toString()+"\n" );
                    fileWriter.flush();
                    try {
                        Thread.sleep(3000);
                    }
                    catch( InterruptedException e ){
                        break;
                    }
                }
            }
        }
    }

    public static void main( String[] args ) throws InterruptedException, TimeoutException, ExecutionException
    {
        String hostName = InetAddress.getLoopbackAddress().getHostAddress()+":22";
        Cloud cloud = CloudFactory.createCloud();
        cloud.node(hostName).x(RemoteNode.REMOTE)
            .setRemoteHost(hostName)
            .setPassword("<password>")
            .setRemoteJavaExec( ".../bin/java" )
            .useSimpleRemoting();
        cloud.node(hostName).touch();
        cloud.node(hostName).submit( (Callable<Integer> & Serializable)() -> runRemoteMain( "" ) ).get( 5, TimeUnit.SECONDS );
        cloud.node(hostName).shutdown();
        cloud.node(hostName).kill();
        cloud.shutdown();
    }

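    // Executed on the remote node: starts RemoteMain in a separate JVM that
    // reuses the node's java binary and classpath, then returns immediately.
    private static Integer runRemoteMain( String...args ) throws IOException
    {
        RuntimeMXBean runtimeMxBean = ManagementFactory.getRuntimeMXBean();
        List<String> command = new ArrayList<>();
        command.add( System.getProperty("java.home")+"/bin/java" );
        command.add("-cp");
        command.add( runtimeMxBean.getClassPath() );
        command.add( RemoteMain.class.getName() );
        command.addAll( Arrays.asList(args) );
        // Send the child's stdout/stderr to a file instead of unread pipes,
        // so the child never blocks writing output that nobody consumes.
        new ProcessBuilder( command )
            .redirectErrorStream( true )
            .redirectOutput( ProcessBuilder.Redirect.appendTo( new File("remote-main.out") ) )
            .start();
        return 0;
    }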
}

However, I still see an issue with the main process not terminating. I'm starting it from the IDE, if that makes a difference.
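One way to narrow this down is a generic JVM diagnostic, not a known NanoCloud issue. The JVM exits only after every non-daemon thread has finished, so listing the non-daemon threads still alive at the end of `main` shows what is holding the process open. The following is a minimal sketch. `LingeringThreads` is a hypothetical helper name.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical diagnostic: finds live non-daemon threads, which prevent
// the JVM from exiting after main() returns.
public class LingeringThreads {

    // All live non-daemon threads except the calling thread.
    static List<Thread> nonDaemon() {
        Thread self = Thread.currentThread();
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t != self && t.isAlive() && !t.isDaemon())
                .collect(Collectors.toList());
    }

    // Prints each lingering thread with its current stack to stderr.
    static void dump() {
        for (Thread t : nonDaemon()) {
            System.err.println("Non-daemon thread still alive: "
                    + t.getName() + " (" + t.getState() + ")");
            for (StackTraceElement e : t.getStackTrace()) {
                System.err.println("    at " + e);
            }
        }
    }
}
```

Calling `LingeringThreads.dump()` as the last statement of `main`, after `cloud.shutdown()`, would print the culprits. A thread dump taken with `jstack` on the hung process gives similar information without code changes.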