vigna / fastutil

fastutil extends the Java™ Collections Framework by providing type-specific maps, sets, lists and queues.
Apache License 2.0
1.78k stars 196 forks source link

Parallel static utilities should use current ForkJoinPool if already in one #193

Closed techsy730 closed 3 years ago

techsy730 commented 3 years ago

The parallel static utility methods should use the current ForkJoinPool if already running in one. This gives the user the option to use a ForkJoinPool they are maintaining instead of being forced to use the common one (which they may not want for prioritization reasons or similar). This is done by calling the static method in a task itself that is submitted to the pool they want to use. This is also the only way you can control which pool is used for the JVM parallel operations (like java.util.Arrays.parallelSort)

ForkJoinTask.getPool(), which is actually a static method despite the name, can be used to tell if currently in a ForkJoinPool or not.

Stretch goal would be to have overloads that let the user give their own ExecutorService they want to use (since the above being the standard library's only method of controlling this is a bit dumb TBH)

vigna commented 3 years ago

So the best practice would be trying getPool() and then commomPool() in the former is null?

techsy730 commented 3 years ago

That is what I had in mind. We may want a non type-specific static helper method for this.

I should note the "traditional" way is to submit the first task with ForkJoinTask.fork(), which will do this behind the scenes. But that only works if the first task is a ForkJoinTask. Plus I honestly like being more explicit, and this makes it easier to be more flexible with later API evolution (like introducing a variant that takes an arbitrary ExecutorService if we ever want that).

Speiger commented 3 years ago

My Solution to that was do not directly use ForkJoinPool itself. Have a Helper class where the CommonPool is set by default, but you can replace it and all sorting "Multithreading" tasks just go to the helper with their task that they want to execute. Since invoke also automatically awaits the finish of the task this should be fine.

On top of that you can set a custom ForkJoinPool in the helper and if you try to set a "null" you just set the commonpool instead. Straight out of SanityChecks

    private static ForkJoinPool WORK_POOL = ForkJoinPool.commonPool();
    /**
     * Returns if the current thread-pool can handle multi-threading tasks
     * @return true if the threadcount is bigger the 1
     */
    public static boolean canParallelTask() {
        return WORK_POOL.getParallelism() > 1;
    }

    /**
     * Helper method to start a ForkJoinTask. This method will await the finalization of said task
     * @param task the Task to invoke
     */
    public static <T> void invokeTask(ForkJoinTask<T> task) {
        WORK_POOL.invoke(task);
    }

    /**
     * Helper method to control what ForkJoinPool is being used for any given task.
     * @Note this method is not thread-save. It is only there to provide control over how Library specific Threaded tasks are handled.
     * @param pool The ForkJoinPool that should receive the tasks. If null {@link ForkJoinPool#commonPool()} is set instead
     */
    public static void setWorkPool(ForkJoinPool pool) {
        WORK_POOL = pool == null ? ForkJoinPool.commonPool() : pool;
    }
techsy730 commented 3 years ago

The issue with that solution is that it forces all parallel tasks to be in the same pool. You might want different pools for working with different bits of data. Well I suppose you could set the pool before submitting the task, although that is a bit racy.

I'm a little unclear though. Is that is already the API already in fastutils? Or something you are proposing?

Speiger commented 3 years ago

@techsy730 nah this is out of my own Primitive Collection library since i got feed up with FastUtil to the point where I didn't show my best moments. And since I already know how it is to publish under anothers IP, I decided to make my own version that has nothing to do with fastutil

It is not public yet, Stuff like "BigLists/IndirectQueues" are not going to be a early thing (maybe never) but all of the issues (including navigable set) I found are resolved in my implementation. This example is actually the heavily cut down version of what is in my library. I am still designing this "ThreadPool" part because of this exact issue what you described, though my implementation already allows to solve that thing due to a special flag i added recently.

techsy730 commented 3 years ago

@vigna Are you in the middle of this or would you like me to make a PR?

vigna commented 3 years ago

A PR is fine. I think there's somewhere a utility method for "take this, if null take that" that we could use to choose between getPoll() and the common pool.