m4rs-mt / ILGPU

ILGPU JIT Compiler for high-performance .Net GPU programs
http://www.ilgpu.net

[QUESTION]: Exception in Accelerator.Synchronize on CUDA #1179

Closed: harrison314 closed this issue 5 months ago.

harrison314 commented 5 months ago

Question

I'm trying to find the most similar vectors on the graphics card using cosine similarity.

However, Accelerator.Synchronize() throws an exception: ILGPU.Runtime.Cuda.CudaException: 'the launch timed out and was terminated'.

The example kernels run fine for me.

Code to reproduce the problem:

```csharp
using ILGPU;
using ILGPU.Runtime;
using ILGPU.Runtime.Cuda;
using System;

namespace IlgpuTesting
{
    internal class Program
    {
        public static void Main()
        {
            const int vectorCount = 30_000;
            const int vectorLength = 512;

            SimulateLoadDataFromDb(vectorCount, vectorLength, out float[] buffer, out int[] idData);

            using Context context = Context.Create(builder => builder.Cuda().EnableAlgorithms());
            using Accelerator accelerator = context.GetPreferredDevice(preferCPU: false)
                             .CreateAccelerator(context);

            using MemoryBuffer1D<float, Stride1D.Dense> vectorInputData = accelerator.Allocate1D<float>(vectorCount * vectorLength);
            using MemoryBuffer1D<int, Stride1D.Dense> idInputData = accelerator.Allocate1D<int>(vectorCount);
            using MemoryBuffer1D<int, Stride1D.Dense> ouputData = accelerator.Allocate1D<int>(vectorCount);

            vectorInputData.CopyToCPU(buffer);
            idInputData.CopyToCPU(idData);

            Action<Index1D, ArrayView<float>, ArrayView<int>, int, ArrayView<int>> loadedKernel =
                accelerator.LoadAutoGroupedStreamKernel<Index1D, ArrayView<float>, ArrayView<int>, int, ArrayView<int>>(SimmKernel);

            // One thread per vector.
            loadedKernel(vectorCount, vectorInputData.View, idInputData.View, vectorLength, ouputData.View);
            accelerator.Synchronize();

            int[] output = ouputData.GetAsArray1D();

        }

        // Cosine similarity between the two vectorSize-long slices of vectorInputData
        // starting at startIndex1 and startIndex2.
        private static float CosinusSimmilarity(ArrayView<float> vectorInputData,
               int startIndex1,
               int startIndex2,
               int vectorSize)
        {
            float topSum = 0.0f;
            float i1Sum = 0.0f;
            float i2Sum = 0.0f;

            for (int i = 0; i < vectorSize; i++)
            {
                float v1v = vectorInputData[startIndex1 + i];
                float v2v = vectorInputData[startIndex2 + i];

                topSum += v1v * v2v;
                i1Sum += v1v * v1v;
                i2Sum += v2v * v2v;
            }

            return topSum / (MathF.Sqrt(i1Sum) * MathF.Sqrt(i2Sum));
        }

        private static void SimmKernel(Index1D i,
            ArrayView<float> vectorInputData,
            ArrayView<int> idInputData,
            int vectorSize,
            ArrayView<int> outputData)
        {
            float maxSimm = -2.0f;
            int id = 0;

            // Thread i scans candidates k = 0, 1, ...; the break at k == i means
            // vector i is compared only with vectors 0..i-1.
            for (int k = 0; k < vectorInputData.Length; k++)
            {
                if (k == i) break;

                float simm = CosinusSimmilarity(vectorInputData,
                    i * vectorSize,
                    k * vectorSize,
                    vectorSize);

                if(simm> maxSimm)
                {
                    maxSimm = simm;
                    id = idInputData[k];
                }
            }

            outputData[i] = id;
        }

        private static void SimulateLoadDataFromDb(int vectorCount, int vectorSize, out float[] buffer, out int[] idData)
        {
            buffer = new float[vectorCount * vectorSize];
            idData = new int[vectorCount];

            for (int i = 0; i < idData.Length; i++)
            {
                idData[i] = i;
            }

            for (int i = 0; i < buffer.Length; i++)
            {
                buffer[i] = (float)Random.Shared.NextDouble();
            }
        }
    }
}
```

What could be the problem?


MoFtZ commented 5 months ago

Hi @harrison314,

I was able to run your code on a GTX 1070 without issue.

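Launch timeouts usually occur when a kernel runs too long on a GPU that is also driving an active display: the display driver's watchdog (on Windows, Timeout Detection and Recovery, whose default limit is 2 seconds) terminates the kernel.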
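
As a rough illustration of why the posted kernel can hit that limit, the plain C# sketch below (no ILGPU needed; `WorkloadEstimate` and its methods are hypothetical names) counts the work in a single launch. It assumes the kernel exactly as posted: thread `i` compares vector `i` only with vectors `0..i-1`, because the loop breaks at `k == i`, and every comparison walks all `vectorSize` elements.

```csharp
using System;

// Hypothetical helper: rough work count for the posted SimmKernel (not part of ILGPU).
public static class WorkloadEstimate
{
    // Thread i compares vector i with vectors 0..i-1, i.e. i comparisons, so a launch
    // over vectorCount threads performs 0 + 1 + ... + (vectorCount - 1) comparisons.
    public static long Comparisons(long vectorCount) =>
        vectorCount * (vectorCount - 1) / 2;

    // Each comparison runs the inner loop of CosinusSimmilarity vectorSize times
    // (two loads and three multiply-adds per iteration).
    public static long InnerLoopIterations(long vectorCount, long vectorSize) =>
        Comparisons(vectorCount) * vectorSize;
}
```

For the posted sizes (30,000 vectors of length 512) this gives 449,985,000 comparisons and 230,392,320,000 inner-loop iterations, all issued by one kernel launch. How long that takes depends on the GPU, so the numbers only show the scale of the work, not whether a particular card will time out.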

You could try to work around the issue by reducing the workload (for example, processing fewer vectors per kernel launch).
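
One way to reduce the work per launch is to split the rows into several smaller launches. The sketch below is an illustration under stated assumptions, not an ILGPU API: `LaunchPlanner.PlanChunks` is a hypothetical helper that groups consecutive rows so that each launch stays under a chosen comparison budget. Because row `i` costs `i` comparisons in the posted kernel, equal-sized chunks would be unbalanced, so later chunks get fewer rows.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of ILGPU): splits rows [0, vectorCount) into
// consecutive (Offset, Count) ranges whose comparison count stays under a budget.
public static class LaunchPlanner
{
    public static List<(int Offset, int Count)> PlanChunks(int vectorCount, long maxComparisonsPerLaunch)
    {
        if (maxComparisonsPerLaunch <= 0)
            throw new ArgumentOutOfRangeException(nameof(maxComparisonsPerLaunch));

        var chunks = new List<(int Offset, int Count)>();
        int offset = 0;
        while (offset < vectorCount)
        {
            long work = 0;
            int end = offset;
            // Row r costs r comparisons in the posted kernel. Always take at least
            // one row so the loop makes progress even if one row exceeds the budget.
            do
            {
                work += end;
                end++;
            } while (end < vectorCount && work + end <= maxComparisonsPerLaunch);

            chunks.Add((offset, end - offset));
            offset = end;
        }
        return chunks;
    }
}
```

To use such a plan, the kernel would need an extra parameter, for example a hypothetical `int rowOffset`, and would compute its row as `i + rowOffset`. Each `(Offset, Count)` pair then becomes one launch, `loadedKernel(Count, Offset, ...)`, followed by `accelerator.Synchronize()`. The budget is something to tune on the target machine; the aim is only that each individual launch stays short.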

Not sure if this will help, but you could also try switching from auto-grouped to explicitly grouped kernels. Auto-grouping will try to consume all the resources on the GPU; with explicit grouping, combined with a reduced workload, you might avoid the timeout.
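
With explicit grouping you pick the group size and the number of groups yourself. In ILGPU this is done with an explicitly grouped kernel (for example one loaded via `LoadStreamKernel` and launched with a `KernelConfig`; see the ILGPU documentation for the exact overloads), and the kernel reads its global index from `Grid.GlobalIndex.X`. The arithmetic for sizing such a launch is sketched below in plain C#; `GroupingMath` is a hypothetical name, and any group size you plug in, such as 256, is an assumption to tune.

```csharp
using System;

// Hypothetical helper (not part of ILGPU): sizing an explicitly grouped launch.
public static class GroupingMath
{
    // Number of groups needed to cover rowCount rows (ceiling division).
    public static int GridSize(int rowCount, int groupSize)
    {
        if (rowCount < 0) throw new ArgumentOutOfRangeException(nameof(rowCount));
        if (groupSize <= 0) throw new ArgumentOutOfRangeException(nameof(groupSize));
        return (rowCount + groupSize - 1) / groupSize;
    }

    // Threads launched past the last row; the kernel must make them return early.
    public static int IdleThreads(int rowCount, int groupSize) =>
        GridSize(rowCount, groupSize) * groupSize - rowCount;
}
```

For example, covering 30,000 rows with groups of 256 threads takes 118 groups, which launches 30,208 threads. The 208 extra threads must skip their work, so an explicitly grouped version of `SimmKernel` would need a bounds check such as `if (row >= vectorCount) return;` (with `row` and `vectorCount` as hypothetical kernel variables).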

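Separately, it looks like you used CopyToCPU instead of CopyFromCPU. CopyToCPU copies the device buffer into your host array, so your input data was never uploaded to the GPU. The upload should be: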

```csharp
vectorInputData.CopyFromCPU(buffer);
idInputData.CopyFromCPU(idData);
```

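Once the data is actually uploaded, it can help to spot-check a few GPU results against a CPU computation. The sketch below is a hypothetical plain C# reference (`CpuReference` is not part of ILGPU) that mirrors the posted kernel. Note that the kernel's loop bound `vectorInputData.Length` is the number of floats (`vectorCount * vectorLength`), not the number of vectors; in the posted code the `break` at `k == i` ends the loop before that matters, but it also means each vector is compared only with the vectors before it. The reference keeps that behavior. If the intent is to compare against every other vector, the loop would run to the vector count and `continue` past `k == i` instead, which is an assumption about intent and doubles the number of comparisons.

```csharp
using System;

// Hypothetical CPU reference for the posted SimmKernel (not part of ILGPU).
public static class CpuReference
{
    // Cosine similarity of two vectorSize-long slices of the flat data array.
    static float Cosine(float[] data, int start1, int start2, int vectorSize)
    {
        float dot = 0f, norm1 = 0f, norm2 = 0f;
        for (int j = 0; j < vectorSize; j++)
        {
            float a = data[start1 + j];
            float b = data[start2 + j];
            dot += a * b;
            norm1 += a * a;
            norm2 += b * b;
        }
        return dot / (MathF.Sqrt(norm1) * MathF.Sqrt(norm2));
    }

    // Mirrors the posted kernel: row i is compared only with rows 0..i-1,
    // and row 0 keeps the default id 0.
    public static int BestMatchId(float[] data, int[] ids, int vectorSize, int i)
    {
        float best = -2.0f;
        int id = 0;
        for (int k = 0; k < i; k++)
        {
            float s = Cosine(data, i * vectorSize, k * vectorSize, vectorSize);
            if (s > best)
            {
                best = s;
                id = ids[k];
            }
        }
        return id;
    }
}
```

Comparing a few entries of `output` with `CpuReference.BestMatchId(buffer, idData, vectorLength, row)` is usually enough. Exact agreement is not guaranteed when two candidates score almost the same, because the GPU and CPU may round the float sums differently.
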
harrison314 commented 5 months ago

Thanks, reducing the workload helped.