takuya-takeuchi / FaceRecognitionDotNet

The world's simplest facial recognition api for .NET on Windows, MacOS and Linux
MIT License

CudaException code 77 #19

Closed turowicz closed 5 years ago

turowicz commented 5 years ago

I'm getting the following error:

      CUDA Error Lib:libDlibDotNet.Native.Dnn.so Code:77 Driver:10000 Runti,:10000 Message:Exception of type 'DlibDotNet.CudaException' was thrown..
fail: People.Service[0]
      Exception of type 'DlibDotNet.CudaException' was thrown.
DlibDotNet.CudaException: Exception of type 'DlibDotNet.CudaException' was thrown.
   at DlibDotNet.Dnn.Cuda.ThrowCudaException(ErrorType error)
   at DlibDotNet.Dnn.LossMmod.Operator[T](IEnumerable`1 images, UInt64 batchSize)
   at FaceRecognitionDotNet.Dlib.Python.CnnFaceDetectionModelV1.Detect(LossMmod net, Image image, Int32 upsampleNumTimes)
   at FaceRecognitionDotNet.FaceRecognition.RawFaceLocations(Image faceImage, Int32 numberOfTimesToUpsample, Model model)
   at FaceRecognitionDotNet.FaceRecognition.FaceLocations(Image image, Int32 numberOfTimesToUpsample, Model model)+MoveNext()
   at System.Collections.Generic.List`1.AddEnumerable(IEnumerable`1 enumerable)
   at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)
   at People.Common.Services.IdentificationService.IdentifyAsync(Guid applicationId, Mat frame, Double tolerance) in /app/src/People.Common/Services/IdentificationService.cs:line 70
   at People.Common.Pipeline.Blocks.IdentificationBlock.CheckIdentityAsync(Result result) in /app/src/People.Common/Pipeline/Blocks/IdentificationBlock.cs:line 54

Any ideas what this means? Googling the error code

turowicz commented 5 years ago

@takuya-takeuchi have you had such problems?

takuya-takeuchi commented 5 years ago

@turowicz I have never faced it. Code 77 is cudaErrorIllegalAddress (an illegal memory access). I don't think the image data itself matters, but numberOfTimesToUpsample may cause issues. Could you provide more information: image size, the value of numberOfTimesToUpsample, etc.?

turowicz commented 5 years ago

size: 1920x1080 or 1440x800 (depends on source)
numberOfTimesToUpsample: 0

turowicz commented 5 years ago

The error starts to appear after ~1h of video processing (frame by frame)

turowicz commented 5 years ago

I'm investigating memory leaks.

turowicz commented 5 years ago

@takuya-takeuchi aren't you forgetting to dispose the faceLandmark object?

https://github.com/takuya-takeuchi/FaceRecognitionDotNet/blob/master/src/FaceRecognitionDotNet/FaceRecognition.cs#L256
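To illustrate the kind of leak being asked about, here is a minimal, self-contained sketch. It does not use FaceRecognitionDotNet or DlibDotNet types: `NativeHandle` and `DisposeDemo` are hypothetical names, and the live-instance counter only stands in for native (dlib/CUDA) memory. The assumption is that the real wrapper objects implement `IDisposable` and hold native memory that is only released promptly when `Dispose` is called.

```csharp
using System;

// Hypothetical stand-in for a wrapper around native (dlib/CUDA) memory.
// It is NOT a FaceRecognitionDotNet type; it only counts live instances.
public sealed class NativeHandle : IDisposable
{
    public static int LiveCount { get; private set; }
    private bool _disposed;

    public NativeHandle() { LiveCount++; }

    // Placeholder for real work done with the native resource.
    public void Use()
    {
        if (_disposed) throw new ObjectDisposedException(nameof(NativeHandle));
    }

    public void Dispose()
    {
        if (_disposed) return;   // Dispose must be safe to call more than once
        _disposed = true;
        LiveCount--;             // a real wrapper would free native memory here
    }
}

public static class DisposeDemo
{
    // Leaky: the handle is dropped without Dispose, so the "native" resource
    // is never released by this code path.
    public static void ProcessLeaky(int frames)
    {
        for (var i = 0; i < frames; i++)
        {
            var handle = new NativeHandle();
            handle.Use();
        }
    }

    // Fixed: `using` calls Dispose at the end of each iteration,
    // even if the body throws.
    public static void ProcessDisposed(int frames)
    {
        for (var i = 0; i < frames; i++)
        {
            using (var handle = new NativeHandle())
            {
                handle.Use();
            }
        }
    }
}
```

In a frame-by-frame video loop, the leaky variant accumulates one undisposed object per frame, which would be consistent with a failure that only shows up after a long run. Whether this is what actually triggers code 77 here is not established by the sketch.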

takuya-takeuchi commented 5 years ago

@turowicz You are right. Thank you!! The same issue exists on other lines as well.

takuya-takeuchi commented 5 years ago

@turowicz Could you try the NuGet package 1.2.3.6? If you have any issues, please let me know!!! Thank you!!

turowicz commented 5 years ago

@takuya-takeuchi I will test as soon as possible.

turowicz commented 5 years ago

@takuya-takeuchi I haven't seen this error for a while now. I have started getting error 74 though. Investigating. I'll create a separate issue if it turns out to be a real problem.

turowicz commented 5 years ago

@takuya-takeuchi unfortunately the error still exists.

turowicz commented 5 years ago

Actually I think this is due to a problem on my end. Only one of two PCs has this error. It's an Alienware Aurora R7 running Ubuntu 18.04.

turowicz commented 5 years ago

BTW memory usage is much better after the memory leak fix.

takuya-takeuchi commented 5 years ago

> Alienware Aurora R7 running Ubuntu 18.04

I have the same machine, and I boot Ubuntu 18.04 from a USB drive. My R7 has a GTX 1080 (not Ti).

I may be able to reproduce the same issue. Could you tell me the versions of your NVIDIA components (CUDA and cuDNN)?

takuya-takeuchi commented 5 years ago

And if you can, could you provide minimal sample source code that reproduces the issue?

turowicz commented 5 years ago

The code is as simple as getting face locations from an image. Run it on 1,000,000 images or a video and at some point the error will appear. It happens randomly, but once it does, nothing fixes it until I reboot the computer. The error only appears on that one computer.

Nvidia SMI:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79       Driver Version: 410.79       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:01:00.0 Off |                  N/A |
| 46%   78C    P2   121W / 250W |   1599MiB / 11177MiB |     92%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     23522      C   dotnet                                      1589MiB |
+-----------------------------------------------------------------------------+