takuya-takeuchi / FaceRecognitionDotNet

The world's simplest facial recognition api for .NET on Windows, MacOS and Linux
MIT License
1.21k stars 297 forks

Question regarding age training #211

Open zevele opened 2 years ago

zevele commented 2 years ago

I'm trying to do the age training, now that bug #206 has been resolved. But it's very slow: after 24 hours I'm on step 83 of epoch 1. At this rate the training will take about two years to complete (to epoch 600). Is it going to get faster? How long is the training supposed to take?

takuya-takeuchi commented 2 years ago

@zevele Please tell me

I may be able to reproduce your issue in a Docker container.

zevele commented 2 years ago

> @zevele Please tell me

Thanks @takuya-takeuchi !

- OS: Windows 10
- "GPU is available or not": How do I check?
- GPU: NVIDIA GeForce GTX 1050

Is there anything I can do to help debug this on my system?

takuya-takeuchi commented 2 years ago

You can check whether gpu is running or not by

Just in case: are you using FaceRecognitionDotNet.CUDAXXX? The plain FaceRecognitionDotNet package, unlike FaceRecognitionDotNet.CUDAXXX, does not use the GPU.

zevele commented 2 years ago

> You can check whether gpu is running or not by

I ran the training again and checked the GPU utilization, and it's not using the GPU. I'm following the wiki regarding training the model. I'm running the following command (as per the instructions):

    dotnet add package FaceRecognitionDotNet

I also tried:

    dotnet add package FaceRecognitionDotNet.CUDA92

But GPU utilization is still zero.

I also see you wrote there: "You should build DlibDotNetNative.dll, DlibDotNetNativeDnn.dll and DlibDotNetNativeDnnAgeClassification.dll with CUDA." How do I compile them with CUDA? And how do I use FaceRecognitionDotNet.CUDAXXX?

takuya-takeuchi commented 2 years ago

Which CUDA version is installed on your machine? If you installed CUDA 11.2, you must install FaceRecognitionDotNet.CUDA112. You must use the FaceRecognitionDotNet package that corresponds to the installed CUDA version. You must also install cuDNN.
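
A minimal sketch of how the version-to-package matching could be checked from code. It assumes the CUDA installer has set the `CUDA_PATH` environment variable to a directory ending in `v<major>.<minor>`, and that package suffixes follow the pattern seen in this thread (9.2 gives CUDA92, 11.2 gives CUDA112). `CudaPackageHint` and `SuffixFromToolkitPath` are hypothetical names, not part of FaceRecognitionDotNet, and whether a package exists for a given version still has to be checked on NuGet.

```csharp
using System;

internal static class CudaPackageHint
{
    // Hypothetical helper: turns a CUDA toolkit directory such as
    // "...\CUDA\v11.2" into a package-name suffix such as "CUDA112".
    // Returns null when the last path segment is not "v<major>.<minor>".
    public static string SuffixFromToolkitPath(string toolkitPath)
    {
        if (string.IsNullOrWhiteSpace(toolkitPath))
            return null;

        // Split on both separator styles so the result does not depend on the OS.
        var parts = toolkitPath.Split(new[] { '\\', '/' }, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length == 0)
            return null;

        var name = parts[parts.Length - 1]; // e.g. "v11.2"
        if (name.Length < 2 || (name[0] != 'v' && name[0] != 'V'))
            return null;

        Version version;
        if (!Version.TryParse(name.Substring(1), out version))
            return null;

        // Assumption: the suffix is major and minor concatenated (11.2 -> "CUDA112").
        return "CUDA" + version.Major + version.Minor;
    }

    public static void Main()
    {
        // Assumption: CUDA_PATH points at the toolkit that should be used.
        var cudaPath = Environment.GetEnvironmentVariable("CUDA_PATH");
        var suffix = SuffixFromToolkitPath(cudaPath);
        Console.WriteLine(suffix == null
            ? "CUDA_PATH is not set or has an unexpected form."
            : "Candidate package: FaceRecognitionDotNet." + suffix);
    }
}
```

The helper only suggests a candidate package name; it does not verify that cuDNN is installed or that the GPU is actually used.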

zevele commented 2 years ago

> Which cuda version do you install in your machine? If install CUDA 11.2, you must install...

Thanks! I tried installing CUDA 11.2 and also downloaded cuDNN for CUDA 11.2. Then I added the path to cuDNN to the environment variables and added both FaceRecognitionDotNet and FaceRecognitionDotNet.CUDA112:

    dotnet add package FaceRecognitionDotNet
    dotnet add package FaceRecognitionDotNet.CUDA112

I rebuilt the project using:

    dotnet build -c Release

I also tried copying the DLLs from the cuDNN folder to the output folder.

But still zero GPU utilization. What am I missing?

takuya-takeuchi commented 2 years ago

You do not need FaceRecognitionDotNet. You should install only FaceRecognitionDotNet.CUDA112. Please uninstall FaceRecognitionDotNet.

zevele commented 2 years ago

> You need not to use FaceRecognitionDotNet...

Actually I tried that and got an error, so I thought I needed both of them. (With both of them the training runs, just without CUDA.)

If I do:

    dotnet remove package FaceRecognitionDotNet
    dotnet add package FaceRecognitionDotNet.CUDA112

(just in case...) and then

    dotnet build -c Release

then when I run the training I get:


```
              Epoch: 600
      Learning Rate: 0.001
  Min Learning Rate: 1E-05
     Min Batch Size: 384
Validation Interval: 20
           Use Mean: False

Start load train images
System.TypeInitializationException: The type initializer for 'DlibDotNet.NativeMethods' threw an exception. ---> System.DllNotFoundException: Unable to load DLL 'DlibDotNetNativeDnn': The specified module could not be found. (Exception from HRESULT: 0x8007007E)
   at DlibDotNet.NativeMethods.LossMetric_anet_type_create()
   at DlibDotNet.NativeMethods..cctor()
   --- End of inner exception stack trace ---
   at DlibDotNet.NativeMethods.load_image_matrix(MatrixElementType type, Byte[] path, Int32 pathLength, IntPtr& matrix, IntPtr& error_message)
   at DlibDotNet.Dlib.LoadImageAsMatrix[T](String path)
   at AgeTraining.Program.Load(String type, String directory, String meanImage, IList`1& images, IList`1& labels) in D:\Projects\VS2019\FaceRecognitionDotNet-1.3.0.7\tools\AgeTraining\Program.cs:line 268
   at AgeTraining.Program.Train(String baseName, String dataset, UInt32 epoch, Double learningRate, Double minLearningRate, UInt32 miniBatchSize, UInt32 validation, Boolean useMean) in D:\Projects\VS2019\FaceRecognitionDotNet-1.3.0.7\tools\AgeTraining\Program.cs:line 427
```

takuya-takeuchi commented 2 years ago

@zevele

I have reproduced your issue, but I'm not sure why it occurs. The link check run when the package was released should have reported no errors. Of course, CUDA112 works fine.

But AgeTraining does not work even though it links DlibDotNet.CUDA112.

takuya-takeuchi commented 2 years ago

These libraries are deployed to the app directory.

But the error persists.

zevele commented 2 years ago

> These libraries are deployed to app dir....

Thanks @takuya-takeuchi

I tried copying the files manually, but I get the same error. Is there anything else that can be done?

takuya-takeuchi commented 2 years ago

Note

A simple dotnet test program that links FRDN.CUDA112 works fine with the CUDA libs.

using System;
using System.Collections.Generic;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using System.Linq;
using System.Net.Http;
using System.Reflection;
using System.Runtime.Serialization.Formatters.Binary;
using DlibDotNet;
using FaceRecognitionDotNet.Extensions;
using Xunit;
using Xunit.Abstractions;

namespace FaceRecognitionDotNet.Tests
{

    public class FaceRecognitionTest
    {

        private readonly string ModelDirectory = "Models";

        public FaceRecognitionTest(ITestOutputHelper testOutputHelper)
        {
            var dir = Environment.GetEnvironmentVariable("FaceRecognitionDotNetModelDir");
            if (Directory.Exists(dir))
            {
                ModelDirectory = dir;
            }
        }

        [Fact]
        public void Test()
        {
            using(var fr = FaceRecognitionDotNet.FaceRecognition.Create(ModelDirectory))
            {
            }
        }

    }

}

But a simple console program that links FRDN.CUDA112 does not work, even if the CUDA libs are deployed.

using System;
using System.Collections.Generic;
using System.Drawing;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

using FaceRecognitionDotNet;

namespace Issue
{

    class Program
    {

        private static void Main(string[] args)
        {
            using(var fr = FaceRecognitionDotNet.FaceRecognition.Create("models"))
            {
                Console.WriteLine("test");
            }
        }

    }

}

takuya-takeuchi commented 2 years ago

And the link issue occurs only for CUDA 11.X. It is very weird.

takuya-takeuchi commented 2 years ago

Note

D:\Works\OpenSource\Temp\FaceRecognitionDotNet\17.#211>dotnet run -c Release
Unhandled Exception: System.TypeInitializationException: The type initializer for 'DlibDotNet.NativeMethods' threw an exception. ---> System.DllNotFoundException: Unable to load DLL 'DlibDotNetNativeDnn': The specified module could not be found. (Exception from HRESULT: 0x8007007E)
   at DlibDotNet.NativeMethods.LossMetric_anet_type_create()
   at DlibDotNet.NativeMethods..cctor()
   --- End of inner exception stack trace ---
   at DlibDotNet.NativeMethods.get_frontal_face_detector()
   at FaceRecognitionDotNet.FaceRecognition..ctor(String directory)
   at Issue.Program.Main(String[] args) in D:\Works\OpenSource\Temp\FaceRecognitionDotNet\17.#211\Program.cs:line 18

But running the built Issue.dll directly from the output directory works.

D:\Works\OpenSource\Temp\FaceRecognitionDotNet\17.#211\bin\Release\netcoreapp2.0>dotnet Issue.dll
test

It looks like the program runs in the wrong current directory. After deploying the CUDA libs to the directory that contains the *.csproj, it works fine.

D:\Works\OpenSource\Temp\FaceRecognitionDotNet\17.#211>dir
 Volume in drive D is Data
 Volume Serial Number is ACE6-77C8

 Directory of D:\Works\OpenSource\Temp\FaceRecognitionDotNet\17.#211

2022/08/14  16:08    <DIR>          .
2022/08/14  16:08    <DIR>          ..
2022/08/14  15:21    <DIR>          bin
2022/08/14  16:08    <SYMLINK>      cublas64_11.dll [C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin\cublas64_11.dll]
2022/08/14  16:08    <SYMLINK>      cublasLt64_11.dll [C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin\cublasLt64_11.dll]
2022/08/14  16:08    <SYMLINK>      cudnn64_8.dll [C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin\cudnn64_8.dll]
2022/08/14  16:08    <SYMLINK>      cudnn_adv_infer64_8.dll [C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin\cudnn_adv_infer64_8.dll]
2022/08/14  16:08    <SYMLINK>      cudnn_adv_train64_8.dll [C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin\cudnn_adv_train64_8.dll]
2022/08/14  16:08    <SYMLINK>      cudnn_cnn_infer64_8.dll [C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin\cudnn_cnn_infer64_8.dll]
2022/08/14  16:08    <SYMLINK>      cudnn_cnn_train64_8.dll [C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin\cudnn_cnn_train64_8.dll]
2022/08/14  16:08    <SYMLINK>      cudnn_ops_infer64_8.dll [C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin\cudnn_ops_infer64_8.dll]
2022/08/14  16:08    <SYMLINK>      cudnn_ops_train64_8.dll [C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin\cudnn_ops_train64_8.dll]
2022/08/14  16:08    <SYMLINK>      curand64_10.dll [C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin\curand64_10.dll]
2022/08/14  16:08    <SYMLINK>      cusolver64_11.dll [C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin\cusolver64_11.dll]
2022/03/18  21:34            29,546 face1.jpg
2022/03/18  21:34            12,429 face2.jpg
2022/08/14  15:21               376 Issue.csproj
2021/03/27  20:28             1,114 Issue.sln
2021/02/16  01:33           729,940 mmod_human_face_detector.dat
2022/08/14  14:19    <SYMLINKD>     models [D:\Works\OpenSource\FaceRecognitionDotNet.Models]
2022/08/14  15:21    <DIR>          obj
2022/08/14  14:20               428 Program.BAK
2022/08/14  15:12               460 Program.cs
              18 File(s)        774,293 bytes
               5 Dir(s)  836,606,410,752 bytes free

D:\Works\OpenSource\Temp\FaceRecognitionDotNet\17.#211>dotnet run -c Release
test
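
The observation above is that the CUDA libs are only found when they sit in the current directory. One possible workaround, sketched here as an untested assumption and not as a fix confirmed in this thread, is to put the application's own output directory on the process `PATH` before the first DlibDotNet call, so that Windows can also resolve the dependent CUDA DLLs from there. `NativeSearchPath`, `Prepend` and `UseApplicationDirectory` are hypothetical names.

```csharp
using System;
using System.IO;
using System.Linq;

internal static class NativeSearchPath
{
    // Returns searchPath with directory placed first, unless it is already listed.
    // The comparison ignores case and trailing separators, which suits Windows paths.
    public static string Prepend(string searchPath, string directory)
    {
        var entries = (searchPath ?? string.Empty)
            .Split(new[] { Path.PathSeparator }, StringSplitOptions.RemoveEmptyEntries);

        var alreadyListed = entries.Any(e =>
            string.Equals(e.TrimEnd('\\', '/'), directory.TrimEnd('\\', '/'),
                          StringComparison.OrdinalIgnoreCase));

        return alreadyListed
            ? searchPath
            : string.Join(Path.PathSeparator.ToString(), new[] { directory }.Concat(entries));
    }

    // Adds the directory that contains the application's assemblies to this
    // process's PATH. Only the current process is affected.
    public static void UseApplicationDirectory()
    {
        var current = Environment.GetEnvironmentVariable("PATH");
        Environment.SetEnvironmentVariable("PATH", Prepend(current, AppContext.BaseDirectory));
    }

    public static void Main()
    {
        // Assumption: this runs before any type from DlibDotNet or
        // FaceRecognitionDotNet is touched, because the native DLLs are loaded
        // when DlibDotNet.NativeMethods is first initialized.
        UseApplicationDirectory();
        Console.WriteLine(Environment.GetEnvironmentVariable("PATH"));
    }
}
```

With this in place, the CUDA and cuDNN DLLs would be deployed once to the output directory (for example bin\Release\netcoreapp2.0) instead of next to the *.csproj. Whether this also makes AgeTraining use the GPU is not established here.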

zevele commented 2 years ago

> Note...

I've tried copying these DLLs from the release folder to the solution folder (keeping them in both folders). Now the trainer runs again, but there is still no GPU utilization. In Task Manager, the command prompt where I run the trainer (Windows Command Processor) shows about 30% CPU utilization but no GPU utilization.

Should I try using older versions of CUDA?

I didn't create symlinks and do not have the models link. Is that a problem?


14/08/2022  17:21    <DIR>          .
14/08/2022  17:21    <DIR>          ..
29/07/2022  18:57                69 .gitignore
16/07/2022  23:09    <DIR>          Adience
01/08/2022  09:18        70,887,810 adience-age-network_600_0.001_1E-05_384_False
14/08/2022  17:09                 0 adience-age-network_600_0.001_1E-05_384_False.log
01/08/2022  09:35        70,887,820 adience-age-network_600_0.001_1E-05_384_False_
16/07/2022  23:11    <DIR>          AdienceDataset
16/07/2022  23:47    <DIR>          AdienceDataset_preprocessed
04/08/2022  18:13               840 AgeTraining.csproj
06/08/2022  09:56    <DIR>          bin
15/02/2021  10:07       107,330,560 cublas64_11.dll
15/02/2021  10:07       175,706,112 cublasLt64_11.dll
25/02/2021  10:52           222,720 cudnn64_8.dll
25/02/2021  11:36       128,429,056 cudnn_adv_infer64_8.dll
25/02/2021  11:50        82,672,640 cudnn_adv_train64_8.dll
25/02/2021  11:58       545,695,232 cudnn_cnn_infer64_8.dll
25/02/2021  12:16        87,374,336 cudnn_cnn_train64_8.dll
25/02/2021  11:04       273,139,712 cudnn_ops_infer64_8.dll
25/02/2021  11:18        46,076,416 cudnn_ops_train64_8.dll
15/02/2021  15:38        60,627,968 curand64_10.dll
15/02/2021  15:38       396,296,704 cusolver64_11.dll
30/07/2022  21:43    <DIR>          images
14/08/2022  17:25    <DIR>          obj
29/07/2022  18:57            27,182 Program.cs
29/07/2022  18:57             5,261 README.md
30/07/2022  21:43    <DIR>          tools

csetuomas commented 11 months ago

I have the same problem. Did you ever solve it? It takes a lifetime to train the model on the CPU.

zevele commented 11 months ago

> I have the same problem. Did you ever solve this problem? It takes life time to train the model with CPU.

No... I gave up... but if you manage to do it, please share your insights here.

csetuomas commented 11 months ago

I just can't get the GPU to work. I have tried almost every CUDA version. I have now been training the model on the CPU for over 1.5 weeks and it is about 7% done :| This is just a waste of resources. Could someone just share the trained models for age and gender?