zevele opened this issue 2 years ago.
@zevele Please tell me your OS, whether a GPU is available, and which GPU you have.
I may be able to reproduce your issue in a Docker container.
Thanks @takuya-takeuchi!
- OS: Windows 10
- GPU available or not: How do I check?
- GPU: NVIDIA GeForce GTX 1050
Is there anything I can do to help debug this on my system?
You can check whether the GPU is being used by
Just in case: are you using FaceRecognitionDotNet.CUDAXXX?
FaceRecognitionDotNet (as opposed to FaceRecognitionDotNet.CUDAXXX) does not use the GPU.
I ran the training again and checked GPU utilization: it's not using the GPU. I'm following the wiki on training the model and running the following command, as per the instructions:
dotnet add package FaceRecognitionDotNet
I also tried:
dotnet add package FaceRecognitionDotNet.CUDA92
But GPU utilization is still zero.
I also see you wrote in the wiki:
You should build DlibDotNetNative.dll, DlibDotNetNativeDnn.dll and DlibDotNetNativeDnnAgeClassification.dll with CUDA.
How do I compile them with CUDA? How do I use FaceRecognitionDotNet.CUDAXXX?
Which CUDA version is installed on your machine? If you installed CUDA 11.2, you must install FaceRecognitionDotNet.CUDA112. You must use the FaceRecognitionDotNet package that matches the installed CUDA version, and you must also install cuDNN.
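To answer "which CUDA version is installed?" and pick the matching package, a small C# sketch can derive a candidate package name. It assumes (1) that the Windows CUDA installer sets the CUDA_PATH environment variable to the toolkit folder (for example ...\CUDA\v11.2) and (2) that package names follow the CUDA92 / CUDA112 pattern used in this thread. CudaPackageHint and PackageFor are hypothetical names, and a package may not have been published for every CUDA version.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of FaceRecognitionDotNet.
public static class CudaPackageHint
{
    // Maps a CUDA toolkit folder name such as "v11.2" to a package name such as
    // "FaceRecognitionDotNet.CUDA112", following the CUDA92 / CUDA112 naming used
    // in this thread. Returns null if the name is not of the form "v<major>.<minor>".
    public static string PackageFor(string toolkitFolderName)
    {
        if (string.IsNullOrEmpty(toolkitFolderName) || toolkitFolderName[0] != 'v')
            return null;
        var parts = toolkitFolderName.Substring(1).Split('.');
        if (parts.Length != 2
            || !int.TryParse(parts[0], out var major)
            || !int.TryParse(parts[1], out var minor))
            return null;
        return $"FaceRecognitionDotNet.CUDA{major}{minor}";
    }

    public static void Main()
    {
        // Assumption: the Windows CUDA installer sets CUDA_PATH to the toolkit folder,
        // e.g. C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2.
        var cudaPath = Environment.GetEnvironmentVariable("CUDA_PATH");
        if (string.IsNullOrEmpty(cudaPath))
        {
            Console.WriteLine("CUDA_PATH is not set; the CUDA toolkit may not be installed.");
            return;
        }

        var folder = Path.GetFileName(cudaPath.TrimEnd('\\', '/'));
        Console.WriteLine($"CUDA toolkit: {cudaPath}");
        Console.WriteLine($"Candidate package: {PackageFor(folder) ?? "unknown"}");
    }
}
```

With CUDA 11.2 installed this would suggest FaceRecognitionDotNet.CUDA112, which should then be the only FaceRecognitionDotNet package the project references.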
Thanks! I installed CUDA 11.2 and also downloaded cuDNN for CUDA 11.2. Then I added the cuDNN path to the environment variables.
I added both FaceRecognitionDotNet and FaceRecognitionDotNet.CUDA112:
dotnet add package FaceRecognitionDotNet
dotnet add package FaceRecognitionDotNet.CUDA112
Then I rebuilt the project with:
dotnet build -c Release
I also tried copying the DLLs from the cuDNN folder to the output folder.
But still zero GPU utilization. What am I missing?
You don't need FaceRecognitionDotNet. Install only FaceRecognitionDotNet.CUDA112, and uninstall FaceRecognitionDotNet.
Actually, I tried that and got an error, so I thought I needed both. With both packages the training runs, just without CUDA.
If I do:
dotnet remove package FaceRecognitionDotNet
dotnet add package FaceRecognitionDotNet.CUDA112
(just in case...)
and then run dotnet build -c Release,
running the training gives:
Epoch: 600
Learning Rate: 0.001
Min Learning Rate: 1E-05
Min Batch Size: 384
Validation Interval: 20
Use Mean: False
Start load train images
System.TypeInitializationException: The type initializer for 'DlibDotNet.NativeMethods' threw an exception. ---> System.DllNotFoundException: Unable to load DLL 'DlibDotNetNativeDnn': The specified module could not be found. (Exception from HRESULT: 0x8007007E)
at DlibDotNet.NativeMethods.LossMetric_anet_type_create()
at DlibDotNet.NativeMethods..cctor()
--- End of inner exception stack trace ---
at DlibDotNet.NativeMethods.load_image_matrix(MatrixElementType type, Byte[] path, Int32 pathLength, IntPtr& matrix, IntPtr& error_message)
at DlibDotNet.Dlib.LoadImageAsMatrix[T](String path)
at AgeTraining.Program.Load(String type, String directory, String meanImage, IList`1& images, IList`1& labels) in D:\Projects\VS2019\FaceRecognitionDotNet-1.3.0.7\tools\AgeTraining\Program.cs:line 268
at AgeTraining.Program.Train(String baseName, String dataset, UInt32 epoch, Double learningRate, Double minLearningRate, UInt32 miniBatchSize, UInt32 validation, Boolean useMean) in D:\Projects\VS2019\FaceRecognitionDotNet-1.3.0.7\tools\AgeTraining\Program.cs:line 427
@zevele
I have reproduced your issue, but I'm not sure why it occurs. When a package is released, the link check should report no errors, and CUDA112 does work fine.
However, AgeTraining does not work even though it links DlibDotNet.CUDA112.
These libraries are deployed to the app directory.
But the error still occurs.
Thanks @takuya-takeuchi
I tried copying the files manually, but I get the same error. Is there anything else that can be done?
Note: a simple dotnet test program that links FRDN.CUDA112 works fine with the CUDA libraries.
using System;
using System.Collections.Generic;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using System.Linq;
using System.Net.Http;
using System.Reflection;
using System.Runtime.Serialization.Formatters.Binary;
using DlibDotNet;
using FaceRecognitionDotNet.Extensions;
using Xunit;
using Xunit.Abstractions;

namespace FaceRecognitionDotNet.Tests
{
    public class FaceRecognitionTest
    {
        private readonly string ModelDirectory = "Models";

        public FaceRecognitionTest(ITestOutputHelper testOutputHelper)
        {
            var dir = Environment.GetEnvironmentVariable("FaceRecognitionDotNetModelDir");
            if (Directory.Exists(dir))
            {
                ModelDirectory = dir;
            }
        }

        [Fact]
        public void Test()
        {
            using (var fr = FaceRecognitionDotNet.FaceRecognition.Create(ModelDirectory))
            {
            }
        }
    }
}
But a simple console program that links FRDN.CUDA112 does not work, even when the CUDA libraries are deployed.
using System;
using System.Collections.Generic;
using System.Drawing;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using FaceRecognitionDotNet;

namespace Issue
{
    class Program
    {
        private static void Main(string[] args)
        {
            using (var fr = FaceRecognitionDotNet.FaceRecognition.Create("models"))
            {
                Console.WriteLine("test");
            }
        }
    }
}
And the link issue occurs only with CUDA 11.x, which is very weird.
Note
D:\Works\OpenSource\Temp\FaceRecognitionDotNet\17.#211>dotnet run -c Release
Unhandled Exception: System.TypeInitializationException: The type initializer for 'DlibDotNet.NativeMethods' threw an exception. ---> System.DllNotFoundException: Unable to load DLL 'DlibDotNetNativeDnn': The specified module could not be found. (Exception from HRESULT: 0x8007007E)
at DlibDotNet.NativeMethods.LossMetric_anet_type_create()
at DlibDotNet.NativeMethods..cctor()
--- End of inner exception stack trace ---
at DlibDotNet.NativeMethods.get_frontal_face_detector()
at FaceRecognitionDotNet.FaceRecognition..ctor(String directory)
at Issue.Program.Main(String[] args) in D:\Works\OpenSource\Temp\FaceRecognitionDotNet\17.#211\Program.cs:line 18
But running the built assembly directly works:
D:\Works\OpenSource\Temp\FaceRecognitionDotNet\17.#211\bin\Release\netcoreapp2.0>dotnet Issue.dll
test
It looks like the program runs in the wrong current directory. After deploying the CUDA libraries to the directory that contains the *.csproj file, it works fine:
D:\Works\OpenSource\Temp\FaceRecognitionDotNet\17.#211>dir
Volume in drive D is Data
Volume Serial Number is ACE6-77C8
Directory of D:\Works\OpenSource\Temp\FaceRecognitionDotNet\17.#211
2022/08/14 16:08 <DIR> .
2022/08/14 16:08 <DIR> ..
2022/08/14 15:21 <DIR> bin
2022/08/14 16:08 <SYMLINK> cublas64_11.dll [C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin\cublas64_11.dll]
2022/08/14 16:08 <SYMLINK> cublasLt64_11.dll [C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin\cublasLt64_11.dll]
2022/08/14 16:08 <SYMLINK> cudnn64_8.dll [C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin\cudnn64_8.dll]
2022/08/14 16:08 <SYMLINK> cudnn_adv_infer64_8.dll [C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin\cudnn_adv_infer64_8.dll]
2022/08/14 16:08 <SYMLINK> cudnn_adv_train64_8.dll [C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin\cudnn_adv_train64_8.dll]
2022/08/14 16:08 <SYMLINK> cudnn_cnn_infer64_8.dll [C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin\cudnn_cnn_infer64_8.dll]
2022/08/14 16:08 <SYMLINK> cudnn_cnn_train64_8.dll [C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin\cudnn_cnn_train64_8.dll]
2022/08/14 16:08 <SYMLINK> cudnn_ops_infer64_8.dll [C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin\cudnn_ops_infer64_8.dll]
2022/08/14 16:08 <SYMLINK> cudnn_ops_train64_8.dll [C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin\cudnn_ops_train64_8.dll]
2022/08/14 16:08 <SYMLINK> curand64_10.dll [C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin\curand64_10.dll]
2022/08/14 16:08 <SYMLINK> cusolver64_11.dll [C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin\cusolver64_11.dll]
2022/03/18 21:34 29,546 face1.jpg
2022/03/18 21:34 12,429 face2.jpg
2022/08/14 15:21 376 Issue.csproj
2021/03/27 20:28 1,114 Issue.sln
2021/02/16 01:33 729,940 mmod_human_face_detector.dat
2022/08/14 14:19 <SYMLINKD> models [D:\Works\OpenSource\FaceRecognitionDotNet.Models]
2022/08/14 15:21 <DIR> obj
2022/08/14 14:20 428 Program.BAK
2022/08/14 15:12 460 Program.cs
18 File(s) 774,293 bytes
5 Dir(s) 836,606,410,752 bytes free
D:\Works\OpenSource\Temp\FaceRecognitionDotNet\17.#211>dotnet run -c Release
test
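The current-directory effect can be checked without running the trainer, using a hypothetical diagnostic sketch (CudaDllCheck is not part of FaceRecognitionDotNet or DlibDotNet). It assumes the CUDA 11.2 / cuDNN 8 DLL names from the listing in this thread and only reports where the files are, not whether they load. Windows resolves the dependencies of DlibDotNetNativeDnn.dll through its own DLL search order, which roughly covers the application directory, system directories, the current directory and PATH, so the sketch checks the app folder, the working directory and PATH.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical diagnostic, not part of FaceRecognitionDotNet or DlibDotNet.
// It only reports where the CUDA 11.2 / cuDNN 8 DLLs are found; it does not load them.
public static class CudaDllCheck
{
    // DLL names copied from the directory listing in this thread.
    public static readonly string[] RequiredDlls =
    {
        "cublas64_11.dll", "cublasLt64_11.dll", "cudnn64_8.dll",
        "cudnn_adv_infer64_8.dll", "cudnn_adv_train64_8.dll",
        "cudnn_cnn_infer64_8.dll", "cudnn_cnn_train64_8.dll",
        "cudnn_ops_infer64_8.dll", "cudnn_ops_train64_8.dll",
        "curand64_10.dll", "cusolver64_11.dll",
    };

    // Returns the required DLL names that do not exist in the given directory.
    public static string[] Missing(string directory) =>
        RequiredDlls.Where(name => !File.Exists(Path.Combine(directory, name))).ToArray();

    // Returns true if the DLL exists in any directory listed in the PATH variable.
    public static bool FoundOnPath(string name) =>
        (Environment.GetEnvironmentVariable("PATH") ?? "")
            .Split(new[] { Path.PathSeparator }, StringSplitOptions.RemoveEmptyEntries)
            .Any(dir => File.Exists(Path.Combine(dir.Trim('"'), name)));

    public static void Main()
    {
        // Compare the folder holding the app's assemblies with the process's
        // current working directory; trailing separators are trimmed so that
        // identical folders are only reported once.
        var candidates = new[] { AppContext.BaseDirectory, Directory.GetCurrentDirectory() }
            .Select(d => d.TrimEnd(Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar))
            .Distinct(StringComparer.OrdinalIgnoreCase);

        foreach (var dir in candidates)
        {
            var missing = Missing(dir);
            Console.WriteLine(missing.Length == 0
                ? $"{dir}: all CUDA DLLs found"
                : $"{dir}: missing {string.Join(", ", missing)}");
        }

        // DLLs absent from both folders may still be resolved through PATH.
        foreach (var name in RequiredDlls)
            Console.WriteLine($"{name} on PATH: {FoundOnPath(name)}");
    }
}
```

Running it once with dotnet run from the project folder and once from bin\Release\... would show whether the two working directories see different sets of DLLs, which is what the listing above suggests.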
I copied these DLLs from the Release folder to the solution folder (keeping them in both folders). Now the trainer runs again, but there is still no GPU utilization: in Task Manager, the Windows Command Processor running the trainer shows about 30% CPU utilization but no GPU usage.
Should I try using older versions of CUDA?
I didn't use symlinks and I don't have the models link. Is that a problem? My directory listing:
14/08/2022 17:21 <DIR> .
14/08/2022 17:21 <DIR> ..
29/07/2022 18:57 69 .gitignore
16/07/2022 23:09 <DIR> Adience
01/08/2022 09:18 70,887,810 adience-age-network_600_0.001_1E-05_384_False
14/08/2022 17:09 0 adience-age-network_600_0.001_1E-05_384_False.log
01/08/2022 09:35 70,887,820 adience-age-network_600_0.001_1E-05_384_False_
16/07/2022 23:11 <DIR> AdienceDataset
16/07/2022 23:47 <DIR> AdienceDataset_preprocessed
04/08/2022 18:13 840 AgeTraining.csproj
06/08/2022 09:56 <DIR> bin
15/02/2021 10:07 107,330,560 cublas64_11.dll
15/02/2021 10:07 175,706,112 cublasLt64_11.dll
25/02/2021 10:52 222,720 cudnn64_8.dll
25/02/2021 11:36 128,429,056 cudnn_adv_infer64_8.dll
25/02/2021 11:50 82,672,640 cudnn_adv_train64_8.dll
25/02/2021 11:58 545,695,232 cudnn_cnn_infer64_8.dll
25/02/2021 12:16 87,374,336 cudnn_cnn_train64_8.dll
25/02/2021 11:04 273,139,712 cudnn_ops_infer64_8.dll
25/02/2021 11:18 46,076,416 cudnn_ops_train64_8.dll
15/02/2021 15:38 60,627,968 curand64_10.dll
15/02/2021 15:38 396,296,704 cusolver64_11.dll
30/07/2022 21:43 <DIR> images
14/08/2022 17:25 <DIR> obj
29/07/2022 18:57 27,182 Program.cs
29/07/2022 18:57 5,261 README.md
30/07/2022 21:43 <DIR> tools
I have the same problem. Did you ever solve it? Training the model on the CPU takes a lifetime.
No... I gave up. But if you manage to do it, please share your insights here.
I just can't get the GPU to work. I have tried almost every CUDA version. I have now been training the model on the CPU for over 1.5 weeks and it is only about 7% done :| This is just a waste of resources. Could someone share the trained models for age and gender?
I'm trying to do the age training now that bug #206 has been resolved. But it's very slow: after 24 hours I'm on step 83 of epoch 1. At this rate the training will take about two years to complete (to epoch 600). Is it going to get faster? How long is the training supposed to take?