microsoft / CNTK

Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit
https://docs.microsoft.com/cognitive-toolkit/

CNTK C# Crash when layer is deeper #3861

Closed peterkim333 closed 2 years ago

peterkim333 commented 2 years ago

Hi, I am a developer who uses the CNTK C# package with an RTX 3090 GPU.

Our team has rebuilt CNTK C# against newer CUDA and cuDNN versions. It usually works for training and so on.

However, our program crashes intermittently during neural network training. Training usually finishes without problems, but sometimes the process dies.

The error is below.

:System.AccessViolationException

Location: CNTK.CNTKLibPINVOKE.Variable__Name(System.Runtime.InteropServices.HandleRef)
Location: CNTK.Variable._Name()
Location: DeepLearningCore.SegmentationNetwork.GetParameters()
Location: DeepLearningCore.Segmentation.TrainNetwork()
Location: DeepLearningCore.Segmentation.Run()
Location: System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
Location: System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
Location: System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
Location: System.Threading.ThreadHelper.ThreadStart()

We have no idea what causes this. We traced the error and found that it occurs when the code tries to get a specific parameter and read its name; that call fails. We are not certain, but networks with about 500 or more layers seem to trigger the error. Has anybody seen the same problem?