sassoftware / python-dlpy

The SAS Deep Learning Python (DLPy) package provides the high-level Python APIs to deep learning methods in SAS Visual Data Mining and Machine Learning. It allows users to build deep learning models using friendly Keras-like APIs.
Apache License 2.0

problem with training yolov2 model #402

Open crack3n-collab opened 8 months ago

crack3n-collab commented 8 months ago

I tried running the notebook at https://github.com/michaelgorkow/SAS_DeepLearning/blob/master/Face_Mask_Detection/face_mask_detection_training.ipynb and got errors in the training section. [two screenshots attached]

crack3n-collab commented 8 months ago

Any help would be very much appreciated.

dxq77dxq commented 8 months ago

Hello,

I was able to run the notebook without any issue. Would you please check your CAS connection?

crack3n-collab commented 8 months ago

I'm sorry, I don't follow. What do you mean by checking the CAS connection? Isn't the connection established at the start of the notebook code?

crack3n-collab commented 8 months ago

One more detail I didn't mention: I am running the notebook in a trial environment.

dxq77dxq commented 8 months ago

Would you please share your entire notebook?

crack3n-collab commented 8 months ago

Sorry for the very late reply. What do you mean by sharing the entire notebook? I am running the notebook unchanged except for the file locations. I got errors in the image augmentation part. I checked, and the data is there, in .txt and .jpg format.

https://github.com/michaelgorkow/SAS_DeepLearning/blob/master/Face_Mask_Detection/image_augmentation_object_detection.ipynb [screenshot attached]

dxq77dxq commented 8 months ago

Please update DLPy to the latest version using:

pip install git+https://github.com/sassoftware/python-dlpy.git
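To confirm that the update took effect, the installed version can be read back from Python. This is a minimal sketch that assumes the distributions are named `sas-dlpy` and `swat`; those names are assumptions and may differ in your environment. The helper `installed_version` is hypothetical and not part of DLPy.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "sas-dlpy" and "swat" are assumed distribution names; adjust if yours differ.
    for name in ("sas-dlpy", "swat"):
        print(name, installed_version(name) or "not installed")
```

Run this in the same kernel or environment the notebook uses, since a notebook kernel can point at a different Python installation than the shell where pip was run.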

crack3n-collab commented 8 months ago

OK, I resolved the AttributeError on the CASResults object, and now I am stuck at the training section. The session closes itself because of an unhandled exception. The CAS connection is working fine.

[screenshot attached]

dxq77dxq commented 8 months ago

Could you please share the error message?

crack3n-collab commented 8 months ago

The error message is the exact same as the one I posted when I first created this issue.

kilbyjmichael commented 8 months ago

SAS support asked me to open an issue here. I am facing the same problem as @crack3n-collab: the Python code fails with 'SWATCASActionError: The Session is no longer active due to an unhandled exception.', and the CAS controller log then shows a 'Floating point divide by zero' error. @crack3n-collab, can you check your CAS controller logs? The default location is usually /opt/sas/viya/config/var/log/cas/default/cas_date_servername_xxxxxx.log.

Error from CAS logs:
2024-01-03T14:18:26,695 FATAL [00000007] 1751419 xxxx 99 [cas.c:161] - Exception in action.
2024-01-03T14:18:26,695 FATAL [00000007] 1751419 xxxx 99 [cas.c:163] - Floating point divide by zero
2024-01-03T14:18:26,696 FATAL [00000007] 1751419 xxxx 99 [cas.c:188] - /opt/sas/viya/home/SASFoundation/sasexe/tkcasdrv.so(tktracex+0x42) [0x7f5696144612]

DLPY version 1.2.1-dev
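For anyone comparing their own controller log, a small script can pull out the FATAL entries. This is a rough sketch: the regular expression is inferred only from the three lines quoted above, so the field layout is an assumption and other CAS log lines may not match. The function name `fatal_entries` is hypothetical.

```python
import re

# Layout inferred from the excerpt above: timestamp, level, other fields,
# a [source:line] tag, " - ", then the message. This is an assumption.
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+.*?"
    r"\[(?P<source>[^\]\s]+:\d+)\]\s+-\s+(?P<message>.*)$"
)


def fatal_entries(lines):
    """Return (timestamp, source, message) tuples for lines logged at FATAL."""
    entries = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group("level") == "FATAL":
            entries.append(
                (match.group("timestamp"), match.group("source"), match.group("message"))
            )
    return entries


# The three lines quoted in this thread, used as sample input.
SAMPLE = [
    "2024-01-03T14:18:26,695 FATAL [00000007] 1751419 xxxx 99 [cas.c:161] - Exception in action.",
    "2024-01-03T14:18:26,695 FATAL [00000007] 1751419 xxxx 99 [cas.c:163] - Floating point divide by zero",
    "2024-01-03T14:18:26,696 FATAL [00000007] 1751419 xxxx 99 [cas.c:188] - "
    "/opt/sas/viya/home/SASFoundation/sasexe/tkcasdrv.so(tktracex+0x42) [0x7f5696144612]",
]

if __name__ == "__main__":
    for timestamp, source, message in fatal_entries(SAMPLE):
        print(timestamp, source, message)
```

To scan a real log, pass an open file object instead of `SAMPLE`, for example `fatal_entries(open(path))`, where `path` is the location of your controller log.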

crack3n-collab commented 8 months ago

Are the CAS controller logs accessible to SAS trial users? I can't seem to find them.

kilbyjmichael commented 8 months ago

SAS support informed me that SAS Viya supports only GPUs with the Pascal or Volta architecture, so the code will not work on the NVIDIA A2 I was trying to run it on. @crack3n-collab, what GPU are you using?
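One way to check where a card falls is to map its CUDA compute capability to an architecture name. The sketch below assumes you already know the compute capability (recent NVIDIA drivers can report it with `nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader`, though older drivers may lack that field). The Pascal/Volta restriction is only what SAS support reportedly said in this thread, not a verified support matrix, and both function names are hypothetical.

```python
# Architectures SAS support reportedly named as supported (per this thread only).
REPORTED_SUPPORTED = {"Pascal", "Volta"}


def architecture_for(compute_capability):
    """Map a compute capability string such as '8.6' to an architecture name.

    Only capabilities 6.x through 9.x are covered; anything else is 'unknown'.
    """
    major_text, _, minor_text = compute_capability.strip().partition(".")
    major, minor = int(major_text), int(minor_text or 0)
    if major == 6:
        return "Pascal"
    if major == 7:
        return "Turing" if minor == 5 else "Volta"
    if major == 8:
        return "Ada Lovelace" if minor == 9 else "Ampere"
    if major == 9:
        return "Hopper"
    return "unknown"


def reported_supported(compute_capability):
    """True if the architecture is one that this thread reports as supported."""
    return architecture_for(compute_capability) in REPORTED_SUPPORTED


if __name__ == "__main__":
    # The NVIDIA A2 has compute capability 8.6.
    for cap in ("6.1", "7.0", "8.6"):
        print(cap, architecture_for(cap), reported_supported(cap))
```

An A2 maps to Ampere here, which is consistent with what support described: it is newer than both Pascal and Volta.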

dxq77dxq commented 8 months ago

Thanks for the info. @shlongsas will look into this issue soon.

crack3n-collab commented 8 months ago

I set the GPU option to false.

shlongsas commented 5 months ago

I am able to run the notebook without any issue. Could you try to get the most current version of SAS Viya?