hulianyuyy / CorrNet

Continuous Sign Language Recognition with Correlation Network (CVPR 2023)

Insufficient space issue #16

Closed percise closed 5 months ago

percise commented 7 months ago

The environment is pytorch 1.8.1, python 3.8.18, cuda 11.1, with ctcdecode 0.4 installed successfully, but at the end of the first training epoch an out-of-memory error was reported. The server has an A100 with 80 GB.

[screenshot did not upload] Could you tell me roughly what the problem is? From your other issues it looked like a version problem, so I switched to pytorch 1.13.0 and successfully installed ctcdecode, but at runtime it still directly reports a ctc error. Could you give me some idea of what I should do? Is it a gcc version problem? Mine is currently gcc 11.4. Or could you tell me what your environment is? Thanks.

percise commented 7 months ago

[screenshot] This is the code where the error occurs.

hulianyuyy commented 7 months ago

My environment is pytorch 1.10.1, ctcdecode 0.4.0, python 3.7.1, cuda 11.2. According to other issues, you may upgrade the pytorch version to try it.

percise commented 7 months ago

> My environment is pytorch 1.10.1, ctcdecode 0.4.0, python 3.7.1, cuda 11.2. According to other issues, you may upgrade the pytorch version to try it.

Thank you for your patient reply, I will go and try it.

percise commented 7 months ago

> My environment is pytorch 1.10.1, ctcdecode 0.4.0, python 3.7.1, cuda 11.2. According to other issues, you may upgrade the pytorch version to try it.

Hello, I would like to ask whether the "insufficient space" error is actually caused by running out of host RAM. I noticed that pin_memory is set to True in main.py, which keeps memory pinned the whole time. How much RAM does your machine have? I have already changed it to False and am trying that now.
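For reference, a minimal sketch (using a dummy dataset, not the repository's actual main.py) of the pin_memory flag under discussion; setting it to False stops the DataLoader from copying every batch into page-locked host RAM:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy tensors standing in for the real sign-language video dataset.
dataset = TensorDataset(torch.randn(8, 3, 224, 224), torch.randint(0, 10, (8,)))

loader = DataLoader(
    dataset,
    batch_size=2,
    shuffle=True,
    num_workers=2,
    pin_memory=False,  # False avoids pinning batches in page-locked host memory
)

for frames, labels in loader:
    pass  # training step would go here
```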

hulianyuyy commented 7 months ago

I use a single 3090 GPU with 24 GB of memory for training. But I suspect this issue is not caused by GPU memory, since your GPU has 80 GB of memory.

hulianyuyy commented 7 months ago

Besides, you may refer to this issue. This is mostly caused by ctcdecode.

hulianyuyy commented 7 months ago

You could give it a few tries. If you still encounter this problem, I will add a pure-Python decoder instead of ctcdecode to perform decoding, to get rid of this problem. My schedule allows this around 11.25.
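As a rough illustration only (an assumed sketch, not the author's planned implementation), a pure-Python greedy CTC decode can replace the ctcdecode extension by taking the per-frame argmax, collapsing repeats, and dropping the blank token:

```python
import torch

def greedy_ctc_decode(log_probs, blank=0):
    """log_probs: (T, num_classes) tensor of per-frame log probabilities."""
    best_path = torch.argmax(log_probs, dim=-1).tolist()
    decoded, prev = [], None
    for token in best_path:
        if token != prev and token != blank:
            decoded.append(token)
        prev = token
    return decoded

# Example: 5 frames over a 4-class vocabulary (class 0 is the CTC blank).
frames = torch.log_softmax(torch.randn(5, 4), dim=-1)
print(greedy_ctc_decode(frames))
```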

percise commented 7 months ago

> You could give it a few tries. If you still encounter this problem, I will add a pure-Python decoder instead of ctcdecode to perform decoding, to get rid of this problem. My schedule allows this around 11.25.

OK. Big respect to the author for contributing to sign language research!!!

kido1412y2y commented 6 months ago

> I use a single 3090 GPU with 24 GB of memory for training. But I suspect this issue is not caused by GPU memory, since your GPU has 80 GB of memory.

Hello, may I ask how much GPU memory was used during training on the 24 GB card? I am using two 3060 GPUs with 12 GB of memory each. Is that enough? Because I use two GPUs, I changed the corresponding setting in main.py (screenshots attached), but when I actually ran the code, the machine only used one GPU and then reported an error. Is there anything I missed? I hope to receive your reply.

hulianyuyy commented 6 months ago

> > I use a single 3090 GPU with 24 GB of memory for training. But I suspect this issue is not caused by GPU memory, since your GPU has 80 GB of memory.
>
> Hello, may I ask how much GPU memory was used during training on the 24 GB card? I am using two 3060 GPUs with 12 GB of memory each. Is that enough? Because I use two GPUs, I changed the corresponding setting in main.py (screenshots attached), but when I actually ran the code, the machine only used one GPU and then reported an error. Is there anything I missed? I hope to receive your reply.

About 20 GB of GPU memory for a batch size of 2. As we use AMP to accelerate training, this code currently doesn't support multiple GPUs. You may manually disable AMP, or try running this code with a batch size of 1.
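For reference, a minimal sketch (an assumed training-step structure, not the repository's actual code) of how AMP is typically toggled in PyTorch; setting use_amp = False turns off autocast and the gradient scaler so a multi-GPU setup can be tried:

```python
import torch
import torch.nn as nn

use_amp = False  # disable automatic mixed precision

model = nn.Linear(16, 4)                      # stand-in for the real network
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

inputs = torch.randn(2, 16)
labels = torch.randint(0, 4, (2,))

optimizer.zero_grad()
with torch.cuda.amp.autocast(enabled=use_amp):  # no-op when use_amp is False
    loss = criterion(model(inputs), labels)
scaler.scale(loss).backward()                   # plain backward when disabled
scaler.step(optimizer)
scaler.update()
```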

xxxiaosong commented 3 months ago

Hello, I would like to ask why training with two 4090 graphics cards is much slower than training with a single card.

hulianyuyy commented 3 months ago

Maybe some other code is already running on the 4090s, and so it slows down.