Closed mochaminte closed 5 months ago
As we don't use the CSV to label the datasets in this repo, I'm not quite sure what the ID of each sentence in the CSV of the British Sign Language dataset means. Besides, you can find the output-hypothesis files here.
Thank you for that. During the training phase, a lot of my loss values are NaN. My output looks like this:
loss is nan
tensor([112, 48]) frames
tensor([19, 10]) glosses
Do you have an idea why?
This is a natural result if you use the original CorrNet code for training. If outputs like this don't occur more than five times during an epoch, it doesn't matter. It is caused by the squeezed (temporally downsampled) input length failing to exceed the label length.
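Concretely, CTC loss has no valid alignment (and so goes to inf/NaN) when the downsampled input is too short for the gloss sequence: it needs at least one frame per gloss, plus one extra frame for each pair of identical adjacent glosses. A minimal sketch of that check (my own illustrative helper, not the actual CorrNet code):

```python
def ctc_feasible(input_len, labels):
    """CTC has at least one valid alignment only if the input is long
    enough: one timestep per label, plus a blank between each pair of
    identical adjacent labels."""
    repeats = sum(1 for a, b in zip(labels, labels[1:]) if a == b)
    return input_len >= len(labels) + repeats

# Distinct glosses: 112 frames easily cover 19 labels.
print(ctc_feasible(112, list(range(19))))  # True
# After heavy temporal squeezing the input can drop below the label
# length and the loss becomes infinite, which shows up as NaN gradients.
print(ctc_feasible(2, [1, 1, 2]))          # False (needs 3 + 1 repeat = 4)
```

So when a sentence's frame count, after the network's temporal downsampling, falls below its gloss count, that sample's loss is inf/NaN.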
Unfortunately I am getting this more than 5 times during an epoch. I have not been able to get my dev_WER below 100%.
Have you encountered any errors during training?
No, I have not,
though in my output-hypothesis files, I am only getting one-word translations for all the video files.
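(For reference, WER here is word-level edit distance divided by reference length, so one-word outputs against full reference sentences are bound to score near 100%. A quick standalone sketch of that metric, not the repo's evaluation script:)

```python
def wer(ref, hyp):
    """Word error rate: word-level Levenshtein distance / reference length."""
    r, h = ref.split(), hyp.split()
    # dp[i][j] = edit distance between r[:i] and h[:j]
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i
    for j in range(len(h) + 1):
        dp[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            dp[i][j] = min(dp[i - 1][j] + 1,                       # deletion
                           dp[i][j - 1] + 1,                       # insertion
                           dp[i - 1][j - 1] + (r[i - 1] != h[j - 1]))  # substitution
    return dp[len(r)][len(h)] / len(r)

# One correct word out of a four-word reference: three deletions.
print(wer("hello how are you", "hello"))  # 0.75
```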
That seems strange. I suspect you may have linked a wrong dataset path to the model, or the model may be receiving data that doesn't correspond to the labels, which would cause these errors. You can paste the baseline.yaml or log.txt here and I can check them for you.
I believe the dataset path to my model is correct, as I've checked the input data and it is from the British Sign Language dataset (BOBSL). log.txt
I have checked the log.txt and don't observe any issues in it. I suppose this may be attributable to the difficulty of BOBSL? Other works have shown that the WER on BOBSL can only reach ~60%.
If you don't mind, could you send me links to the works you found? I still find it strange that my output-hypothesis files only contain one-word translations.
Sorry, I misremembered. The >60% figure in their paper (BBC-Oxford British Sign Language Dataset) is (isolated) sign language recognition accuracy, not continuous sign language recognition performance. Besides, I noticed that BOBSL doesn't seem to provide a CSLR benchmark. Have you used the SLT benchmark to train CSLR models?
No I haven't, but I'm not sure I understand what you mean by that.
The BOBSL dataset supports three tasks: sign recognition, sign language sentence alignment, and sign language translation, but not continuous sign language recognition. Did you use the sign language translation data to train CorrNet for the continuous sign language recognition task?
No, I didn't use that data. I downloaded the videos and used the manually aligned subtitles to get the sentence timings and labels. I then extracted the frames for each sentence from the videos at 25 fps and 256x256 resolution. Afterwards I followed steps similar to those in your README to set up the BOBSL dataset.
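Roughly, the timing step maps each subtitle's start/end time to frame indices at 25 fps before extraction. A simplified sketch of what I did (the helper name is just for illustration, not code from the repo):

```python
FPS = 25  # frame rate used when extracting

def subtitle_to_frame_range(start_sec, end_sec, fps=FPS):
    """Map a subtitle's time span (seconds) to [first, last) frame indices."""
    first = round(start_sec * fps)
    last = round(end_sec * fps)
    return first, last

# e.g. a subtitle spanning 12.4s to 15.0s covers frames 310..375
print(subtitle_to_frame_range(12.4, 15.0))  # (310, 375)
```

The frames for each sentence were then dumped from the video over that index range and resized to 256x256.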
It's worth noting that the subtitles are actually translated text, which should be used for the sign language translation task rather than the continuous sign language recognition task.
I understand now, so it is not possible to use BOBSL for continuous sign language recognition then?
Yes, in my view, it can't currently be used for continuous sign language recognition.
Okay, I understand. Thank you for your help.
Hi, thank you for your contribution. I am currently trying to use this to train on a British Sign Language dataset. Just to check, the ID of each sentence in the CSV file does not need to be unique, right?
Also, could I see your output-hypothesis files, if possible?