hulianyuyy / CorrNet

Continuous Sign Language Recognition with Correlation Network (CVPR 2023)

sentence id #34

Closed mochaminte closed 5 months ago

mochaminte commented 5 months ago

Hi, thank you for your contribution. I am currently trying to use this to train on a British Sign Language dataset. Just to check: the id of each sentence in the csv file does not need to be unique, right?

Also, could I see your output-hypothesis files, if possible?

hulianyuyy commented 5 months ago

As we don't use a csv for labelling the datasets in this repo, I'm not quite sure what the id of each sentence in the csv of the British Sign Language dataset refers to. Besides, you can find the output-hypothesis files here.

mochaminte commented 5 months ago

Thank you for that. During the training phase, many of my loss values are NaN. My output looks like this:

loss is nan
tensor([112,  48])  frames
tensor([19, 10])  glosses

Do you have any idea why?

hulianyuyy commented 5 months ago

This is a normal result if you train with the original CorrNet code. If outputs like this don't occur more than five times during an epoch, it doesn't matter. It is caused by the squeezed (temporally downsampled) input length not exceeding the label length, which makes the CTC loss infeasible for those samples.
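
For illustration (this is not code from the CorrNet repo), a minimal PyTorch sketch of how the CTC loss used for CSLR blows up when the downsampled input is shorter than the gloss sequence:

```python
# Minimal sketch, not from CorrNet: CTC loss returns inf (which becomes nan
# in the gradients) whenever the input length is shorter than the target length.
import torch

T, N, C = 10, 1, 30                        # downsampled time steps, batch size, vocab size
log_probs = torch.randn(T, N, C).log_softmax(-1)

targets = torch.randint(1, C, (N, 19))     # 19 glosses, longer than the 10 input steps
input_lengths = torch.tensor([T])
target_lengths = torch.tensor([19])

ctc = torch.nn.CTCLoss(blank=0, zero_infinity=False)
print(ctc(log_probs, targets, input_lengths, target_lengths))  # tensor(inf)
```

Occasional occurrences can be ignored (or suppressed with `zero_infinity=True`); if most samples hit this, the temporal downsampling is too aggressive for the gloss lengths in the data.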

mochaminte commented 5 months ago

Unfortunately I am getting this more than 5 times during an epoch. I have not been able to get my dev_WER below 100%.

hulianyuyy commented 5 months ago

Have you encountered any errors during training?

mochaminte commented 5 months ago

No, I have not.

mochaminte commented 5 months ago

Though in my output-hypothesis files, I am only getting one-word translations for all the video files.

hulianyuyy commented 5 months ago

That seems strange. I suppose you may have linked a wrong dataset path to the model, or the model may be receiving data that doesn't correspond to the labels, which would cause these errors. You may paste the baseline.yaml or log.txt here and I can check them for you.

mochaminte commented 5 months ago

I believe the dataset path to my model is correct, as I've checked the input data and it is from the British Sign Language dataset (BOBSL). log.txt

hulianyuyy commented 5 months ago

I have checked the log.txt and don't observe any issue in it. I suppose this may be attributed to the difficulty of BOBSL? Other works have shown that the WER on BOBSL can only reach ~60%.

mochaminte commented 5 months ago

If you don't mind, could you send me the links to the works you found? I still find it strange that my output-hypothesis files only contain one-word translations.

hulianyuyy commented 5 months ago

Sorry, I misremembered it. That performance (>60%) is the sign recognition accuracy in their paper (BBC-Oxford British Sign Language Dataset), not continuous sign language recognition. Besides, I noticed that BOBSL does not seem to provide a CSLR benchmark. Have you used the SLT benchmark to train CSLR models?

mochaminte commented 5 months ago

No, I haven't, but I'm not sure I understand what you mean by that.

hulianyuyy commented 5 months ago

The BOBSL dataset supports three tasks: sign recognition, sign language sentence alignment, and sign language translation, but not continuous sign language recognition. Did you use the sign language translation data to train CorrNet for the continuous sign language recognition task?

mochaminte commented 5 months ago

No, I didn't use that data. I downloaded the videos and used the manually aligned subtitles to get the sentence timings and labels. I then extracted the frames for each sentence from the videos at 25 fps at 256x256. Afterwards I followed steps similar to those described in your README to set up the BOBSL dataset.
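
For context, a rough sketch of the per-sentence frame extraction described above (the function name, paths, and output layout are my own assumptions, not part of CorrNet or the BOBSL tooling):

```python
# Hypothetical sketch of per-sentence frame extraction; not from the CorrNet repo.
# Assumes the source videos are already 25 fps, as described above.
import os
import cv2

def extract_sentence_frames(video_path, start_sec, end_sec, out_dir, size=(256, 256)):
    """Dump the frames of one subtitle-aligned sentence as 256x256 images."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    cap.set(cv2.CAP_PROP_POS_MSEC, start_sec * 1000)   # seek to the sentence start
    idx = 0
    while cap.get(cv2.CAP_PROP_POS_MSEC) < end_sec * 1000:
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imwrite(os.path.join(out_dir, f"{idx:06d}.jpg"), cv2.resize(frame, size))
        idx += 1
    cap.release()
```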

hulianyuyy commented 5 months ago

It's worth noting that the subtitles are actually the translated English text, which should be used for the sign language translation task rather than the continuous sign language recognition task; CSLR needs gloss-level annotations aligned with the signing, not free-form translations.

mochaminte commented 5 months ago

I understand now. So it is not possible to use BOBSL for continuous sign language recognition then?

hulianyuyy commented 5 months ago

Yes, in my view it can't currently be used for continuous sign language recognition.

mochaminte commented 5 months ago

Okay, I understand. Thank you for your help.