userLx888 opened this issue 1 year ago
Which pkl files?
datasets/stage_two/swin_frame_pred_output/statistic_train_data.pkl in train_net.py
This is not a standard dataset used for the benchmark; if you need it, I will update it later.
Thanks a lot.
@liaorongfan @userLx888 Can you please describe how to use the pkl files? How do we load the annotations? Thank you.
OK. The pkl data will be updated within about a month; paper-revision experiments are currently underway.
When testing, there is a label (5 traits) for each image. The predictions of all images (frames) in a video, together with their corresponding labels, yield the ACC and MSE figures.
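A minimal sketch of that evaluation, assuming a hypothetical pkl layout of {video_id: {"preds": ..., "labels": ...}} and taking ACC as 1 - mean absolute error (the usual ChaLearn convention); this is not the repo's exact code:

```python
import pickle
import numpy as np

TRAITS = ["O", "C", "E", "A", "N"]  # the five OCEAN traits

# Hypothetical layout: {video_id: {"preds": (n_frames, 5), "labels": (5,)}}
with open("statistic_train_data.pkl", "rb") as f:
    data = pickle.load(f)

video_preds, video_labels = [], []
for video_id, item in data.items():
    frame_preds = np.asarray(item["preds"])       # one 5-trait prediction per frame
    video_preds.append(frame_preds.mean(axis=0))  # average frames -> video-level score
    video_labels.append(np.asarray(item["labels"]))

preds = np.stack(video_preds)    # (n_videos, 5)
labels = np.stack(video_labels)  # (n_videos, 5)

mse = ((preds - labels) ** 2).mean(axis=0)       # per-trait MSE
acc = 1.0 - np.abs(preds - labels).mean(axis=0)  # per-trait "accuracy" = 1 - MAE

for trait, m, a in zip(TRAITS, mse, acc):
    print(f"{trait}: MSE={m:.4f}  ACC={a:.4f}")
```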
On 2023-03-22, oayoub.dev wrote:
@liaorongfan Hi, can you please clarify this for me?
features: images; labels: [0.6333333 0.5145631 0.6168224 0.51648355 0.4375] # OCEAN
You are giving that to a CNN model (right?), and after training and validation you get the final score [MSE, ACC] (right?)
Can you please tell me how you calculate the MSE for each trait (OCEAN)? Thank you
May I ask about the animal, ghost, lego, and talk_session categories in true_personality? What are the differences and connections between the four session categories?
For personality detection, I think there is indeed no difference.
I have just started in this field and may not be familiar with many things. This code is very good and comprehensive, and I think it can be of great help to me. However, I still have many questions; could I get your WeChat so we can communicate directly? Thank you very much!
I am very glad to know that you are interested in this repo. It is currently under active development and will be upgraded in the near future, so I think it's OK for us to discuss things here.
OK, thanks a lot. Due to the large amount of code, I have had difficulty extracting what I need. May I ask which parts of the code I would need to extract as a basis for adding my own things and completing a full project?
How about running an example experiment and using debug mode to follow the procedure, and then extracting the related modules? The repo is roughly organized as build-from-config, and the default config file may help you understand the code's running process.
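To illustrate, here is a toy sketch of the build-from-config pattern, with hypothetical registry and config names rather than the actual DeepPersonality API:

```python
import yaml

MODEL_REGISTRY = {}

def register_model(name):
    # Decorator that maps a config string onto a model class.
    def wrap(cls):
        MODEL_REGISTRY[name] = cls
        return cls
    return wrap

@register_model("crnet")
class CRNet:
    def __init__(self, num_traits=5):
        self.num_traits = num_traits

def build_model(cfg):
    # Look up the class named in the config and instantiate it with its args.
    model_cfg = cfg["MODEL"]
    return MODEL_REGISTRY[model_cfg["NAME"]](**model_cfg.get("ARGS", {}))

cfg = yaml.safe_load("""
MODEL:
  NAME: crnet
  ARGS:
    num_traits: 5
""")
model = build_model(cfg)  # a CRNet instance built purely from the config
```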
OK! I get it! Thanks a lot.
Hello, I am reproducing 15_multi_modal_pred.yaml. May I ask what the input for this method is? Is the pkl input file the one that comes with the ChaLearn2017 dataset, or do we need to generate it ourselves? Thank you!
Hello @userLx888,
> do we need to generate it ourselves
Yes, we need to generate it ourselves; the code for that can be found in that script.
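For reference, a generic sketch of dumping per-video predictions and labels into such a pkl, with an entirely hypothetical record structure (the format expected by the repo's script may differ):

```python
import pickle
import numpy as np

# Hypothetical record structure: frame-level predictions plus the 5-trait label.
records = {
    "video_0001": {
        "preds": np.zeros((32, 5), dtype=np.float32),  # placeholder frame predictions
        "labels": np.array([0.63, 0.51, 0.62, 0.52, 0.44], dtype=np.float32),
    },
}

with open("statistic_train_data.pkl", "wb") as f:
    pickle.dump(records, f)
```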
Hello, I am replicating 04_crnet.yaml in config. May I ask if the order warnings of the optimizer and scheduler can be ignored? Also, is the DeepPersonality-main/dpcv/exps_first_stage/04_cr_audiovisual_network.py file unused in this project? It seems that I didn't use it when I was debugging.
Hi,
Yes, the order warnings of the optimizer and scheduler can simply be ignored.
As for DeepPersonality-main/dpcv/exps_first_stage/04_cr_audiovisual_network.py: I use a config file to set up the training, so you may have a try at "python tools/run_exp.py -c config/xxx/xxx_crnet.yaml".
I am trying to understand the text-transcription modality used in the CR-Net network. Can you provide me with the source code of the CR-Net paper? Thank you!
Hi, I am sorry to tell you that this benchmark doesn't involve the text modality for personality recognition, since it focuses on audio-visual cues.
May I also ask whether the network used for face extraction in the experiments is MTCNN or something else?
Hello, I would like to ask how the extraction of 32 frames from each input video is implemented in the CR-Net reproduction code. I am sorry, but I cannot see where 32 frames are extracted as input.
Hi @userLx888, glad to know that you are using the code. From my understanding, the visual model processes one image at a time instead of processing all 32 frames at once. As you can see from the code, one video is downsampled into 32 frames and then a single frame is selected as model input. However, if you want to take 32 frames as input at one time, you can use the batch dimension to organize the input in the shape (32, 3, 224, 224). In my view, though, the 32 images would then be computed in parallel, so the temporal information among them would not be captured.
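A minimal sketch of that segment-based sampling, with hypothetical helper names rather than the repo's exact code: the frame indices are split into 32 segments, one random index is drawn from each, and a single frame is then picked for the model.

```python
import random

def sample_frames(num_frames: int, num_segments: int = 32):
    # Split the frame indices into num_segments chunks and draw one
    # random index from each chunk (a sketch, not the repo's exact code).
    seg_len = num_frames / num_segments
    return [
        random.randrange(int(i * seg_len),
                         max(int((i + 1) * seg_len), int(i * seg_len) + 1))
        for i in range(num_segments)
    ]

indices = sample_frames(num_frames=300)  # 32 frame indices across the video
frame_idx = random.choice(indices)       # the single frame fed to the visual model
```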
Thank you! Is the downsampling of one video into 32 frames controlled through the sample_size variable? The sample_size in the source code is set to 100; should I change it to 32 here?
yes, I think so.
Hello, I always run into overfitting when reproducing the CR-Net program. Do you have any good methods to solve it?
Hello, is the subsequent ETR regression stage not replicated in the CR-Net network?
Thank you!
@userLx888 Hi, for overfitting, a drop mechanism can generally be used. Here is one for your reference: https://arxiv.org/pdf/2004.04725.pdf
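As a simple illustration (standard dropout, which may differ from the specific drop mechanism in the referenced paper), a PyTorch regression head could include a dropout layer like this:

```python
import torch.nn as nn

# A toy regression head with dropout for regularization; not CR-Net itself.
head = nn.Sequential(
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # randomly zeroes activations during training
    nn.Linear(256, 5),   # five OCEAN trait scores
    nn.Sigmoid(),        # ChaLearn trait labels are normalized to [0, 1]
)
```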
OK, thank you.
Hello, I would like to delve deeper into some of the experimental details of CR-Net. Do you have the source code for that article? I cannot contact the author. If you can provide assistance, thank you very much.
@userLx888 Sorry, I don't have access to the source code from the author.
When I train CR-Net, the loss never decreases. Have you ever encountered such problems, or have you found any good solutions?
I used the face cropping and alignment script provided by the code to extract face frames from the videos, but the performance is poor and many frames cannot be extracted. I would like to ask if there is any face data that has already been extracted so that I can download it directly. The dataset is ChaLearn2016/2017. Thank you.
This is the problematic result.
Can you tell me which video the images belong to? It seems not to be the case for me, but please let me check. I don't know where to put the data; they are about 80 GB, I guess.
I manually deleted a lot of them earlier; many were problematic, for example 4lIbWq27O84.005 in the validation set. Could the dataset be uploaded via Google Drive or Baidu Netdisk? One more question: my training loss never decreases, but the accuracy is indeed improving little by little. Have you ever encountered this problem?
This code from CR-Net's data selection reflects how 32 frames are extracted from a video. Currently, it seems that a video is divided into 32 segments and only one random frame is extracted and sent to the network, without achieving the goal of extracting 32 frames.
@userLx888 As for the dataset, I'm considering uploading it to Google Drive for the convenience of researchers.
As for this piece of code, I think we've talked about it before; please refer to this message.
Okay, thank you for your answer. If I don't fix this number at 32 and instead randomly select one frame from the entire video, will the effect be the same? Does this 32 mean anything here?
Are the datasets still being updated? Do we need to prepare the pkl files of some datasets ourselves? Thank you.