Ingram-lin closed this issue 1 year ago
Hello, this file is included in the existing dataset of the ADReSSo 2021 challenge. However, it is just a template CSV file containing the diagnosis-test IDs and an empty Prediction column, which is then filled with the output of the models to produce task1_LR.csv, task1_RF.csv, and task1_SVC.csv.
Hello, I appreciate your reply. I only have the training set of ADReSSo 2021 in hand. After manually partitioning it, I used a portion as the training set. Then, I replicated your work and obtained similar values for Accuracy, Precision, Recall, and F1. I must say, it's an impressive piece of work!
However, I still have three questions for you. Firstly, I didn't generate the files task1_LR.csv, task1_RF.csv, and task1_SVC.csv. Is it because I don't have the file test_results_task1.csv? (I commented out all the related Python code for it.)
The second question: why did you choose to use GPT for feature extraction? What advantages does it have over BERT?
The last question: I would like to further study GPT's embeddings. Do you have any suggestions?
Thank you once again for taking the time to respond to my questions.
Hello, if you are a member of DementiaBank, you can download the test dataset as well. Thanks for your praise -- you are welcome to contribute to the project via pull requests :).
Firstly, I didn't generate the files task1_LR.csv, task1_RF.csv, and task1_SVC.csv. Is it because I don't have the file test_results_task1.csv? (I commented out all the related Python code for it.)
Yes, I use the empty test_results_task1.csv file as a template and then fill it with the predicted data. See config.py:44:
empty_test_results_file = (data_dir / "diagnosis-test" / "diagnosis" / "test-dist" / "test_results_task1.csv").resolve()
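As a rough illustration of the template-filling idea (a minimal sketch, not the repository's actual code): the function name `fill_predictions` and the `predictions` dictionary are hypothetical, and the sketch assumes the template is a comma-separated file with a header row whose first column holds the test IDs and which has a column named Prediction.

```python
import csv


def fill_predictions(template_path, output_path, predictions, column="Prediction"):
    """Copy the template CSV to output_path, filling `column` from `predictions`.

    Assumptions (not confirmed by the project): the file is comma-separated,
    has a header row, and its first column holds the test IDs.
    `predictions` maps an ID string to the predicted label.
    Returns the number of data rows written.
    """
    with open(template_path, newline="") as src:
        reader = csv.DictReader(src)
        fieldnames = list(reader.fieldnames or [])
        rows = list(reader)

    # Add the prediction column if the template does not already have it.
    if column not in fieldnames:
        fieldnames.append(column)
    id_column = fieldnames[0]

    for row in rows:
        # Leave the cell empty when there is no prediction for this ID.
        row[column] = predictions.get(row[id_column], "")

    with open(output_path, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Calling it once per model, for example `fill_predictions(empty_test_results_file, "task1_LR.csv", lr_predictions)` with a hypothetical `lr_predictions` dictionary, would give one filled copy of the template per classifier. Without the template file there is nothing to copy the IDs from, which is why those output files are not produced.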
The second question: why did you choose to use GPT for feature extraction? What advantages does it have over BERT?
Primarily because I am a research assistant at an institute of my university, and I am supposed to use GPT for AD detection. Also, GPT-3 is good for feature extraction because it is effective at zero-shot learning and at encoding semantic knowledge, so it produces embeddings that perform well on classification tasks. For more information, see the paper I linked in the references at the end of the README.
The last question: I would like to further study GPT's embeddings. Do you have any suggestions?
Since I've been working on this project, I've come across some good sources to learn more about embedding:
I hope this helps.
Thank you very much for your response and sharing. I also want to apologize for asking some inappropriate questions on this platform. Finally, thanks again for your assistance, and I wish you a great day and a good mood!
Hello, I have a question. In the Diagnosis task, is the test_results_task1.csv file generated through previous work, or is it included in the dataset?