Inquire a question - GitHub Issues

Ingram-lin commented 1 year ago

Hello, I have a question. In the Diagnosis task, is the test_results_task1.csv file generated by an earlier step of your code, or is it included in the dataset?

probstlukas commented 1 year ago

Hello, this file is included in the existing dataset of the ADReSSo 2021 challenge. However, it is just an empty CSV file containing the diagnosis-test IDs and an empty Prediction column, which is then filled with each model's output and saved as task1_LR.csv, task1_RF.csv, and task1_SVC.csv.
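For readers who want to see the template-filling step concretely, here is a minimal sketch using only the standard library. It assumes the template's first column holds the test IDs and that a column named `Prediction` exists; the function name `fill_template`, the `ID` header, and the sample IDs are hypothetical and not taken from the repository.

```python
import csv
import io
from typing import Mapping


def fill_template(template_csv: str, predictions: Mapping[str, object]) -> str:
    """Fill the empty Prediction column of a template CSV.

    Assumption (hypothetical layout): the first column holds the test IDs
    and a column named "Prediction" is present but empty.
    """
    reader = csv.DictReader(io.StringIO(template_csv))
    fieldnames = reader.fieldnames
    if fieldnames is None or "Prediction" not in fieldnames:
        raise ValueError("template must have a 'Prediction' column")
    id_col = fieldnames[0]  # assumed: IDs live in the first column

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Look up this ID's predicted label; a KeyError means a model skipped it.
        row["Prediction"] = predictions[row[id_col]]
        writer.writerow(row)
    return out.getvalue()


template = "ID,Prediction\ntest_001,\ntest_002,\n"  # made-up IDs
model_predictions = {
    "LR": {"test_001": 1, "test_002": 0},
    "RF": {"test_001": 1, "test_002": 1},
}
for name, preds in model_predictions.items():
    # In the real pipeline each filled copy would be written to e.g. task1_LR.csv.
    print(f"task1_{name}.csv:\n{fill_template(template, preds)}")
```

Reusing one template per model keeps the row order identical across the output files, which makes them easy to compare side by side.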

Ingram-lin commented 1 year ago

Hello, I appreciate your reply. I only have the ADReSSo 2021 training set. After manually splitting it, I used one portion for training and the rest for testing. I then replicated your work and obtained similar values for accuracy, precision, recall, and F1. I must say, it's an impressive piece of work!
However, I still have three questions for you. Firstly, I didn't generate the files task1_LR.csv, task1_RF.csv, and task1_SVC.csv. Is it because I don't have the file test_results_task1.csv? (I commented out all the related Python code for it.) The second question: why did you choose to use GPT for feature extraction? What advantages does it have over BERT? The last question: I would like to further study GPT's embeddings. Do you have any suggestions? Thank you once again for taking the time to respond to my questions.
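When the only available data is the training set, a manual split like the one described above is easier to compare across runs if it is stratified by diagnosis and seeded. The sketch below shows one way to do this with the standard library; the function name `stratified_split`, the 20% test fraction, and the `"ad"`/`"cn"` label strings are assumptions for illustration, not the repository's code.

```python
import random
from collections import defaultdict


def stratified_split(ids, labels, test_fraction=0.2, seed=0):
    """Split sample IDs into train/test lists, keeping each label's proportion."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")

    # Group IDs by their diagnosis label.
    by_label = defaultdict(list)
    for sample_id, label in zip(ids, labels):
        by_label[label].append(sample_id)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label][:]  # copy before shuffling
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Usage with made-up IDs: five "ad" and five "cn" recordings.
ids = [f"s{i}" for i in range(10)]
labels = ["ad"] * 5 + ["cn"] * 5
train_ids, test_ids = stratified_split(ids, labels)
print(len(train_ids), len(test_ids))  # → 8 2
```

Keeping the class balance equal in both parts matters here because accuracy and F1 on a small held-out set can shift noticeably if one diagnosis happens to dominate it.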

probstlukas commented 1 year ago

Hello, if you are a member of DementiaBank, you can download the test dataset as well. Thanks for your praise! You are welcome to contribute to the project via pull requests :)

Firstly, I didn't generate the files task1_LR.csv, task1_RF.csv, and task1_SVC.csv. Is it because I don't have the file test_results_task1.csv? (I commented out all the related Python code for it.)

Yes. I use the empty test_results_task1.csv file as a template and then fill it with the predictions. See config.py:44:
empty_test_results_file = (data_dir / "diagnosis-test" / "diagnosis" / "test-dist" / "test_results_task1.csv").resolve()
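For anyone who, like the questioner, has only the training data, one option is to recreate an equivalent empty template at the path config.py expects, using the IDs of a self-made held-out split. This is a hypothetical workaround, not part of the repository: the function name `load_or_create_template` and the `ID` header are assumptions, and only the relative path is taken from config.py:44.

```python
import csv
import tempfile
from pathlib import Path


def load_or_create_template(data_dir: Path, test_ids: list[str]) -> Path:
    """Return the template path, creating an empty template only if it is missing."""
    # Same relative location as the one defined at config.py:44.
    template = (data_dir / "diagnosis-test" / "diagnosis" / "test-dist"
                / "test_results_task1.csv").resolve()
    if not template.exists():
        # Hypothetical fallback: build an equivalent empty template from your
        # own held-out IDs. The "ID" header name is an assumption.
        template.parent.mkdir(parents=True, exist_ok=True)
        with template.open("w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["ID", "Prediction"])
            for sample_id in test_ids:
                writer.writerow([sample_id, ""])  # Prediction left empty
    return template


# Usage: a throwaway directory stands in for data_dir.
with tempfile.TemporaryDirectory() as tmp:
    path = load_or_create_template(Path(tmp), ["s3", "s7"])
    print(path.read_text())
```

An existing official template is never overwritten, so the same code keeps working once the real test set has been downloaded from DementiaBank.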

The second question: why did you choose to use GPT for feature extraction? What advantages does it have over BERT?

Primarily because I am a research assistant at an institute of my university, and I am supposed to use GPT for AD detection. Also, GPT-3 is well suited to feature extraction: it is effective at zero-shot learning and encodes semantic knowledge, producing embeddings that perform well on classification tasks. For more information, see the paper linked in the references at the end of the README.
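The idea behind this answer, embedding each transcript once and then training ordinary classifiers on the vectors, can be illustrated without any API access. The sketch below swaps in a from-scratch logistic regression trained on made-up 4-dimensional vectors that stand in for GPT-3 embeddings (real ones are far higher-dimensional); it is not the repository's code, whose output files are named after LR, RF, and SVC models.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.5, epochs=200):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = sigmoid(z) - yi  # gradient of log-loss w.r.t. z
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Average the gradients over the batch and take one step.
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(w, b, x):
    """Return 1 (e.g. AD) if the decision score is non-negative, else 0."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if z >= 0 else 0


# Toy "embeddings": label 1 vectors load on the first two dimensions,
# label 0 vectors on the last two. Purely illustrative data.
X = [[0.9, 1.0, 0.1, 0.0], [1.0, 0.8, 0.0, 0.2],
     [0.1, 0.0, 0.9, 1.0], [0.0, 0.2, 1.0, 0.8]]
y = [1, 1, 0, 0]
w, b = train_logistic_regression(X, y)
print(predict(w, b, [0.95, 0.9, 0.05, 0.1]))  # → 1
```

The embedding model is used only as a fixed feature extractor here; all task-specific learning happens in the lightweight classifier, which is why several classifiers can be compared cheaply on the same vectors.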

The last question: I would like to further study GPT's embeddings. Do you have any suggestions?

While working on this project, I've come across some good sources for learning more about embeddings:

I hope this helps.

Ingram-lin commented 1 year ago

Thank you very much for your response and for sharing these sources. I also want to apologize if some of my questions were not appropriate for this platform. Thanks again for your help, and have a great day!

probstlukas / gpt3-dementia-detection

Inquire a question #2