rsinghlab / MADDi

This repository is for the Multimodal Alzheimer’s Disease Diagnosis framework (MADDi).
MIT License

Facing error due to missing of "index" column #7

Closed ManojKumar002 closed 1 year ago

ManojKumar002 commented 1 year ago

Hello. In the file "preprocess_genetic/concat_vcfs.py", at line 25, you merge the two VCF files on the column named "index" and later rename it to "subject". But if we print the VCF data after line 24, we get something like this: https://drive.google.com/file/d/1JGesHpv4QQje1aRVrf9pUBmuTW2yrZAU/view?usp=sharing , which doesn't have the actual subject ID values under the "index" column.

We tried creating the "subject" column ourselves by filling in the actual subject IDs, but we get errors when concatenating the multiple VCF files.

So can you please clarify how the "subject" column should be populated, and how exactly we can combine multiple VCF files from different subjects into a single file, given that they don't share a common column name?

michalg04 commented 1 year ago

Hi there,

Thanks for reaching out! Based on the screenshot, I suspect that either ADNI has changed the formatting of their subject IDs since I worked with the data, or you downloaded a different set of patients than the ones I had. I think so because the IDs do not look similar in format to what I had before. You have the right idea that "index" and "subject" should both be the actual subject ID, so my best suggestion is to keep printing intermediate results and investigate where the column disappeared, as that was not the case with the data I had in 2021.
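One way to run that investigation is sketched below. It is not taken from concat_vcfs.py; it only assumes that each per-file table is a pandas DataFrame whose row index is supposed to hold subject IDs. The helper name `to_subject_frame`, the toy variant columns, and the subject IDs are all hypothetical. The sketch relies on one documented pandas behavior: `reset_index()` on an unnamed index creates a column called "index", so if the index is a default integer range, that column will contain 0, 1, 2, ... rather than subject IDs.

```python
import pandas as pd


def to_subject_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Move the row index into a 'subject' column (hypothetical helper).

    Raises if the index is a default RangeIndex, which would mean the
    subject IDs were already lost before this point.
    """
    if isinstance(df.index, pd.RangeIndex):
        raise ValueError("index is a default RangeIndex; subject IDs are missing")
    out = df.copy()
    out.index.name = None  # an unnamed index becomes a column called "index"
    return out.reset_index().rename(columns={"index": "subject"})


# Toy genotype tables: rows are subjects, columns are variants (made-up values).
part_a = pd.DataFrame({"rs1": [0, 1], "rs2": [2, 0]}, index=["S_0001", "S_0002"])
part_b = pd.DataFrame({"rs3": [1, 1], "rs4": [0, 2]}, index=["S_0002", "S_0001"])

# Merge on the shared 'subject' column; row order in the inputs does not matter.
merged = to_subject_frame(part_a).merge(to_subject_frame(part_b), on="subject", how="inner")
print(merged)
```

If the check raises on the real data, the IDs were dropped in an earlier step (for example when the VCF was parsed or transposed), and that is the place to look rather than the merge itself.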