Open YOUKAINOYAMA opened 10 months ago
Hi, I'm facing the same problem as you. Did you solve it? If so, could you share your solution with me? I would really like to experiment with this work. Thank you so much for your help.
I am facing the same issue as well and also have access to ADNI. Would anyone be able to help please or point out any mistakes I have made (if any)? The following are my issues:
Genetic Data
I had to add the line if vcf_file.endswith(".gz"): inside the for loop for vcf_file in files: of the Python script filter_vcfs.py to prevent .vcf.gz.tbi files from being processed, as they caused errors.
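For anyone hitting the same thing, here is a minimal sketch of that filter, written as a standalone helper rather than as a patch to filter_vcfs.py. The helper name select_vcf_files and the .tbi file name in the example list are hypothetical; only the chr3 file name comes from my run. It matches on ".vcf.gz" instead of ".gz", which is slightly stricter than the line I added.

```python
def select_vcf_files(names):
    """Keep compressed VCF files and skip tabix indexes (.vcf.gz.tbi)."""
    # A .tbi index ends with ".vcf.gz.tbi", so it fails the endswith check.
    return [name for name in names if name.endswith(".vcf.gz")]


# Example directory listing (the .tbi entry is an assumed index file name).
files = [
    "ADNI.808_indiv.minGQ_21.pass.ADNI_ID.chr3.vcf.gz",
    "ADNI.808_indiv.minGQ_21.pass.ADNI_ID.chr3.vcf.gz.tbi",
]

for vcf_file in select_vcf_files(files):
    print(vcf_file)
```

With this in place, only the .vcf.gz file is passed on to the parsing step.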
For filter_vcfs.py, it seems that only .pkl files and log.txt should be generated. However, after iterating through all the files (the ADNI WGS (GATK) data), not a single .pkl file was generated. The only output was log.txt, which contains boolean values (nearly all, if not all, are False). Issue: No pickle files are generated, so I am unable to feed this data into the downstream script concat_vcfs.py.
I am struggling to find the labels for the genetic data used in the MADDi study. The Python script concat_vcfs.py reads them on line 12, diag = pd.read_csv("YOUR_PATH_TO_DIAGNOSIS_TABLE"), but I am unable to locate the diagnosis table. Issue: Unable to find the diagnosis table on the ADNI website.
Additional issues faced during genetic data pre-processing:
For : ./ADNI.808_indiv.minGQ_21.pass.ADNI_ID.chr3.vcf.gz
CSV reading complete
vcf: <pandas.io.parsers.readers.TextFileReader object at 0x7fe95fb15790>
Traceback (most recent call last):
File "/home/user/Alzheimers/genetic_data/filter_vcfs.py", line 100, in
For the Python script filter_vcfs.py, line 53 is end = vcf_file.find("output.vcf"). It seems this will always return -1, given that none of the VCF file names contain "output.vcf". Was this intended?
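To illustrate the concern, here is a small sketch of what str.find does with one of my file names. I have not confirmed how filter_vcfs.py uses end afterwards, so the slicing step is an assumption, and strip_suffix is a hypothetical helper, not something from the repository.

```python
name = "ADNI.808_indiv.minGQ_21.pass.ADNI_ID.chr3.vcf.gz"

# str.find returns -1 when the substring is absent.
end = name.find("output.vcf")
print(end)

# If end is then used as a slice bound (an assumption), -1 silently
# drops the last character instead of raising an error.
print(name[:end])


def strip_suffix(file_name, suffix=".vcf.gz"):
    """Hypothetical alternative: remove a known suffix explicitly."""
    if file_name.endswith(suffix):
        return file_name[: -len(suffix)]
    return file_name


print(strip_suffix(name))
```

If the original data files were named differently (for example ending in "output.vcf"), the existing line would behave as intended, so this may just be a naming mismatch with the ADNI download.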
Sorry, I'm not quite sure. I plan to run an experiment based on this paper using a dataset I collected myself, and the author cannot disclose this medical dataset to us.
From: @.> Sent: December 7, 2023, 17:32 To: @.> Cc: @.>; @.> Subject: Re: [rsinghlab/MADDi] About Datasets (Issue #16)
Hello, thank you for sharing the code. I would like to replicate your work, and I have obtained access permission for ADNI. However, I'm facing difficulties in selecting the data according to the descriptions in the paper. If possible, could you share the dataset you filtered from ADNI with me, or provide some guidance on how to select file names or table names on the official ADNI website? My email is sanshaveheart@outlook.com. Thanks again for your work.