Open WenjianBI opened 4 years ago
Those are all very helpful suggestions. Thanks, I will implement them.
On Jul 28, 2020, at 3:56 PM, Wenjian Bi notifications@github.com wrote:
Hi Xiaowei,
I am using seqminer (v8.0) and it works well on multiple operating systems. I am wondering if you could add some features to the current functions.
Usually, we do not need all subjects in an analysis. So, for readBGENToMatrixByRange() and readVCFToMatrixByRange(), could you add one more argument, such as 'subjIDs' or 'subjIndex', to specify which subjects to include? That can save a lot of memory.
Could you also add a function that splits all markers into multiple ranges, each containing a similar number of markers? In a genome-wide analysis, the genotypes of all markers cannot fit into memory at once, so such a function would help a great deal. If possible, I suggest a signature like splitRange(fileName, memoryChunk = 4GB, subjIDs, ...). The output could be a data.frame in which each row describes one range.
Sometimes the PLINK bed/bim/fam files or the BGEN bgen/bgi files have different prefixes. Could you let users specify a separate name for each file? That would also be helpful.
Thanks, Wenjian
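The range-splitting request above can be sketched roughly as follows. This is only an illustration in Python, not part of seqminer: the function name split_ranges, its arguments, and the assumption of 8 bytes per stored genotype are all hypothetical. It assumes marker positions are already known per chromosome (for example, read from a .bim or .bgi index), are sorted, and are distinct.

```python
from math import ceil


def split_ranges(positions_by_chrom, n_subjects, memory_bytes, bytes_per_genotype=8):
    """Split markers into per-chromosome ranges that fit a memory budget.

    positions_by_chrom: dict mapping chromosome name -> sorted list of
        distinct marker positions (hypothetical input, e.g. from an index file).
    n_subjects: number of subjects kept in the analysis.
    memory_bytes: memory budget for one genotype matrix.
    bytes_per_genotype: assumed storage per genotype value (8 = one double).
    Returns a list of dicts, one per range, in "chrom:start-end" form.
    """
    max_markers = memory_bytes // (n_subjects * bytes_per_genotype)
    if max_markers < 1:
        raise ValueError("memory budget is too small to hold a single marker")

    rows = []
    for chrom, positions in positions_by_chrom.items():
        n = len(positions)
        if n == 0:
            continue
        # Use the fewest chunks that respect the budget, then spread the
        # markers evenly so every range has a similar number of markers.
        n_chunks = ceil(n / max_markers)
        base, extra = divmod(n, n_chunks)
        start = 0
        for i in range(n_chunks):
            size = base + (1 if i < extra else 0)
            chunk = positions[start:start + size]
            rows.append({
                "range": f"{chrom}:{chunk[0]}-{chunk[-1]}",
                "n_markers": size,
            })
            start += size
    return rows


if __name__ == "__main__":
    demo = {"1": [100, 200, 300, 400, 500], "2": [10, 20]}
    for row in split_ranges(demo, n_subjects=1000, memory_bytes=16000):
        print(row["range"], row["n_markers"])
```

Each returned range string could then be passed, one at a time, to a reader that accepts a "chrom:start-end" range, so that only one chunk of genotypes is in memory at any moment. Restricting the analysis to fewer subjects lowers n_subjects and therefore allows more markers per range for the same budget.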
Thank you for the swift reply. BGEN files are becoming more and more popular, and I think your package can be a very important tool for R users.
I think it'd be great if there were an option to load a matrix from readBGENToMatrixByRange indexed by rsID instead of position.
Thanks for the suggestion, but managing rsIDs is quite challenging because they change over time (rsIDs can be merged or become invalid across releases).
On Feb 10, 2021, at 5:44 PM, Peiyuan Zhu notifications@github.com wrote:
I think it'd be great if there were an option to load a matrix from readBGENToMatrixByRange indexed by rsID instead of position.
Missing data imputation could be an important feature to have. I wonder how missing genotypes are handled in the current version.
If a genotype is missing in BGEN, you will get NA as the genotype.
Best, Xiaowei
On Thu, Feb 11, 2021 at 8:07 PM Peiyuan Zhu notifications@github.com wrote:
Missing data imputation could be an important feature to have. I wonder how missing genotypes are handled in the current version.