shz9 / magenpy

Modeling and Analysis of (Statistical) Genetics data in python
https://shz9.github.io/magenpy/
MIT License

Add LD matrices to S3/Google Cloud storage services #10

Open shz9 opened 2 years ago

shz9 commented 2 years ago

Instead of asking the user to download the LD matrices every time they need to run, e.g., viprs, we can leverage Zarr's APIs for cloud storage and read the matrices from a central repository on, e.g., Amazon S3 or Google Cloud. For this to happen, we need to extend the LDMatrix class to handle various distributed storage systems, with the help of the MutableMapping interface from Dask (see here).

A simple way to go about this is to add methods to the LDMatrix class, such as .from_s3() and .from_gc(), to read matrices from the S3 and Google Cloud storage systems, respectively.
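A rough sketch of what such constructors could look like is given below. It is not magenpy's actual implementation: `RemoteLDMatrix` and `cloud_url` are hypothetical names used only for illustration, the bucket and key values are placeholders, and the code assumes that the matrices are stored as Zarr arrays or groups and that `zarr` plus the matching fsspec backend (`s3fs` for S3, `gcsfs` for Google Cloud) are installed.

```python
from dataclasses import dataclass


def cloud_url(scheme: str, bucket: str, key: str) -> str:
    """Build an fsspec-style URL such as 's3://bucket/path/to/matrix'."""
    if scheme not in ("s3", "gs"):
        raise ValueError(f"Unsupported scheme: {scheme!r}")
    return f"{scheme}://{bucket.strip('/')}/{key.strip('/')}"


@dataclass
class RemoteLDMatrix:
    """Hypothetical stand-in for magenpy's LDMatrix (not the real class)."""

    url: str
    storage_options: dict

    @classmethod
    def from_s3(cls, bucket: str, key: str, **storage_options):
        # Extra keyword arguments are forwarded to s3fs, e.g. anon=True
        # for a publicly readable bucket.
        return cls(cloud_url("s3", bucket, key), storage_options)

    @classmethod
    def from_gc(cls, bucket: str, key: str, **storage_options):
        # Extra keyword arguments are forwarded to gcsfs.
        return cls(cloud_url("gs", bucket, key), storage_options)

    def open(self):
        # Imported lazily so the class can be defined without zarr installed.
        import zarr

        # Zarr resolves the URL through fsspec and fetches chunks on demand,
        # so nothing is downloaded until the data is actually indexed.
        return zarr.open(self.url, mode="r", storage_options=self.storage_options)
```

With this layout, a call such as `RemoteLDMatrix.from_s3("my-bucket", "ld/chr_22", anon=True).open()` would return a read-only Zarr object backed by the remote store; the bucket and key here are made up.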

biostatShao commented 2 months ago

Dear author,

Firstly, I would like to express my support for your proposed enhancement. As a user of viprs, I believe that this improvement will significantly enhance the usability and efficiency of viprs.

However, I am currently encountering an issue where using shrinkage as the LD estimator results in very low prediction performance with viprs. In contrast, using the LD data you provided before the update restores normal performance. Therefore, I would like to ask if you could provide a new download link or expedite the completion of cloud storage services. If possible, could you also provide LD data using shrinkage for the 1000G European population as well as LD data from different ancestry groups in the UK Biobank, so that I can better utilize viprs?

Thank you very much for your contributions to the viprs project. I am looking forward to your response and assistance. Thank you!

Sincerely, Zhonghe

shz9 commented 2 months ago

Hi Zhonghe,

Thanks! We're working on adding this feature as part of a new release of the viprs software. We will also be releasing LD matrices for 6 continental populations represented in the UK Biobank in the next couple of weeks.

Do you mind opening a separate issue (here or under viprs) about the problems you're having with the shrinkage estimator? Which versions of magenpy and viprs are you using? Which data did you compute the LD matrix from? How did you go about running viprs? All of these details can help us improve the software.

Thanks,

Shadi

biostatShao commented 1 week ago

Dear Shadi,

I apologize for the delayed response. I have created a new issue, as requested, with the aim of contributing to the ongoing improvement of the software. In addition, I look forward to the availability of a pre-calculated LD matrix on your website. Thank you sincerely for your attention to this matter.

Best regards,

Zhonghe