omerwe / polyfun

PolyFun (POLYgenic FUNctionally-informed fine-mapping)
MIT License

Pre-computed LD matrix link #144

Closed: karbalaei closed this issue 1 year ago

karbalaei commented 1 year ago

Hi,

The link to the pre-computed LD matrices for UK Biobank data doesn't work, and it seems that https://alkesgroup.broadinstitute.org/UKBB_LD is down. I wasn't sure where to report this, so I'm asking here. Could you please help me with this? Thanks

Karen-Silverline commented 1 year ago

I am having the same problem. The website seems to have shut down without warning. I hope it can be restored as soon as possible.

omerwe commented 1 year ago

Hi both, we're currently having some issues due to extremely large cloud computing costs. We're looking into this and will hopefully have a solution soon. Sorry for the inconvenience!

cgpu commented 1 year ago

Hi @omerwe, thank you for the update. Would you be able to recommend an alternative way to retrieve the LD data until the server is back up? Thanks in advance

omerwe commented 1 year ago

@cgpu I'm afraid we can't help you at the moment. We're actively working on finding a solution, and we'll post an update the minute we find one...

omerwe commented 1 year ago

I would actually like to use this thread to ask for community support. Our current cloud storage provider (the Broad Institute cloud) charges us prohibitively large amounts for user downloads of LD matrices. The costs approached $100K over the last few months.

If anyone has a suggested cloud storage provider that can store large datasets (>3TB) at reasonable costs, please let us know!

pettyalex commented 1 year ago

Absolutely. Cloudflare R2 would cost $15/month per TB of storage with free egress, although read operations are billed at about $0.40 per million.

https://www.cloudflare.com/products/r2/

You could also potentially negotiate a lower rate with them if you have a lot of data; you only said ">3TB" and didn't give specific sizes.
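Since the exact dataset size wasn't given, here is a rough monthly cost estimate parameterized by size, using only the rates quoted above. It is a sketch under those assumptions: free tiers, write operations, and any negotiated discount are ignored, the function name is hypothetical, and the 3 TB / one-million-reads workload is made up for illustration.

```python
# Rates as quoted in the comment above (assumptions, not an official price list).
STORAGE_USD_PER_TB_MONTH = 15.0  # "$15/mo per TB of storage"
READ_USD_PER_MILLION = 0.40      # "about 40c / million" read operations
EGRESS_USD_PER_TB = 0.0          # "egress costs are free"


def r2_monthly_cost(stored_tb: float, reads: int, egress_tb: float = 0.0) -> float:
    """Approximate monthly bill in USD; ignores free tiers and write operations."""
    storage = stored_tb * STORAGE_USD_PER_TB_MONTH
    operations = reads / 1_000_000 * READ_USD_PER_MILLION
    egress = egress_tb * EGRESS_USD_PER_TB  # always 0 at the quoted rate
    return storage + operations + egress


if __name__ == "__main__":
    # Hypothetical workload: 3 TB stored, one million object reads in a month.
    print(f"${r2_monthly_cost(3, 1_000_000):.2f}")  # prints $45.40
```

The point of the `egress_tb` parameter is that it makes no difference: under these rates the bill is driven by storage size and request count, not by how many terabytes users download.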

Alternatively, the lowest-cost solution that would meet community needs would probably be a single server on a 10 Gbit or faster line, with some form of QoS to distribute bandwidth evenly among users. That would be much better than the current state, where the matrices are simply unavailable, and should be dirt cheap.
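The comment leaves the QoS scheme open. One common way to "distribute bandwidth evenly" is max-min fairness (water-filling): users asking for less than an equal share get everything they ask for, and the leftover capacity is split equally among heavier users. This is a minimal sketch assuming per-user demands are known in Gbit/s; the function name is hypothetical, and nothing in the thread says this is what the maintainers would deploy.

```python
def max_min_fair_share(capacity: float, demands: dict[str, float]) -> dict[str, float]:
    """Split `capacity` among users by max-min fairness (water-filling).

    Users whose demand is below an equal share get their full demand; the
    leftover capacity is then split evenly among the remaining users.
    """
    allocation: dict[str, float] = {}
    remaining = capacity
    # Process users from smallest to largest demand.
    pending = sorted(demands.items(), key=lambda item: item[1])
    while pending:
        equal_share = remaining / len(pending)
        user, demand = pending[0]
        if demand <= equal_share:
            # Smallest remaining demand fits in an equal share: satisfy it fully
            # and hand the unused part back to the pool.
            allocation[user] = demand
            remaining -= demand
            pending.pop(0)
        else:
            # Every remaining user wants more than an equal share: split evenly.
            for name, _ in pending:
                allocation[name] = equal_share
            break
    return allocation
```

For example, `max_min_fair_share(10.0, {"a": 1.0, "b": 4.0, "c": 10.0})` gives `{'a': 1.0, 'b': 4.0, 'c': 5.0}`: the two lighter users are fully served and the heavy downloader gets what is left of the 10 Gbit/s.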

pettyalex commented 1 year ago

What kind of transit are you doing anyway? $100k/mo at Google Cloud standard egress rates is about 2 petabytes per month; is that how much egress you're doing?

A single 10gbit line would make for grumpy users, but it would work. 50gbit would probably be pretty comfortable.
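To relate the 2 PB/month figure to line speeds, here is a back-of-envelope conversion. It takes the comment's 2 PB estimate at face value (not measured traffic) and assumes decimal units (1 PB = 10^15 bytes) and a 30-day month; the function names are hypothetical.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumes a 30-day month


def sustained_gbps(bytes_per_month: float) -> float:
    """Average line rate (Gbit/s) needed to move `bytes_per_month`."""
    return bytes_per_month * 8 / SECONDS_PER_MONTH / 1e9


def monthly_capacity_pb(line_gbps: float, utilization: float = 1.0) -> float:
    """Petabytes per month a line can carry at the given average utilization."""
    return line_gbps * 1e9 * utilization * SECONDS_PER_MONTH / 8 / 1e15


if __name__ == "__main__":
    print(f"2 PB/month ~ {sustained_gbps(2e15):.1f} Gbit/s sustained")
    print(f"10 Gbit/s line ~ {monthly_capacity_pb(10):.2f} PB/month at 100% use")
    print(f"50 Gbit/s line ~ {monthly_capacity_pb(50):.2f} PB/month at 100% use")
```

Under these assumptions 2 PB/month averages out to roughly 6.2 Gbit/s, while a fully utilized 10 Gbit/s line tops out near 3.2 PB/month. That fits the comment: 10 Gbit would work but leave little headroom at peak times, whereas 50 Gbit (about 16 PB/month) would be comfortable.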

pettyalex commented 1 year ago

There's also BitTorrent as an option if you want to get something up right now.

omerwe commented 1 year ago

Thanks @pettyalex , we'll look into these options! Hopefully we'll have good news as early as next week.

omerwe commented 1 year ago

Just an update for everyone following: we have a new home for the UKBB LD matrices in an S3 bucket, kindly provided to us by the AWS Open Data program!

For those interested, here's the home of the new LD matrices: s3://broad-alkesgroup-ukbb-ld/UKBB_LD/

Thanks to everyone for their suggestions. Everything should be back working now (I updated the code to use this as the default URL).

karbalaei commented 1 year ago

Hi @omerwe, thanks for your follow-up. I know it is a silly question, but I don't know how to use the Amazon S3 bucket. I searched "UKBB_LD" on that website, but I didn't find the LD references. Could you please help me with that? Thank you so much.

omerwe commented 1 year ago

@karbalaei sorry I forgot to mention the address. It's s3://broad-alkesgroup-ukbb-ld/UKBB_LD/

I suggest you do some Googling on working with S3 buckets; it's not complex, but it's a bit different from regular URLs...
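For anyone else new to S3: public buckets can usually be listed without AWS credentials through the plain HTTPS ListObjectsV2 endpoint. The sketch below uses only the Python standard library. It assumes the bucket permits anonymous listing (not confirmed in this thread) and fetches only the first page of results; the function names are hypothetical.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# XML namespace used by S3 ListObjectsV2 responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def listing_url(bucket: str, prefix: str, max_keys: int = 10) -> str:
    """Build an unsigned ListObjectsV2 request URL for a public bucket."""
    query = urllib.parse.urlencode(
        {"list-type": "2", "prefix": prefix, "max-keys": str(max_keys)}
    )
    return f"https://{bucket}.s3.amazonaws.com/?{query}"


def parse_keys(xml_data: bytes) -> list[str]:
    """Extract object keys from a ListObjectsV2 XML response body."""
    root = ET.fromstring(xml_data)
    return [key.text for key in root.findall("s3:Contents/s3:Key", S3_NS)]


def list_public_keys(bucket: str, prefix: str, max_keys: int = 10) -> list[str]:
    """Fetch and parse one page of keys (no credentials, no pagination)."""
    with urllib.request.urlopen(listing_url(bucket, prefix, max_keys)) as resp:
        return parse_keys(resp.read())
```

Calling `list_public_keys("broad-alkesgroup-ukbb-ld", "UKBB_LD/")` should return the first few object keys under the prefix. With the AWS CLI installed, `aws s3 ls --no-sign-request s3://broad-alkesgroup-ukbb-ld/UKBB_LD/` is the usual equivalent.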

Hisewetty commented 1 year ago

Hello, could you tell me where I can find the code that uses the new S3 bucket as the default URL?

omerwe commented 1 year ago

https://github.com/omerwe/polyfun/blob/de889bde9ce3d46326fc5ea568b623450f349569/compute_ldscores_from_ld.py#L18

(using the url interface of public S3 addresses)
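The "URL interface" refers to the fact that objects in a public S3 bucket can also be fetched over plain HTTPS. Here is a minimal sketch of the conversion, assuming the virtual-hosted-style `https://<bucket>.s3.amazonaws.com/<key>` form. The helper name is hypothetical, and the exact default string used in `compute_ldscores_from_ld.py` should be checked in the linked file.

```python
from urllib.parse import urlparse


def s3_to_https(s3_uri: str) -> str:
    """Convert s3://bucket/key into a virtual-hosted-style public HTTPS URL."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {s3_uri!r}")
    # netloc is the bucket name; path keeps the leading "/" of the key.
    return f"https://{parsed.netloc}.s3.amazonaws.com{parsed.path}"
```

For the bucket above, `s3_to_https("s3://broad-alkesgroup-ukbb-ld/UKBB_LD/")` returns `https://broad-alkesgroup-ukbb-ld.s3.amazonaws.com/UKBB_LD/`. Keys containing spaces or other special characters would additionally need percent-encoding.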