mkanai / slalom

SLALOM (suspicious loci analysis of meta-analysis summary statistics)
MIT License

Using SLALOM With Custom LD Panel #2

Closed: floutt closed this issue 1 year ago

floutt commented 1 year ago

Hello!

From the documentation, I understand that this QC pipeline can be run with a user-defined LD reference. Looking through the code, this argument appears to require the LD matrix to be saved as a Hail BlockMatrix. Looking further, however, the software also seems to require the custom-ld-variant-index-path and custom-ld-label parameters, whose purpose is less clear.

I have a few questions about all of this:

1) I would like to build a sparse BlockMatrix from our LD panel. Is this software compatible with both sparse and dense BlockMatrix objects?
2) What would be a good way to build a sparse BlockMatrix from an R matrix stored as key-value pairs? I'd rather not build a dense matrix and then sparsify it, due to memory constraints.
3) What is the nature of the custom-ld-variant-index-path and custom-ld-label parameters? How would one go about building them given an LD BlockMatrix?

Cheers,

Tosin

mkanai commented 1 year ago

Hi Tosin,

Thanks so much for your inquiry. In terms of the sparsity of Hail's BlockMatrix, please refer to the documentation here.

Briefly,

  1. Yes, our Hail BlockMatrix is indeed sparse, via BlockMatrix.sparsify_row_intervals (which keeps a window around each variant) and BlockMatrix.sparsify_triangle (which keeps only the upper triangle).
  2. Please note that BlockMatrix is a Hail-specific format with a limited interface to existing data formats. Although there are BlockMatrix.from_numpy (from an in-memory NumPy array) and BlockMatrix.fromfile (from a binary file), I'd recommend recomputing the LD matrix in Hail. If you have a VCF file, you can use hl.import_vcf to create a Hail MatrixTable, convert it to a Hail BlockMatrix, and use linear algebra to compute LD.
  3. Apologies for the limited documentation of these parameters; they were originally intended for internal use. custom-ld-variant-index-path is the path to a Hail Table that records the index of each variant in the Hail BlockMatrix (the required fields are shown below). custom-ld-label is simply a label used in the output. For example, if you specify custom, the output contains a column custom_lead_r[2].
----------------------------------------
Global fields:
    None
----------------------------------------
Row fields:
    'locus': locus<GRCh38> 
    'alleles': array<str> 
    'idx': int64 
----------------------------------------
Key: ['locus', 'alleles']
----------------------------------------
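For anyone following the recommended route, here is a minimal sketch (not the SLALOM authors' own pipeline) that goes from a VCF to the two inputs described above: a windowed, upper-triangular LD BlockMatrix and a variant index Table with locus, alleles and idx. The function name build_ld_panel, the paths, the 1 Mb radius, and the biallelic/monomorphic filters are assumptions. The Hail calls used (hl.import_vcf, MatrixTable.add_row_index, BlockMatrix.from_entry_expr, hl.linalg.utils.locus_windows, sparsify_row_intervals, sparsify_triangle) are Hail 0.2 APIs, but please check them against your Hail version.

```python
def build_ld_panel(vcf_path, ld_path, index_path, radius=1_000_000):
    """Sketch: VCF -> windowed, upper-triangular LD BlockMatrix + variant index Table.

    build_ld_panel and all paths are hypothetical; radius (in bp) is an assumed window.
    """
    # Imported inside the function so this file can be loaded without Hail/Spark.
    import hail as hl
    from hail.linalg import BlockMatrix

    mt = hl.import_vcf(vcf_path, reference_genome='GRCh38')
    # Assumption: keep biallelic variants only (or run hl.split_multi_hts first).
    mt = mt.filter_rows(hl.len(mt.alleles) == 2)
    # Drop monomorphic or all-missing variants: their correlation is undefined.
    mt = mt.annotate_rows(dosage_sd=hl.agg.stats(mt.GT.n_alt_alleles()).stdev)
    mt = mt.filter_rows(mt.dosage_sd > 0)

    # The current row order defines the BlockMatrix row/column indices, so the
    # index Table (keyed by locus, alleles; field idx: int64) is taken from here.
    mt = mt.add_row_index(name='idx')
    mt.rows().select('idx').write(index_path, overwrite=True)

    # Mean-impute, center and scale each variant to unit norm, so that
    # bm @ bm.T is the variant-by-variant Pearson correlation (r).
    bm = BlockMatrix.from_entry_expr(
        mt.GT.n_alt_alleles(), mean_impute=True, center=True, normalize=True)
    ld = bm @ bm.T

    # Keep pairs within `radius` bp on the same contig, then the upper triangle.
    starts, stops = hl.linalg.utils.locus_windows(mt.locus, radius)
    ld = ld.sparsify_row_intervals(starts, stops).sparsify_triangle()
    ld.write(ld_path, overwrite=True)
```

A call would look like build_ld_panel('panel.vcf.bgz', 'panel_ld.bm', 'panel_ld.ht') with your own paths, after which the BlockMatrix path and the index Table path go to SLALOM's custom LD options. Because the index Table is written before the BlockMatrix, Hail reads the VCF twice; checkpointing mt after add_row_index would avoid that.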
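If recomputing from genotypes is not an option and the panel only exists as key-value (i, j, r) pairs, one memory-light way into BlockMatrix.fromfile is sketched below. This is an untested assumption, not something confirmed in this thread. fromfile reads n_rows * n_cols float64 values in row-major order, so the file is dense on disk (100,000 variants is about 80 GB), but np.memmap lets you fill it chunk by chunk without holding the matrix in RAM. write_ld_binary, load_into_hail and the chunk format are hypothetical. After loading, the same sparsify_row_intervals / sparsify_triangle calls can drop unneeded entries before writing the BlockMatrix.

```python
import numpy as np


def write_ld_binary(chunks, n_variants, out_path):
    """Write (i, j, r) chunks into a row-major float64 file via np.memmap.

    `chunks` (hypothetical format) yields arrays of 0-based row indices i,
    column indices j and correlations r. Unlisted pairs stay 0.0; the
    diagonal is set to 1.0.
    """
    # mode='w+' creates (or overwrites) a zero-filled file of the given shape.
    mm = np.memmap(out_path, dtype=np.float64, mode='w+',
                   shape=(n_variants, n_variants))
    for i, j, r in chunks:
        mm[i, j] = r
        mm[j, i] = r  # keep the matrix symmetric
    diag = np.arange(n_variants)
    mm[diag, diag] = 1.0
    mm.flush()
    del mm  # release the mapping before Hail reads the file


def load_into_hail(path, n_variants):
    """Load the binary file as a Hail BlockMatrix (imported lazily)."""
    from hail.linalg import BlockMatrix
    return BlockMatrix.fromfile(path, n_variants, n_variants)
```

Note that Hail documents these binary files as platform-dependent, meant for interchange with NumPy rather than storage, so the loaded BlockMatrix should be written out with BlockMatrix.write.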
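To make item 1 concrete, the toy NumPy sketch below reproduces which entries survive sparsify_row_intervals (with windows like those from hl.linalg.utils.locus_windows) followed by sparsify_triangle, for a single chromosome. The helpers are hypothetical stand-ins. Treating the radius as inclusive on both sides is my reading of locus_windows and should be treated as an assumption. Because only the upper triangle is kept, r for a pair (i, j) with i > j has to be read from (j, i).

```python
import bisect

import numpy as np


def locus_windows_1d(positions, radius):
    """Hypothetical stand-in for hl.linalg.utils.locus_windows on one contig.

    Assumes `positions` is sorted; row i's window is [starts[i], stops[i]),
    covering every variant within `radius` bp of positions[i] (inclusive).
    """
    starts = np.array([bisect.bisect_left(positions, p - radius) for p in positions])
    stops = np.array([bisect.bisect_right(positions, p + radius) for p in positions])
    return starts, stops


def kept_entries(positions, radius):
    """Mask of entries kept by sparsify_row_intervals + sparsify_triangle."""
    starts, stops = locus_windows_1d(positions, radius)
    cols = np.arange(len(positions))
    in_window = (cols >= starts[:, None]) & (cols < stops[:, None])
    upper = cols >= cols[:, None]  # sparsify_triangle() keeps j >= i
    return in_window & upper


def lookup_r(ld_upper, i, j):
    """Read r for any pair from an upper-triangular matrix."""
    a, b = min(i, j), max(i, j)
    return ld_upper[a, b]
```

For example, kept_entries([100, 150, 400, 420], 100) keeps 6 of the 16 entries: the two within-window pairs plus the four diagonal entries.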

Hope this helps!

Best, Masa

floutt commented 1 year ago

Yes, thank you! This has been pretty helpful. I'll let you know if I have any more questions.

floutt commented 1 year ago

I was able to run the software successfully with some modifications. This issue can be closed now. Thank you!