rwdavies / QUILT

GNU General Public License v3.0
45 stars 10 forks source link

Error in generating HLA reference panel #9

Closed danielanach closed 2 years ago

danielanach commented 2 years ago

Hi there,

I installed QUILT using the "github" installation instructions, and used the most recent release (1.0.1). The installation ran smoothly and all the all the tests passed.

For the analysis I am running, I needed a slightly different version of IGMT HLA alleles database (3.43 instead of the 3.39 in the pre-made reference material), so I followed the instructions provided here to generate new reference material: https://github.com/rwdavies/QUILT/blob/master/example/QUILT_hla_reference_panel_construction.Md

To make sure it ran as expected I just used the exact instructions listed, including using the 3.39 IGMT reference material to start.

However I keep running into an error in the make_and_save_hla_full_alleles_filled_in() function.

I re-downloaded and checked all the downloaded files, but am a bit stumped.

Attached is the error message :

[2021-12-10 12:57:01] Running QUILT_HLA_prepare_reference(outputdir = /cellar/users/dnachman/programs/QUILT2/references, 
nGen = 100, hla_gene_region_file = /cellar/users/dnachman/programs/QUILT2/QUILT/hla_ancillary_files/hlagenes.txt,
hla_types_panel = /cellar/users/dnachman/programs/QUILT2/QUILT_HLA/20181129_HLA_types_full_1000_Genomes_Project_panel.txt,
ipd_igmt_alignments_zip_file = /cellar/users/dnachman/programs/QUILT2/QUILT_HLA/Alignments_Rel_3390.zip,
quilt_hla_supplementary_info_file=/cellar/users/dnachman/programs/QUILT2/QUILT/hla_ancillary_files/quilt_hla_supplementary_info.txt,
full_regionStart = 25587319, full_regionEnd = 33629686, buffer = 500000, 
region_exclude_file = /cellar/users/dnachman/programs/QUILT2/QUILT/hla_ancillary_files/hlagenes.txt, 
genetic_map_file = /cellar/users/dnachman/programs/QUILT2/QUILT_HLA/CEU/CEU-chr6-final.b38.txt.gz,
reference_haplotype_file = /cellar/users/dnachman/programs/QUILT2/QUILT_HLA/oneKG.hap.gz, 
reference_legend_file = /cellar/users/dnachman/programs/QUILT2/QUILT_HLA/oneKG.legend.gz, 
reference_sample_file = /cellar/users/dnachman/programs/QUILT2/QUILT_HLA/oneKG.samples,
reference_exclude_samplelist_file = /cellar/users/dnachman/programs/QUILT2/QUILT_HLA/exclude_ref_samples_for_testing.txt,
reference_exclude_samples_for_initial_phasing = FALSE, 
all_hla_regions = c("A", "B", "C", "DMA", "DMB", "DOA", "DOB", "DPA1", "DPA2", "DPB1", "DPB2", "DQA1", "DQA2", "DQB1", "DRA", "DRB1", "DRB3", "DRB4", "DRB5", "E", "F", "G", "HFE", "H", "J", "K", "L", "MICA", "MICB", "N", "P", "S", "TAP1", "TAP2", "T", "U", "V", "W", "Y"),
hla_regions_to_prepare = c("A", "B", "C", "DQB1", "DRB1"), chr = chr6, minRate = 0.1, nCores = 1)
sending incremental file list

sent 81 bytes  received 12 bytes  186.00 bytes/sec
total size is 5,691,094  speedup is 61,194.56
Archive:  Alignments_Rel_3390.zip
[2021-12-10 12:57:01] Begin making HLA all alleles kmers file
[2021-12-10 12:57:01] Processing file 1 out of 39
[2021-12-10 12:57:50] Processing file 4 out of 39
[2021-12-10 12:57:54] Processing file 8 out of 39
[2021-12-10 12:58:48] Processing file 12 out of 39
[2021-12-10 12:59:13] Processing file 16 out of 39
[2021-12-10 13:02:54] Processing file 20 out of 39
[2021-12-10 13:03:00] Processing file 24 out of 39
[2021-12-10 13:03:05] Processing file 28 out of 39
[2021-12-10 13:05:12] Processing file 32 out of 39
[2021-12-10 13:05:20] Processing file 36 out of 39
[2021-12-10 13:05:58] Done making HLA all alleles kmers file
[2021-12-10 13:05:58] Begin making HLA snp format alleles file
[2021-12-10 13:05:58] Working on region:A
[2021-12-10 13:07:18] Working on region:B
[2021-12-10 13:12:12] Working on region:C
[2021-12-10 13:16:18] Working on region:DQB1
[2021-12-10 13:16:45] Working on region:DRB1
[2021-12-10 13:18:52] Done making HLA snp format alleles file
[2021-12-10 13:18:52] Begin making HLA full alleles filled in file
[2021-12-10 13:18:52] Processing HLA-A
[2021-12-10 13:21:42] Processing HLA-B
Error in colSums(allelemat[1:5, ]) :
  'x' must be an array of at least two dimensions
Calls: QUILT_HLA_prepare_reference ... make_and_save_hla_full_alleles_filled_in -> colSums
Execution halted`

Let me know if I am missing something, I will keep de-bugging but thought I would post this in case others also encounter the same problem. Thanks!

rwdavies commented 2 years ago

Hi,

Sorry for my slow reply, my daughter is off sick this week from nursery, so I have limited time to work.

Thanks for the message, I'll investigate. I'm going to try re-running on my machine to see if I get the same error with the latest version of the repo. If not I'll take a deeper look into the above error log. I'll also try with the newer IGMT file as well to see if that works on my machine without issue.

Thanks Robbie

danielanach commented 2 years ago

Hi Robbie,

No worries at all, apologies for adding to your plate, hope your daughter gets well soon!

Best, Daniela

rwdavies commented 2 years ago

So the current master repo version worked for me on clean directories, can you please re-try the following (if you're on a Linux OS). I can't quite remember but the tools to build the reference panel might have check points that get thrown off if you re-run changing some of the files partway through

1) Modify the following paths to new directories on your local environment in the file example/QUILT_hla_reference_panel_construction.Md

inputs_dir=/data/smew1/rdavies/quilt_hla_2021_12_15/
test_dir=/data/smew1/rdavies/quilt_hla_2021_12_15/
reference_package_dir=/data/smew1/rdavies/quilt_hla_reference_panel_files_2021_12_15/

2) Run

bash example/run_example.sh example/QUILT_hla_reference_panel_construction.Md 

Here's the log I get when I do that on my machine log_hla_2021_12_15.txt

danielanach commented 2 years ago

Thank you for the suggestion, starting from scratch resolved the error when working with the 3.39 HLA versions and I was able to successfully construct a reference panel. Indeed I must have re-run with the same directories after the initial failed attempt and that must have corrupted something.

I did have to change a couple more lines in the file example/QUILT_hla_reference_panel_construction.Md that relate to the location of the QUILT installation:

R -f ~/proj/QUILT/scripts/make_b38_recomb_map.R --args ${inputs_dir} CEU 6

cd ~/proj/QUILT/ ## change to the directory where you've cloned QUILT

cd ~/proj/QUILT/

After these changes my log looked identical to yours and QUILT ran successfully on the test BAM files with the same predicted alleles.

However, when I start with clean directories and swap out the alignment file with a new version Alignments_Rel_3430.zip

I am still getting the same error in the reference construction on HLA-B as the original issue log. Not sure why yet, but at least now I can produce one successful reference package with the older HLA version!

rwdavies commented 2 years ago

OK thanks, great. I had some weird issues downloading data last week when running the script but looks OK now, I'll re-start and hope it captures the same error that I can then debug.

danielanach commented 2 years ago

It looks like the version 1.0.2 that you released resolves my issue, I was able to run on 3.43 without problems so I will close this issue. Thank you!