statgen / Minimac4

How to obtain the RS ID from Minimac4 results #34

Open xiangboyulan opened 4 years ago

xiangboyulan commented 4 years ago

Hi,

How can I obtain the RS IDs from Minimac4 results? Thanks a lot!

Best,

Bo

jonathonl commented 4 years ago

You need to add --rsid when running minimac4. The reference panel also needs to have RS IDs in the ID column.

xiangboyulan commented 4 years ago

Hi Jonathon, I added --rsid ON when running minimac4 and used the reference panel from your download website, but I can't find the RS IDs in the M3VCF file. Could you give me some advice? Thanks a lot!

jonathonl commented 4 years ago

I'm assuming you are referring to the 1000 genomes panel. This panel does not have RS IDs. You should be able to get them from the 1000 genomes VCFs on the 1000 genomes FTP site.

xiangboyulan commented 4 years ago

I used Minimac3 to convert the 1000 Genomes panel VCF to M3VCF, but I cannot find the RS IDs in the result. The SNP IDs are only given as Chr:pos.

jonathonl commented 4 years ago

If the RS IDs exist in the VCF but not the M3VCF, then I would suggest using https://github.com/Santy-8128/m3vcftools to compress to M3VCF. This tool will copy over the ID column. I don't know whether the VCFs on our site include RS IDs, but the VCFs on the 1000 genomes site do.
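Before compressing, it can help to confirm that the VCF really carries RS IDs in its ID column. The following is a minimal Python sketch, not part of Minimac4 or m3vcftools. The function names `open_vcf` and `summarize_ids` are made up for illustration, and it assumes a standard tab-delimited VCF where the third column is the ID and a missing ID is written as `.`.

```python
import gzip
from collections import Counter


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF as text (hypothetical helper)."""
    opener = gzip.open if path.endswith(".gz") else open
    return opener(path, "rt")


def summarize_ids(lines):
    """Count records whose ID starts with 'rs', is missing ('.'), or is something else."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t", 3)
        if len(fields) < 3:
            continue  # not a valid variant record
        variant_id = fields[2]
        if variant_id == ".":
            counts["missing"] += 1
        elif variant_id.startswith("rs"):
            counts["rsid"] += 1
        else:
            counts["other"] += 1
    return counts
```

Usage would look like `with open_vcf("panel.chr20.vcf.gz") as handle: print(summarize_ids(handle))`, where the file name is a placeholder. A panel with no `rsid` entries cannot produce RS IDs in the imputed output, whichever compression tool is used.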

xiangboyulan commented 4 years ago

Hi, I used --referenceEstimates OFF, but it still behaves as if it were ON.

steffenom commented 1 year ago

Hi all! First of all, thanks for publishing your code on github!

We have a similar problem with missing IDs in the imputation output. We use minimac3 v.2.0.1 with the --rsid option to convert our panel with custom IDs from vcf to m3vcf. The resulting file still contains the IDs. We then convert the m3vcf file to msav format with minimac4 --update-m3vcf and run the imputation, but the output is missing the IDs.

We checked the msav file with the sav export command from savvy and there were no IDs in it. Any idea why the IDs get lost when converting from m3vcf to msav? We tried passing the --rsid option to minimac4, but it had no effect (and it is marked as deprecated). If I understood the previous discussion correctly, the IDs should be passed on.

jonathonl commented 1 year ago

@steffenom, thanks for reporting this. The earlier conversation was regarding v4.0.x. You are using v4.1.x, and this feature was missing from the new version. I just pushed a fix to the master branch. Please try the latest from the master branch to generate a new msav file.

steffenom commented 1 year ago

Hi @jonathonl, I tried the newest version on the master branch and it worked! The IDs showed up as expected. Thanks for the quick fix!

A minor drawback is that all variants now have an ID. The ones that don't have an ID in the reference panel now get an ID of the form CHR:POS. That is not a problem for us, but it might be unexpected for other users.

jonathonl commented 1 year ago

For the IDs with CHR:POS, are these variants that exist only in the target file (not in reference)? If using --all-typed-sites, IDs for such variants are carried over from the target VCF instead of the reference panel. If the variant exists in the reference panel and has a missing ID in the reference panel, then the ID for that variant should also be missing in the imputed results.
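The rule described above can be written out as a small decision function. This is only an illustrative Python sketch of the behavior as stated in this comment, not Minimac4's actual code. The function name, its parameters, and the use of `None` for "site not written to the output" are all assumptions made for the example.

```python
from typing import Optional

MISSING_ID = "."  # VCF convention for a missing ID


def output_variant_id(
    reference_id: Optional[str],
    target_id: Optional[str],
    in_reference: bool,
    all_typed_sites: bool,
) -> Optional[str]:
    """Return the ID expected in the imputed output, or None if the site is not emitted."""
    if in_reference:
        # Sites in the reference panel take the reference ID, even when it is missing.
        return reference_id if reference_id else MISSING_ID
    if all_typed_sites:
        # Target-only sites are kept with --all-typed-sites and carry the target VCF's ID.
        return target_id if target_id else MISSING_ID
    # Assumption: target-only sites are not written without --all-typed-sites.
    return None
```

Under this reading, a CHR:POS ID on a reference-panel variant would have to come from the reference file itself rather than from the imputation step.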

steffenom commented 1 year ago

No, for me all variants without an ID in the initial reference panel have the CHR:POS ID in the final output (without using --all-typed-sites). I think these IDs are already created when generating the m3vcf file from the reference panel with minimac3, and minimac4 --update-m3vcf then just carries them over.
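If the CHR:POS placeholders are unwanted downstream, one option is to blank them out in the exported VCF text. The following Python sketch assumes the placeholder is exactly `CHROM:POS` built from the record's own first two columns, as described in this thread; other placeholder formats would need a different test. The function name `blank_position_ids` is hypothetical.

```python
def blank_position_ids(lines):
    """Yield VCF lines with IDs of the form CHROM:POS replaced by '.'."""
    for line in lines:
        if line.startswith("#"):
            yield line  # pass header lines through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        # Only touch IDs that exactly match this record's CHROM:POS.
        if len(fields) >= 3 and fields[2] == f"{fields[0]}:{fields[1]}":
            fields[2] = "."
        yield "\t".join(fields) + "\n"
```

Real IDs, including custom non-rs IDs, are left untouched because only an exact match on the record's own position is rewritten.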

jonathonl commented 1 year ago

I see. FYI, you can generate an msav directly from a VCF, BCF, or SAV file with minimac4 --compress-reference input.vcf.gz -o compressed_output.msav. This still needs to be documented in the --help and README.
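For panels split by chromosome, the command above can be scripted. This is a rough Python sketch that only uses the `--compress-reference` and `-o` options quoted in this comment. The helper names and the file-name templates are placeholders, and it assumes `minimac4` is on the `PATH`.

```python
import subprocess


def compress_reference_command(vcf_path, msav_path, binary="minimac4"):
    """Build the argument list for compressing one reference file to MSAV."""
    return [binary, "--compress-reference", vcf_path, "-o", msav_path]


def compress_per_chromosome(chromosomes, vcf_template, msav_template, run=subprocess.run):
    """Run the compression once per chromosome; stops at the first failure."""
    for chrom in chromosomes:
        cmd = compress_reference_command(
            vcf_template.format(chrom=chrom),
            msav_template.format(chrom=chrom),
        )
        # check=True raises CalledProcessError if minimac4 exits with an error.
        run(cmd, check=True)
```

A call such as `compress_per_chromosome(range(1, 23), "panel.chr{chrom}.vcf.gz", "panel.chr{chrom}.msav")` would process the autosomes one by one. The `run` parameter exists only so the loop can be exercised without invoking the real binary.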

steffenom commented 1 year ago

Thanks for the hint! I tried minimac4 --compress-reference and now the IDs are as expected.

Should the results of the imputation with a reference panel created with minimac4 --compress-reference be similar to results for the same panel created with minimac3 --processReference + minimac4 --update-m3vcf? Or is one preferred over the other in certain situations?

jonathonl commented 1 year ago

There may be a small difference with smaller reference panels (tens of thousands of samples). By default, minimac3 --processReference does parameter estimation and saves those parameters in the m3vcf. This parameter estimation will be less useful for larger panels.

buegelbeatz commented 1 year ago

minimac4 --compress-reference input.vcf.gz -o compressed_output.msav somehow kicked me out with:

minimac v4.1.0

Error: Cannot write empty block
Error: serializing final block failed

input.vcf.gz has 1052764 chromosome 20 variants (rows) and 915 columns. Converting it to m3vcf with minimac3 works.

The code line where I'm kicked out is: https://github.com/statgen/Minimac4/blob/99ce06cbb696ad2862e882005921f4fa01647eeb/src/unique_haplotype.cpp#L211

It also failed for chromosomes 4, 14, and 15. All other chromosomes work.

jonathonl commented 1 year ago

@buegelbeatz , this should be fixed with https://github.com/statgen/Minimac4/commit/6f9f1404e1875f9e7773af493872c4d94efc105f

buegelbeatz commented 1 year ago

Just tested it and it works now! Thanks for the quick fix.