hanchenphd / GMMAT

Generalized linear Mixed Model Association Tests

Group file sorted differently in SMMAT vs. SMMAT.meta #46

Open g3png opened 2 years ago

g3png commented 2 years ago

Dear Han,

We are running a large meta-analysis and have collected intermediate files from several cohorts. However, I realised that SMMAT.meta fails the following check specifically at multiallelic sites, even though we ensured that all cohorts use the same group file.

if(any(sort(tmp.scores$idx)!=tmp.scores$idx)) {
    cat("In some", meta.files.prefix[i], "score files, the order of group and variants is not the same as in the group-sorted group.file.\n")
    stop("Error: meta files possibly not generated using this group.file!")
}

An example of where this fails (for a single cohort) is:

   group chr      pos ref alt    N missrate      altfreq      SCORE       VAR      PVAL idx           file
1:  A1BG  19 58409184   C   T 1586        0 0.0022068096  0.1737194 6.9899839 0.9476113 823 prefix.score.1
2:  A1BG  19 58409184   C   G 1586        0 0.0003152585 -0.6138912 0.9923992 0.5377377 822 prefix.score.1

In this case index 823 comes before 822, which causes the error. I am guessing this is because SMMAT did not initially order variants by ALT allele at multiallelic sites.
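For anyone who wants to see the check fire in isolation, here is a minimal sketch in base R. It copies only the relevant columns of the two rows above into a toy data frame (named `tmp.scores` to match the quoted check) and evaluates the same condition. The helper variables `out.of.order` and `bad.rows` are made up for illustration and are not part of GMMAT.

```r
# Toy copy of the two rows from the example above (only the columns needed here).
tmp.scores <- data.frame(
  group = c("A1BG", "A1BG"),
  chr   = c("19", "19"),
  pos   = c(58409184L, 58409184L),
  ref   = c("C", "C"),
  alt   = c("T", "G"),
  idx   = c(823L, 822L),
  stringsAsFactors = FALSE
)

# Same condition as the quoted check: TRUE means idx is not in ascending order.
out.of.order <- any(sort(tmp.scores$idx) != tmp.scores$idx)
print(out.of.order)

# Rows whose idx is smaller than the idx of the row before them,
# i.e. the places where the expected order breaks.
bad.rows <- which(diff(tmp.scores$idx) < 0) + 1
print(tmp.scores[bad.rows, c("group", "chr", "pos", "ref", "alt", "idx")])
```

Running the same `diff()` test on a real score file should list every site where the order breaks, which helps confirm whether only multiallelic sites are affected.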

Is there any way around this?

Edit: I have just read about the issue here regarding SMMAT being designed for biallelics. Would love to know what you think anyway, and if there are (near) future plans to include multiallelic variants.

Thanks for your help in advance,

Grace

hanchenphd commented 2 years ago

Hi Grace,

Thank you for your interest in SMMAT! I have not seen this issue before, but my guess is that this tri-allelic marker is ordered differently in the GDS file and in the group definition file. SMMAT, which uses the GDS file to generate the meta-analysis files, sorts variants by variant.id. SMMAT.meta assumes no access to the individual GDS files, so it can only sort variants by chr and pos. For tri-allelic markers, which share the same chr and pos, the order in the GDS files may differ from the order in the group definition file (and it is not necessarily alphabetical).
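The tie described here can be illustrated with a small base R sketch. It does not call GMMAT or read a real GDS file: the `gds` data frame and its `variant.id` values are hypothetical stand-ins for what a cohort's GDS file might contain, and `grp` holds the two group file rows for the site.

```r
# Group definition rows for the tri-allelic site, in group-file order (C/G first).
grp <- data.frame(
  chr = "19", pos = 58409184L, ref = "C", alt = c("G", "T"),
  stringsAsFactors = FALSE
)

# Hypothetical GDS content: the variant.id values are made up,
# and C/T happens to be stored before C/G.
gds <- data.frame(
  variant.id = c(5001L, 5002L),
  chr = "19", pos = 58409184L, ref = "C", alt = c("T", "G"),
  stringsAsFactors = FALSE
)

# Sorting by chr and pos cannot separate the two alleles. order() leaves
# tied rows in their original order, so the group-file order survives.
by.chr.pos <- grp[order(grp$chr, grp$pos), ]

# Sorting by variant.id reproduces the order stored in the GDS file.
by.variant.id <- gds[order(gds$variant.id), ]

# FALSE here means the two sort keys disagree at this site.
same.order <- identical(by.chr.pos$alt, by.variant.id$alt)
print(same.order)
```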

If that is the case, the easiest solution is to use a group definition file that lists variants in the same order as your GDS files. For example, if C/G comes before C/T in your group definition file but C/T comes before C/G in the GDS files, you might be able to fix the problem by switching C/G and C/T in the group definition file, without asking each cohort to rerun. Please let me know if that does not work.
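As a rough sketch of that switch in base R, the snippet below reorders only the rows that share a group, chr and pos, and leaves every other row where it is. Several things are assumptions for illustration: the six-column layout without a header (group, chr, pos, ref, alt, weight) should be checked against your own file, the first toy row at position 58409000 is invented, and `gds.order` is a hypothetical table listing the alleles in the order seen in the cohort's GDS file.

```r
# Toy group definition rows (assumed columns: group, chr, pos, ref, alt, weight).
grp <- data.frame(
  group  = c("A1BG", "A1BG", "A1BG"),
  chr    = c("19", "19", "19"),
  pos    = c(58409000L, 58409184L, 58409184L),  # first position is invented
  ref    = c("A", "C", "C"),
  alt    = c("G", "G", "T"),
  weight = c(1, 1, 1),
  stringsAsFactors = FALSE
)

# Hypothetical allele order at the tri-allelic site in the GDS file (C/T first).
gds.order <- data.frame(
  chr = "19", pos = 58409184L, ref = "C", alt = c("T", "G"),
  stringsAsFactors = FALSE
)

# Build a chr:pos:ref:alt key so rows can be matched between the two tables.
key <- function(d) paste(d$chr, d$pos, d$ref, d$alt, sep = ":")

# Permute rows only within sites that appear more than once in a group.
site <- paste(grp$group, grp$chr, grp$pos, sep = ":")
new.index <- seq_len(nrow(grp))
for (s in unique(site[duplicated(site)])) {
  rows <- which(site == s)
  gds.rank <- match(key(grp[rows, ]), key(gds.order))
  # Skip sites that are not fully described in gds.order.
  if (!anyNA(gds.rank)) new.index[rows] <- rows[order(gds.rank)]
}
fixed <- grp[new.index, ]
print(fixed)

# Write the reordered file without header, tab-delimited (temporary path here).
out.file <- tempfile(fileext = ".txt")
write.table(fixed, out.file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

If cohorts disagree with each other on the allele order, a single reordered group file cannot match all of them, so this only helps when the mismatch is consistent across the cohorts being combined.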

Best, Han

g3png commented 2 years ago

Thanks Han for the quick reply!

I expect it will be complicated if different cohorts have multiallelic variants ordered differently in their GDS files… but so far we only see this issue with one cohort. I will go with your suggestion and update you on how it goes.

Best wishes, Grace


anh151 commented 1 month ago

Hello. I had the same issue when trying to combine 2 cohorts. I tried everything to get things in the right order and I feel like there is a bug in SMMAT.meta when it attempts to sort the groups. If we're assuming the order is set by the GDS, why would SMMAT sort alphabetically?

I tried alphabetical order. I tried combining the variant positions, writing out a GDS, and then using that order. I tried running a fake dataset and using the resulting score file. None of them worked. I ended up just dropping multiallelic positions.

Thanks, Andrew

hanchenphd commented 2 days ago

Hi Andrew,

Have you tried fixing the order in your group definition file (instead of the order in GDS) as I suggested above? If you could send me a small reproducible example, I am happy to take a look.

Thanks, Han