gymreklab / STRDenovoTools

Toolkit for calling and analyzing de novo STR mutations
GNU General Public License v3.0

Error while running MonSTR #18

Open bibb opened 1 year ago

bibb commented 1 year ago

Hello,

I'm running MonSTR with the following options:

MonSTR \
  --strvcf Family_681_209.gangSTR_filtered.CR80.vcf.gz \
  --fam families.fam \
  --max-num-alleles 100 \
  --include-invariant \
  --gangstr \
  --require-all-children \
  --output-all-loci \
  --min-num-encl-child 1 \
  --max-perc-encl-parent 0.05 \
  --min-encl-match 0.9 \
  --min-total-encl 10 \
  --posterior-threshold 0.5 \
  --default-prior -3 \
  --out Family_681_209.MonSTR_analysis

and I get this error:

[MonSTR-2.0] ERROR: Required INFO field GRID not present in VCF

I noticed that the GangSTR-generated VCFs do have the GRID field in the INFO column, but after I merge them with mergeSTR it is no longer included, so mergeSTR removes the GRID INFO field. This is the mergeSTR command I used to merge the files:

mergeSTR \
  --vcfs sample1.vcf.gz,sample2.vcf.gz,sample3.vcf.gz,sample4.vcf.gz,sample5.vcf.gz,sample6.vcf.gz \
  --out Family_681_209.gangSTR_filtered \
  --vcftype gangstr

Is this a bug in mergeSTR? I'm using version 4.2.1. MonSTR only works if I use the --naive option, but I know those results are not optimal because they do not take the likelihoods into account.

I'd appreciate your help.

B.

gymreklab commented 1 year ago

mergeSTR does not support merging the GRID field, since this field depends specifically on the set of alleles considered, which can differ across GangSTR runs. If you want to keep the GRID field, you should be able to run GangSTR in multi-sample mode so that you don't have to run mergeSTR to combine the results.
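
As a rough illustration of the multi-sample suggestion above, here is a minimal Python sketch that builds a single GangSTR invocation covering all samples. It assumes GangSTR accepts a comma-separated list of BAM files via --bam together with --ref, --regions and --out; the helper name and all file names are hypothetical placeholders, not taken from this thread.

```python
import shlex


def gangstr_multisample_command(bams, ref, regions, out_prefix):
    """Build an argument list for one GangSTR run over several BAM files.

    Assumption: GangSTR takes a comma-separated BAM list via --bam.
    All paths passed in here are placeholders chosen by the caller.
    """
    if not bams:
        raise ValueError("at least one BAM file is required")
    return [
        "GangSTR",
        "--bam", ",".join(bams),   # all family members genotyped in one run
        "--ref", ref,              # reference FASTA
        "--regions", regions,      # STR regions file
        "--out", out_prefix,       # output prefix
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = gangstr_multisample_command(
        ["sample1.bam", "sample2.bam", "sample3.bam"],
        "ref.fa",
        "regions.bed",
        "Family_681_209.gangSTR",
    )
    # Print the command so it can be reviewed before running it.
    print(shlex.join(cmd))
```

Because all samples are genotyped in one run, the same set of alleles is considered for every sample, so no mergeSTR step is needed and the GRID field is kept.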

bibb commented 1 year ago

Thank you for the help. I ran the genotyping with GangSTR using all the BAM files at once, which kept the GRID field, and I was then able to use MonSTR.

think-o commented 11 months ago

Hi, I got the same error as @bibb. However, it is important for me not to use multi-sample mode, but instead to merge the individual VCFs afterwards and then run MonSTR. Is there any way to do that?

gymreklab commented 11 months ago

This should be possible, but as noted above you will have to use the --naive flag. There is currently no way to merge the GRID field across different runs, since a different set of alleles is considered in each run.
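
To see in advance whether a given VCF will trigger the GRID error (and therefore needs --naive), one option is to look for the GRID declaration in the VCF header. The following is a minimal sketch using only the Python standard library. The function name is hypothetical, and it only checks the header declaration; how MonSTR itself performs the check is not shown in this thread.

```python
import gzip


def vcf_declares_info_field(path, field="GRID"):
    """Return True if the VCF header declares an INFO field named `field`.

    Works on plain-text and gzip/bgzip-compressed VCFs (chosen by the
    ".gz" suffix). Only the header is read.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    prefix = f"##INFO=<ID={field},"
    with opener(path, "rt") as handle:
        for line in handle:
            if not line.startswith("#"):
                break  # header is over; data records start here
            if line.startswith(prefix):
                return True
    return False
```

For example, calling `vcf_declares_info_field("Family_681_209.gangSTR_filtered.vcf.gz")` on a mergeSTR output would be expected to return False if the field was dropped during merging, as described above.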