hringbauer / ancIBD

Detecting IBD within low coverage ancient DNA data. Development Repository for software package that contains code for manuscript.
GNU General Public License v3.0

--mask option bug #19

Open hughmccoll opened 3 months ago

hughmccoll commented 3 months ago

Dear Harald and Yilei,

I have recently been running some tests with ancIBD (v0.7), and as far as I can tell the mask option (--mask) has no impact on the results.

I’m using the mask file provided in the vignette (./map/mask.track). From what I can see in the code, the mask file needs a header line (ch start_bp end_bp start_cm end_cm). Judging by the log file, the masking step appears to run, but the output is identical whether or not the --mask option is included.

I’ve put the commands and a subset of the results below for a test I ran on chromosome 18 for two individuals. The full log files and results are attached.

For example, there is a segment (6416465 bp to 8096608 bp) that I believe should have been masked out but has not been.

Hopefully I haven’t missed something obvious on my end!

Best,
Hugh

ancIBD command with mask

ancIBD-run \
--vcf 18.RISE109_EAS004.vcf.gz \
--ch 18 \
--min 1 \
--marker_path ./ancibd_data/filters/snps_bcftools_ch18.csv \
--map_path ./ancibd_data/map/v51.1_1240k.snp \
--af_path ./ancibd_data/afs/v51.1_1240k_AF_ch18.tsv \
--prefix RISE109_EAS004.mask \
--mask ./ancibd_data/map/mask.track_whead \
--out ./mask | tee -a "log.mask.txt"

ancIBD command without mask

ancIBD-run \
--vcf 18.RISE109_EAS004.vcf.gz \
--ch 18 \
--min 1 \
--marker_path ./ancibd_data/filters/snps_bcftools_ch18.csv \
--map_path ./ancibd_data/map/v51.1_1240k.snp \
--af_path ./ancibd_data/afs/v51.1_1240k_AF_ch18.tsv \
--prefix RISE109_EAS004.no_mask \
--out ./no_mask | tee -a "log.no_mask.txt"
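
One way to confirm that the two runs really produce identical output is a byte-level comparison of the result files. The sketch below uses Python's standard filecmp module. The function name outputs_identical is only illustrative, and the paths passed to it depend on where ancIBD writes its results for the given --out and --prefix values, which is not assumed here.

```python
import filecmp


def outputs_identical(path_a, path_b):
    """Return True if the two files have byte-identical contents.

    Hypothetical helper for comparing the masked and unmasked result files;
    it is not part of ancIBD.
    """
    # shallow=False forces a comparison of the file contents rather than
    # only the os.stat() signatures (type, size, modification time).
    return filecmp.cmp(path_a, path_b, shallow=False)
```

Calling it on the two chromosome 18 result files (the attached RISE109_EAS004.mask.ch18.txt and RISE109_EAS004.no_mask.ch18.txt) should return True if the mask had no effect on the output.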

Mask file

ch  start_bp   end_bp     start_cm  end_cm
1   117262791  152752796  147.421   163.079
2   28328      10223894   0.014     26.182
4   71566      5690711    0.341     11.535
8   164984     14405249   0.0004    31.511
10  43410489   54652146   62.682    77.491
14  101952406  107283009  111.711   120.2
15  20071673   34014753   0.005     43.484
17  32762040   39154710   54.307    63.702
18  69836      8490599    0.1604    25.465
19  266034     10030630   0.002     29.543
19  47112648   59087479   74.533    107.7316
21  14601415   24499572   0.861     21.041
22  17054720   25615059   1.723     23.842

Section of log file

Applying mask to IBD segments...
    Start    End  ...                           iid1                           iid2
0    1395   1487  ...  RISE109.allentoft_2015_nature  EAS004.gretzinger_2022_nature
1    2767   3629  ...  RISE109.allentoft_2015_nature  EAS004.gretzinger_2022_nature
2    4231   4775  ...  RISE109.allentoft_2015_nature  EAS004.gretzinger_2022_nature
3    5224   5411  ...  RISE109.allentoft_2015_nature  EAS004.gretzinger_2022_nature
4    6278   6657  ...  RISE109.allentoft_2015_nature  EAS004.gretzinger_2022_nature
5    7539   7933  ...  RISE109.allentoft_2015_nature  EAS004.gretzinger_2022_nature
6    9410  10448  ...  RISE109.allentoft_2015_nature  EAS004.gretzinger_2022_nature
7   12803  13437  ...  RISE109.allentoft_2015_nature  EAS004.gretzinger_2022_nature
8   14306  15154  ...  RISE109.allentoft_2015_nature  EAS004.gretzinger_2022_nature
9   15622  15879  ...  RISE109.allentoft_2015_nature  EAS004.gretzinger_2022_nature
10  19670  19898  ...  RISE109.allentoft_2015_nature  EAS004.gretzinger_2022_nature
11  21006  21444  ...  RISE109.allentoft_2015_nature  EAS004.gretzinger_2022_nature
12  22409  22741  ...  RISE109.allentoft_2015_nature  EAS004.gretzinger_2022_nature
13  24215  24718  ...  RISE109.allentoft_2015_nature  EAS004.gretzinger_2022_nature
14  24890  25111  ...  RISE109.allentoft_2015_nature  EAS004.gretzinger_2022_nature
15  25824  26136  ...  RISE109.allentoft_2015_nature  EAS004.gretzinger_2022_nature
16  26646  27343  ...  RISE109.allentoft_2015_nature  EAS004.gretzinger_2022_nature
17  27723  28089  ...  RISE109.allentoft_2015_nature  EAS004.gretzinger_2022_nature

Results from run with mask:

Start  End    StartM               EndM                 length  lengthM               ch  StartBP   EndBP     iid1                           iid2
3110   4082   0.187756             0.240680992603302    972     0.0529249906539917    18  6416465   8096608   RISE109.allentoft_2015_nature  EAS004.gretzinger_2022_nature
4739   5354   0.27425798773765564  0.31047698855400085  615     0.036219000816345215  18  9017786   10296623  RISE109.allentoft_2015_nature  EAS004.gretzinger_2022_nature
5872   6090   0.33414000272750854  0.34862199425697327  218     0.014481991529464722  18  11078646  11573400  RISE109.allentoft_2015_nature  EAS004.gretzinger_2022_nature
7048   7497   0.3891560137271881   0.4127730131149292   449     0.02361699938774109   18  14000961  20106913  RISE109.allentoft_2015_nature  EAS004.gretzinger_2022_nature
8477   8929   0.43661099672317505  0.45054900646209717  452     0.01393800973892212   18  22545495  23529365  RISE109.allentoft_2015_nature  EAS004.gretzinger_2022_nature
10582  11853  0.4954560101032257   0.5268459916114807   1271    0.031389981508255005  18  28087698  31420010  RISE109.allentoft_2015_nature  EAS004.gretzinger_2022_nature
14468  15122  0.5745900273323059   0.5875930190086365   654     0.013002991676330566  18  38026603  39712385  RISE109.allentoft_2015_nature  EAS004.gretzinger_2022_nature
16092  17042  0.6033920049667358   0.6263909935951233   950     0.02299898862838745   18  42434615  44150558  RISE109.allentoft_2015_nature  EAS004.gretzinger_2022_nature
17575  17853  0.6380820274353027   0.6519550085067749   278     0.013872981071472168  18  45358666  45874561  RISE109.allentoft_2015_nature  EAS004.gretzinger_2022_nature
21995  22245  0.743274986743927    0.7536090016365051   250     0.010334014892578125  18  55127493  55535280  RISE109.allentoft_2015_nature  EAS004.gretzinger_2022_nature
25036  25454  0.8473230004310608   0.8599590063095093   418     0.012636005878448486  18  60957339  61802304  RISE109.allentoft_2015_nature  EAS004.gretzinger_2022_nature
27042  28042  0.9102759957313538   0.9489629864692688   1000    0.03868699073791504   18  65887287  67938899  RISE109.allentoft_2015_nature  EAS004.gretzinger_2022_nature
28832  29192  0.977187991142273    0.9897369742393494   360     0.012548983097076416  18  69700748  70489780  RISE109.allentoft_2015_nature  EAS004.gretzinger_2022_nature
29745  30530  1.0176960229873657   1.0555000305175781   785     0.0378040075302124    18  71478627  72915939  RISE109.allentoft_2015_nature  EAS004.gretzinger_2022_nature
30970  31374  1.0719029903411865   1.0926170349121094   404     0.02071404457092285   18  73615868  74318960  RISE109.allentoft_2015_nature  EAS004.gretzinger_2022_nature
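
The overlap between the reported segments and the chromosome 18 mask interval can be checked independently of ancIBD. The sketch below hard-codes the mask row for chromosome 18 and the first three segments from the table above, then computes how many base pairs of each segment fall inside the mask. The helper overlap_bp is illustrative only; it makes no assumption about how ancIBD applies the mask internally (for instance, whether it works in bp or cM, or whether it trims or drops overlapping segments).

```python
# Mask interval for chromosome 18, copied from the mask file (bp coordinates).
MASK = {18: [(69836, 8490599)]}

# (ch, StartBP, EndBP) of the first three segments in the masked-run output.
SEGMENTS = [
    (18, 6416465, 8096608),
    (18, 9017786, 10296623),
    (18, 11078646, 11573400),
]


def overlap_bp(ch, start, end, mask=MASK):
    """Return how many bp of the span from start to end lie inside mask intervals on ch."""
    total = 0
    for m_start, m_end in mask.get(ch, []):
        # Length of the intersection of the two intervals; 0 if they are disjoint.
        total += max(0, min(end, m_end) - max(start, m_start))
    return total


for ch, start, end in SEGMENTS:
    covered = overlap_bp(ch, start, end)
    fully_inside = covered == end - start
    print(ch, start, end, covered, fully_inside)
```

With these numbers, the first segment (6416465 to 8096608) lies entirely inside the mask interval 69836 to 8490599, while the next two segments start beyond the end of the mask and do not overlap it at all.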

RISE109_EAS004.mask.ch18.txt RISE109_EAS004.no_mask.ch18.txt log.mask.txt log.no_mask.txt