extracting markers - Githubissues

Hi Wittney,

Yes you can do that in GUSMap. I'll use the manuka data that comes with the package as a example.

> ## extract filename for Manuka dataset in GUSMap package
> vcffile <- Manuka11()
> 
> ## Convert VCF to RA format
> rafile <- VCFtoRA(vcffile$vcf)
Processing VCF file: Converting to RA format.

Found 181 samples
680 SNPs written

Name of RA file:             Manuka_chr11.vcf.ra.tab 
Location of RA file:        E:/Potato/Hetero/

> ## read in the RA data
> mkdata <- readRA(rafile)
Read 183 items
Read 680 records
> ## Create the FS population
> manuka = makeFS(mkdata, pedfile=vcffile$ped)
-------------
Processing Data.

Filtering criteria for removing SNPs:
Minor allele frequency (MAF) < 0.05
Percentage of missing genotypes > 20%
Distance for binning SNPs <= 100 bp
Read depth associated with at least one parental genotype <= 5
P-value for segregation test < 0.01
Average SNP depth is > 500

Processing Family Manuka.

-------------
Summary:

Number of SNPs remaining after filtering: 680 
Both-informative (BI): 116 
Maternal-informative (MI): 294 
Paternal-informative (PI): 270 
Number of progeny: 177 
Single Family Linkage analysis:

Data Summary:
Data file:  E:/Potato/Hetero/Manuka_chr11.vcf.ra.tab
Mean Depth: 20.8542
Mean Call Rate: 0.9765
Number of ...
  Progeny:  177
  MI SNPs:  294
  PI SNPs:  270
  BI SNPs:  116
  Total SNPs:   680

The code above is reading the manuka data into GUSMap and creating the FS object.

Now to work out the names of the MI SNPs and PI SNPs, first the chromosome and position information from the FS object needs to be extracted which is done using the $extractVar function. The required code is following

> mapdata = manuka$extractVar(c("chrom","pos","group"))
> ## output of mapdata
> str(mapdata)
List of 3
 $ chrom: chr [1:680] "11" "11" "11" "11" ...
 $ pos  : num [1:680] 331784 354083 382652 391004 393972 ...
 $ group:List of 3
  ..$ BI: int [1:116] 2 34 36 60 67 78 79 80 103 110 ...
  ..$ PI: int [1:270] 10 12 15 20 21 24 25 27 30 32 ...
  ..$ MI: int [1:294] 1 3 4 5 6 7 8 9 11 13 ...

This outputs a list where the first element gives the chromosome number (or whatever was in the CHROM field in the VCF file) for each SNP, the second element gives the positions (or whatever was in the POS field in the VCF file), and the third element is a list (of length 3) as shown with the following code,

> str(mapdata$group)
List of 3
 $ BI: int [1:116] 2 34 36 60 67 78 79 80 103 110 ...
 $ PI: int [1:270] 10 12 15 20 21 24 25 27 30 32 ...
 $ MI: int [1:294] 1 3 4 5 6 7 8 9 11 13 ...

The first element of mapdata$group gives the indices of the SNPs that are BI, the second element gives the indices of the PI SNPs, and the third element gives the indices of the MI SNPs. The chromosome number of all the PI SNPs can be obtained using

> mapdata$chrom[mapdata$group$PI]
  [1] "11" "11" "11" "11" "11" "11" "11" ...

(which in this case is just "11" since all the SNPs were mapped to chromosome 11) and the positions of the PI SNPs can be obtained using

> mapdata$pos[mapdata$group$PI]
  [1]   413453   415939   439362  1006308  1154431  ...

In like manner, you can obtain the chromosome and position information for the MI SNPs by using

> mapdata$chrom[mapdata$group$MI]
> mapdata$pos[mapdata$group$MI]

and similarly you can do the same for the BI SNPs.

Hope that makes sense. Let me know if anything is not clear.

Thanks, Timothy

tpbilton / GUSMap

extracting markers #19