Closed wittney closed 4 years ago
Hi Wittney,
Yes you can do that in GUSMap. I'll use the manuka data that comes with the package as a example.
> ## extract filename for Manuka dataset in GUSMap package
> vcffile <- Manuka11()
>
> ## Convert VCF to RA format
> rafile <- VCFtoRA(vcffile$vcf)
Processing VCF file: Converting to RA format.
Found 181 samples
680 SNPs written
Name of RA file: Manuka_chr11.vcf.ra.tab
Location of RA file: E:/Potato/Hetero/
> ## read in the RA data
> mkdata <- readRA(rafile)
Read 183 items
Read 680 records
> ## Create the FS population
> manuka = makeFS(mkdata, pedfile=vcffile$ped)
-------------
Processing Data.
Filtering criteria for removing SNPs:
Minor allele frequency (MAF) < 0.05
Percentage of missing genotypes > 20%
Distance for binning SNPs <= 100 bp
Read depth associated with at least one parental genotype <= 5
P-value for segregation test < 0.01
Average SNP depth is > 500
Processing Family Manuka.
-------------
Summary:
Number of SNPs remaining after filtering: 680
Both-informative (BI): 116
Maternal-informative (MI): 294
Paternal-informative (PI): 270
Number of progeny: 177
Single Family Linkage analysis:
Data Summary:
Data file: E:/Potato/Hetero/Manuka_chr11.vcf.ra.tab
Mean Depth: 20.8542
Mean Call Rate: 0.9765
Number of ...
Progeny: 177
MI SNPs: 294
PI SNPs: 270
BI SNPs: 116
Total SNPs: 680
The code above is reading the manuka data into GUSMap and creating the FS object.
Now to work out the names of the MI SNPs and PI SNPs, first the chromosome and position information from the FS object needs to be extracted which is done using the $extractVar
function. The required code is following
> mapdata = manuka$extractVar(c("chrom","pos","group"))
> ## output of mapdata
> str(mapdata)
List of 3
$ chrom: chr [1:680] "11" "11" "11" "11" ...
$ pos : num [1:680] 331784 354083 382652 391004 393972 ...
$ group:List of 3
..$ BI: int [1:116] 2 34 36 60 67 78 79 80 103 110 ...
..$ PI: int [1:270] 10 12 15 20 21 24 25 27 30 32 ...
..$ MI: int [1:294] 1 3 4 5 6 7 8 9 11 13 ...
This outputs a list where the first element gives the chromosome number (or whatever was in the CHROM field in the VCF file) for each SNP, the second element gives the positions (or whatever was in the POS field in the VCF file), and the third element is a list (of length 3) as shown with the following code,
> str(mapdata$group)
List of 3
$ BI: int [1:116] 2 34 36 60 67 78 79 80 103 110 ...
$ PI: int [1:270] 10 12 15 20 21 24 25 27 30 32 ...
$ MI: int [1:294] 1 3 4 5 6 7 8 9 11 13 ...
The first element of mapdata$group
gives the indices of the SNPs that are BI, the second element gives the indices of the PI SNPs, and the third element gives the indices of the MI SNPs. The chromosome number of all the PI SNPs can be obtained using
> mapdata$chrom[mapdata$group$PI]
[1] "11" "11" "11" "11" "11" "11" "11" ...
(which in this case is just "11"
since all the SNPs were mapped to chromosome 11) and the positions of the PI SNPs can be obtained using
> mapdata$pos[mapdata$group$PI]
[1] 413453 415939 439362 1006308 1154431 ...
In like manner, you can obtain the chromosome and position information for the MI SNPs by using
> mapdata$chrom[mapdata$group$MI]
> mapdata$pos[mapdata$group$MI]
and similarly you can do the same for the BI SNPs.
Hope that makes sense. Let me know if anything is not clear.
Thanks, Timothy
I have data printed from the object I produced using makeFS function. Progeny: 310 MI SNPs: 1269 PI SNPs: 551 BI SNPs: 97 Total SNPs: 1917 I would like to extract the names of the MI SNPs and PI SNPs. How would I do so?