Closed biozzq closed 7 years ago
SFS entries are all 0 because of the value you are setting for projecting. If you are allowing 10% missing data in the vcf, and then projecting to the full sample size it makes sense why all snps are being removed. The optimal workflow is to first run:
./easySFS -i intergenic.vcf -p pops_file.txt --preview
to establish the "best" values for down projecting each population.
Dear @isaacovercast
I have run the preview
model. In addition, my vcf have been processed by beagle
which will impute the missing genotypes. so I want to project to the full sample size for all populations. but I am confused about the results of SFS entries.
pop1
(2, 5.0) (3, 8.0) (4, 10.0) (5, 11.0) (6, 12.0) (7, 13.0) (8, 14.0) (9, 15.0) (10, 15.0) (11, 16.0) (12, 17.0) (13, 17.0) (14, 18.0) (15, 18.0) (16, 18.0) (17, 19.0) (18, 19.0) (19, 20.0) (20, 20.0) (21, 20.0) (22, 21.0) (23, 21.0) (24, 21.0) (25, 21.0) (26, 22.0) (27, 22.0) (28, 22.0) (29, 22.0) (30, 23.0) (31, 23.0) (32, 23.0) (33, 23.0) (34, 23.0) (35, 24.0) (36, 24.0) (37, 24.0) (38, 24.0) (39, 24.0) (40, 25.0) (41, 25.0) (42, 25.0) (43, 25.0) (44, 25.0) (45, 25.0) (46, 25.0) (47, 25.0) (48, 26.0) (49, 26.0) (50, 26.0) (51, 26.0) (52, 26.0) (53, 26.0) (54, 26.0) (55, 26.0) (56, 26.0) (57, 26.0) (58, 27.0) (59, 27.0) (60, 27.0) (61, 27.0) (62, 27.0) (63, 27.0) (64, 27.0) (65, 27.0) (66, 27.0) (67, 25.0) (68, 25.0) (69, 25.0) (70, 25.0) (71, 20.0) (72, 21.0) (73, 20.0) (74, 20.0) (75, 20.0) (76, 20.0) (77, 18.0) (78, 18.0) (79, 16.0) (80, 16.0) (81, 12.0) (82, 12.0) (83, 4.0) (84, 4.0) (85, 2.0)
pop2 (2, 6.0) (3, 9.0) (4, 11.0) (5, 12.0) (6, 13.0) (7, 14.0) (8, 15.0) (9, 15.0) (10, 16.0) (11, 16.0) (12, 16.0) (13, 17.0) (14, 17.0) (15, 17.0) (16, 17.0) (17, 18.0) (18, 18.0) (19, 18.0) (20, 18.0) (21, 18.0) (22, 18.0) (23, 19.0) (24, 19.0) (25, 19.0) (26, 19.0) (27, 19.0) (28, 19.0) (29, 19.0) (30, 19.0) (31, 19.0) (32, 19.0) (33, 19.0) (34, 20.0) (35, 20.0) (36, 20.0) (37, 19.0) (38, 19.0) (39, 14.0) (40, 14.0) (41, 11.0) (42, 11.0) (43, 7.0) (44, 7.0) (45, 5.0) (46, 5.0) (47, 2.0)
pop3 (2, 6.0) (3, 9.0) (4, 10.0) (5, 12.0) (6, 13.0) (7, 13.0) (8, 14.0) (9, 14.0) (10, 15.0) (11, 15.0) (12, 15.0) (13, 16.0) (14, 16.0) (15, 16.0) (16, 16.0) (17, 17.0) (18, 17.0) (19, 17.0) (20, 17.0) (21, 17.0) (22, 17.0) (23, 17.0) (24, 17.0) (25, 17.0) (26, 17.0) (27, 17.0) (28, 18.0) (29, 18.0) (30, 18.0) (31, 18.0) (32, 18.0) (33, 18.0) (34, 18.0) (35, 18.0) (36, 18.0) (37, 18.0) (38, 18.0) (39, 18.0) (40, 18.0) (41, 18.0) (42, 18.0) (43, 18.0) (44, 18.0) (45, 18.0) (46, 18.0) (47, 18.0) (48, 18.0) (49, 18.0) (50, 18.0) (51, 18.0) (52, 18.0) (53, 18.0) (54, 19.0) (55, 19.0) (56, 19.0) (57, 19.0) (58, 19.0) (59, 19.0) (60, 19.0) (61, 19.0) (62, 19.0) (63, 19.0) (64, 19.0) (65, 19.0) (66, 19.0) (67, 19.0) (68, 19.0) (69, 19.0) (70, 19.0) (71, 19.0) (72, 19.0) (73, 19.0) (74, 19.0) (75, 19.0) (76, 19.0) (77, 19.0) (78, 19.0) (79, 19.0) (80, 19.0) (81, 19.0) (82, 19.0) (83, 19.0) (84, 19.0) (85, 19.0) (86, 19.0) (87, 19.0) (88, 19.0) (89, 19.0) (90, 19.0) (91, 19.0) (92, 19.0) (93, 19.0) (94, 19.0) (95, 19.0) (96, 19.0) (97, 19.0) (98, 19.0) (99, 19.0) (100, 19.0) (101, 19.0) (102, 19.0) (103, 19.0) (104, 19.0) (105, 19.0) (106, 19.0) (107, 19.0) (108, 19.0) (109, 19.0) (110, 19.0) (111, 19.0) (112, 19.0) (113, 19.0) (114, 20.0) (115, 20.0) (116, 20.0) (117, 20.0) (118, 20.0) (119, 20.0) (120, 20.0) (121, 20.0) (122, 20.0) (123, 20.0) (124, 20.0) (125, 20.0) (126, 20.0) (127, 20.0) (128, 20.0) (129, 20.0) (130, 20.0) (131, 20.0) (132, 20.0) (133, 20.0) (134, 20.0) (135, 20.0) (136, 20.0) (137, 20.0) (138, 20.0) (139, 20.0) (140, 20.0) (141, 20.0) (142, 20.0) (143, 20.0) (144, 20.0) (145, 20.0) (146, 20.0) (147, 20.0) (148, 20.0) (149, 20.0) (150, 20.0) (151, 20.0) (152, 20.0) (153, 20.0) (154, 20.0) (155, 20.0) (156, 20.0) (157, 20.0) (158, 20.0) (159, 20.0) (160, 20.0) (161, 20.0) (162, 20.0) (163, 20.0) (164, 20.0) (165, 20.0) (166, 20.0) (167, 20.0) (168, 20.0) (169, 19.0) (170, 19.0) (171, 19.0) (172, 19.0) (173, 19.0) (174, 19.0) (175, 19.0) (176, 19.0) (177, 19.0) (178, 19.0) (179, 18.0) (180, 18.0) (181, 18.0) (182, 18.0) (183, 16.0) (184, 16.0) (185, 16.0) (186, 16.0) (187, 15.0) (188, 15.0) (189, 15.0) (190, 15.0) (191, 15.0) (192, 15.0) (193, 13.0) (194, 13.0) (195, 12.0) (196, 12.0) (197, 10.0) (198, 10.0) (199, 10.0) (200, 10.0) (201, 7.0) (202, 7.0) (203, 7.0) (204, 7.0) (205, 7.0) (206, 7.0) (207, 6.0) (208, 6.0) (209, 5.0) (210, 5.0) (211, 5.0) (212, 5.0) (213, 5.0) (214, 5.0) (215, 4.0) (216, 4.0) (217, 4.0) (218, 4.0) (219, 3.0) (220, 3.0) (221, 3.0) (222, 3.0) (223, 3.0) (224, 3.0) (225, 3.0)
I see, that makes sense. Can you send me a copy of your vcf so I can try to debug it? iovercast@gc.cuny.edu
Dear @isaacovercast,
I have sent you some test data. hope it helps.
May I ask what the final solution was? I got the same problem.
Here was the solution for this problem:
Thanks for sending the data. I have definitely figured out what's going on. I developed easySFS with RAD-like data in mind. The default will randomly select 1 snp per locus, so by default it was selecting 1 snp per chromosome on your data, which explains the bad output. If you run it like this it will use ALL the snps in your vcf file and it'll work much better:
./easySFS.py -i buzz/debug.vcf -p buzz/debug.pop -a --preview
Thanks for your advice.
Dear @isaacovercast , Recently I want to prepare input file for running Fastsimcoal2, but I find the SFS entries are all 0, my vcf file likes following (processed by following strategies) 1, filter the sites with maf <= 0.1 2, filter the sites with missing rate >= 0.1 3, filter the sites with abnormal depth > average+3sd or <average-3sd 4, imputation by beagle 4.1 5, filter the SNPs located in repeat region (repeat mask results) 6, filter the SNPs located in genes (exon+intro)
the VCF contains 3 populations, 43, 24 and 112 diploids respectively;
my command like following
python easySFS.py -i intergentic.vcf -p all.sep --proj=86,48,224 -o MSF
so I do not know why the SFS entries are all 0 ? How the minor allele identified in the vcf, did it identify after computing the global frequency of the two alleles over all samples ? Many thanks!
Best Wishes