Closed willright28 closed 5 years ago
Is there a column in your VCF called LS004?
On Wed, Aug 7, 2019 at 11:47 PM willright28 notifications@github.com wrote:
hi dear @terhorst: when I use vcf2smc with 1 sample for 1 population, with the command below:
smc++ vcf2smc LS004_bcf.vcf.gz chr1.smc.gz NC_011462.1 LS:LS004
it returns the following error output:
/gpfs/home/chenyl/smcpp/lib/python3.6/site-packages/h5py/init.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
18138 smcpp.commands.vcf2smc WARNING Neither missing cutoff (-c) or mask (-m) has been specified. This means that stretches of the chromosome that do not have any VCF entries (for example, centromeres) will be interpreted as homozygous recessive.
18138 smcpp.commands.vcf2smc INFO Population 1:
18138 smcpp.commands.vcf2smc INFO Distinguished lineages: LS004:0, LS004:1
18138 smcpp.commands.vcf2smc INFO Undistinguished lineages:
Traceback (most recent call last):
  File "/gpfs/home/chenyl/smcpp/bin/smc++", line 11, in
    load_entry_point('smcpp==1.15.2', 'console_scripts', 'smc++')()
  File "/gpfs/home/chenyl/smcpp/lib/python3.6/site-packages/smcpp/frontend/console.py", line 26, in main
    cmds[args.command].main(args)
  File "/gpfs/home/chenyl/smcpp/lib/python3.6/site-packages/smcpp/commands/vcf2smc.py", line 134, in main
    raise RuntimeError("Distinguished lineages not found in data?")
RuntimeError: Distinguished lineages not found in data?
real 0m20.083s
user 0m1.523s
sys 0m0.471s
Host key verification failed.
Unfortunately I can't work out what these messages mean, so I'd be grateful if you could help me with this. Thanks in advance!
-- Jonathan terhorst@gmail.com
No... LS004 is the name of one of my samples.
There should be a column for each sample. See for example: https://www.internationalgenome.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-40/
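To check which sample columns a VCF actually has, a minimal Python sketch along these lines may help. It only reads the `#CHROM` header line, where sample names follow the nine fixed VCF columns. The helper names `vcf_sample_names` and `open_vcf` are made up for illustration and are not part of smc++; the toy header below is invented to mirror this thread.

```python
import gzip
import io


def vcf_sample_names(stream):
    """Return the sample column names from the #CHROM header line of a VCF text stream."""
    for line in stream:
        if line.startswith("##"):
            continue  # meta-information lines come before the column header
        if line.startswith("#CHROM"):
            fields = line.rstrip("\n").split("\t")
            # The first 9 columns are fixed (CHROM .. FORMAT); the rest are samples.
            return fields[9:]
        break  # first data line reached without seeing a column header
    raise ValueError("no #CHROM header line found")


def open_vcf(path):
    """Open a gzip/bgzip-compressed VCF as text (hypothetical convenience wrapper)."""
    return gzip.open(path, "rt")


# Toy in-memory VCF (invented data) whose only sample column is named "sample2".
toy = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample2\n"
    "NC_011462.1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\n"
)
names = vcf_sample_names(toy)
print(names)
print("LS004" in names)
```

For a real file, `vcf_sample_names(open_vcf("LS004_bcf.vcf.gz"))` would list the names that can be used after the population identifier in the vcf2smc command.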
Oh, in my VCF the column is called "sample2". Do I need to rename it to my sample's name (LS004)?
I changed my command to:
smc++ vcf2smc LS004_bcf.vcf.gz chr1.smc.gz NC_011462.1 LS:sample2
and got the output below:
/gpfs/home/chenyl/smcpp/lib/python3.6/site-packages/h5py/init.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
14249 smcpp.commands.vcf2smc WARNING Neither missing cutoff (-c) or mask (-m) has been specified. This means that stretches of the chromosome that do not have any VCF entries (for example, centromeres) will be interpreted as homozygous recessive.
14249 smcpp.commands.vcf2smc INFO Population 1:
14249 smcpp.commands.vcf2smc INFO Distinguished lineages: sample2:0, sample2:1
14249 smcpp.commands.vcf2smc INFO Undistinguished lineages:
100%|█████████▉| 119M/119M [00:05<00:00, 22.6Mbases/s]
19716 smcpp.util INFO Wrote 10617 observations
It looks successful, doesn't it?
Yes.
Hi @terhorst @willright28, I'm facing the same error, "RuntimeError: Distinguished lineages not found in data?", using the example data from this GitHub repository: https://github.com/popgenmethods/smcpp/blob/master/example/example.vcf.gz
smc++ vcf2smc example.vcf.gz chr1.smc.gz chr1 CEU:NA12878,NA12879
smc++ vcf2smc -d NA12878 NA12879 example.vcf.gz chr1.smc.gz chr1 CEU:NA12878,NA12879
for i in {7..9}; do smc++ vcf2smc -d NA1287$i NA1287$i example.vcf.gz out.$i.txt chr1 NA12877 NA12878 NA12890; done
smc++ estimate -o output/ 0.1 out1.txt
Kindly help me solve this. Please check the header of this file and the sample and population info, and suggest what changes I should make.
###########
mylinux@ChiragsPC:~/smcppdata$ smc++ vcf2smc example.vcf.gz chr1.smc.gz chr1 CEU:NA12878,NA12879
2016 smcpp.commands.vcf2smc WARNING Neither missing cutoff (-c) or mask (-m) has been specified. This means that stretches of the chromosome that do not have any VCF entries (for example, centromeres) will be interpreted as homozygous recessive.
2020 smcpp.commands.vcf2smc INFO Population 1:
2020 smcpp.commands.vcf2smc INFO Distinguished lineages: NA12878:0, NA12878:1
2021 smcpp.commands.vcf2smc INFO Undistinguished lineages: NA12879:0, NA12879:1
[E::idx_find_and_load] Could not retrieve index file for 'example.vcf.gz'
Traceback (most recent call last):
File "/home/mylinux/.local/bin/smc++", line 8, in
mylinux@ChiragsPC:~/smcppdata$ smc++ vcf2smc -d NA12878 NA12879 example.vcf.gz chr1.smc.gz chr1 CEU:NA12878,NA12879
2028 smcpp.commands.vcf2smc WARNING Neither missing cutoff (-c) or mask (-m) has been specified. This means that stretches of the chromosome that do not have any VCF entries (for example, centromeres) will be interpreted as homozygous recessive.
2029 smcpp.commands.vcf2smc INFO Population 1:
2029 smcpp.commands.vcf2smc INFO Distinguished lineages: NA12878:0, NA12879:1
2029 smcpp.commands.vcf2smc INFO Undistinguished lineages: NA12878:1, NA12879:0
[E::idx_find_and_load] Could not retrieve index file for 'example.vcf.gz'
Traceback (most recent call last):
File "/home/mylinux/.local/bin/smc++", line 8, in
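The `[E::idx_find_and_load] Could not retrieve index file` line above is reported when no index is found next to the compressed VCF. A small sketch for checking that before running vcf2smc, assuming the index sits beside the file with a `.tbi` or `.csi` suffix (the helper name `find_vcf_index` is hypothetical):

```python
import os


def find_vcf_index(vcf_path):
    """Return the path of a .tbi or .csi index next to vcf_path, or None if neither exists."""
    for suffix in (".tbi", ".csi"):
        candidate = vcf_path + suffix
        if os.path.exists(candidate):
            return candidate
    return None


if __name__ == "__main__":
    index = find_vcf_index("example.vcf.gz")
    if index is None:
        print("no index found next to example.vcf.gz")
    else:
        print("found index:", index)
```

If no index is found, one is commonly created with `tabix -p vcf example.vcf.gz`, which requires the file to be bgzip-compressed. Whether the missing index is the only problem here is not confirmed in this thread.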
mylinux@ChiragsPC:~/smcppdata$ for i in {7..9}; do smc++ vcf2smc -d NA1287$i NA1287$i example.vcf.gz out.$i.txt chr1 NA12877 NA12878 NA12890; done
usage: smc++ vcf2smc [-h] [-v] [--cores CORES] [-d sample_id sample_id] [--length LENGTH] [--ignore-missing] [--missing-cutoff c] [--mask MASK] [--drop-first-last] vcf.gz out[.gz] contig pop1 [pop2]
smc++ vcf2smc: error: argument pop1: 'NA12877' should be a comma-separated list of sample ids preceded by a population identifier. See 'smc++ vcf2smc -h'.
usage: smc++ vcf2smc [-h] [-v] [--cores CORES] [-d sample_id sample_id] [--length LENGTH] [--ignore-missing] [--missing-cutoff c] [--mask MASK] [--drop-first-last] vcf.gz out[.gz] contig pop1 [pop2]
smc++ vcf2smc: error: argument pop1: 'NA12877' should be a comma-separated list of sample ids preceded by a population identifier. See 'smc++ vcf2smc -h'.
usage: smc++ vcf2smc [-h] [-v] [--cores CORES] [-d sample_id sample_id] [--length LENGTH] [--ignore-missing] [--missing-cutoff c] [--mask MASK] [--drop-first-last] vcf.gz out[.gz] contig pop1 [pop2]
smc++ vcf2smc: error: argument pop1: 'NA12877' should be a comma-separated list of sample ids preceded by a population identifier. See 'smc++ vcf2smc -h'.
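The usage error above says the population argument must be a comma-separated list of sample ids preceded by a population identifier, as in `CEU:NA12878,NA12879`. The sketch below illustrates that shape. It is not smc++'s own parser, and `parse_pop_spec` is a hypothetical name.

```python
def parse_pop_spec(spec):
    """Split 'POP:id1,id2,...' into (pop_id, [id1, id2, ...]); raise ValueError otherwise."""
    pop_id, sep, samples = spec.partition(":")
    ids = samples.split(",") if samples else []
    if not sep or not pop_id or not ids or any(not s for s in ids):
        raise ValueError(
            f"{spec!r} should be a comma-separated list of sample ids "
            "preceded by a population identifier"
        )
    return pop_id, ids


print(parse_pop_spec("CEU:NA12878,NA12879"))

# A bare sample id without a 'POP:' prefix, as used in the loop above, is rejected.
try:
    parse_pop_spec("NA12877")
except ValueError as err:
    print("rejected:", err)
```

Under this reading, the loop's `NA12877 NA12878 NA12890` would need to be written as a single argument of the form `CEU:NA12877,NA12878,...`, where `CEU` is whatever label you choose for the population.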
smc++ vcf2smc example.vcf.gz chr1.smc.gz chr1 CEU:NA1885,NA3861
827 smcpp.commands.vcf2smc WARNING Neither missing cutoff (-c) or mask (-m) has been specified. This means that stretches of the chromosome that do not have any VCF entries (for example, centromeres) will be interpreted as homozygous recessive.
827 smcpp.commands.vcf2smc INFO Population 1:
827 smcpp.commands.vcf2smc INFO Distinguished lineages: NA1885:0, NA1885:1
827 smcpp.commands.vcf2smc INFO Undistinguished lineages: NA3861:0, NA3861:1
Traceback (most recent call last):
File "/home/exouser/.local/bin/smc++", line 8, in b'example.vcf.gz' (mode=b'r') - is it VCF/BCF format?
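The "is it VCF/BCF format?" message suggests the file could not be read as a VCF at all. The thread never establishes the cause, so the following is only a way to rule out one possibility: that `example.vcf.gz` on disk is not bgzip-compressed (for instance, a web page saved under that name). The sketch inspects the first bytes of the file. It assumes the BGZF `BC` subfield comes first in the gzip extra field, which is the layout bgzip writes; `sniff_compression` and `sniff_file` are hypothetical names.

```python
def sniff_compression(header):
    """Classify the first 16 bytes of a file as 'bgzf', 'plain gzip', or 'not gzip'."""
    if header[:2] != b"\x1f\x8b":
        return "not gzip"
    # BGZF blocks are gzip members with the FEXTRA flag set (bit 2 of byte 3)
    # and an extra subfield identified by the two bytes 'B', 'C'.
    if len(header) >= 16 and header[3] & 0x04 and header[12:14] == b"BC":
        return "bgzf"
    return "plain gzip"


def sniff_file(path):
    """Read the first 16 bytes of a file and classify them."""
    with open(path, "rb") as fh:
        return sniff_compression(fh.read(16))


# First 16 bytes of a BGZF block, as laid out in the BGZF specification.
bgzf_head = b"\x1f\x8b\x08\x04\x00\x00\x00\x00\x00\xff\x06\x00BC\x02\x00"
print(sniff_compression(bgzf_head))
print(sniff_compression(b"<!DOCTYPE html>"))
```

A result other than `bgzf` from `sniff_file("example.vcf.gz")` would point to the download or compression step rather than to the sample names.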
@willright28 kindly send me the header of your vcf.gz file and, if possible, an example subset of your original data, so that I can make the necessary changes accordingly.
@terhorst @willright28 I'm using the Ubuntu Linux app on Windows 10.
Regards, and thank you.
@terhorst one big request: please upload the correct commands for the example data set in this repository.
@terhorst @willright28 kindly reply
hi dear @terhorst: when I use vcf2smc with 1 sample for 1 population, with the command below:
smc++ vcf2smc LS004_bcf.vcf.gz chr1.smc.gz NC_011462.1 LS:LS004
it returns some error information:
real 0m20.083s
user 0m1.523s
sys 0m0.471s
Host key verification failed.
Unfortunately I can't work out what these messages mean, so I'd be grateful if you could help me with this. Thanks in advance!