tkrahn / extract23

Extract a simulated 23andMe (V3) style file from a Whole Genome BAM file
GNU General Public License v3.0

FGC WGS File Issues #1

Closed Griz054 closed 7 years ago

Griz054 commented 7 years ago

I've tried to run this using an hg19 ref file, but the 23andMe zip file turns out to be only 847 bytes. I don't get any errors; it just doesn't produce the correct file. If I convert the ref file to GRCh37 using the commands you recommended, it fails. Here are the error reports I get.

griz@griz-laptop-linux:~/extract23-master$ gunzip -c 23andMe_V3_hg19_ref.tab.gz > 23andMe_V3_hg19_ref.tab
griz@griz-laptop-linux:~/extract23-master$ cat 23andMe_V3_hg19_ref.tab | sed 's/^chr//' > 23andMe_V3_GRCh37_ref.tab
griz@griz-laptop-linux:~/extract23-master$ bgzip -c 23andMe_V3_GRCh37_ref.tab > 23andMe_V3_GRCh37_ref.tab.gz
griz@griz-laptop-linux:~/extract23-master$ tabix -s1 -b2 -e2 23andMe_V3_GRCh37_ref.tab.gz
griz@griz-laptop-linux:~/extract23-master$ ./extract23.sh -b 2BWV9_JAMESLADAMSSR_FULLGENOME.bam -r ucsc.hg19.fasta -t 23andMe_V3_GRCh37_ref.tab.gz -o 23andMe_V3_GRCH37.txt -v
Starting mpileup... Please be patient!
samtools mpileup: Could not read file "-f": No such file or directory
Not a BGZF file: 23andMe_raw.vcf.gz
tbx_index_build failed: 23andMe_raw.vcf.gz
Mpileup completed. Starting SNP calling...
Note: Neither --ploidy nor --ploidy-file given, assuming all sites are diploid
Failed to open 23andMe_raw.vcf.gz: unknown file type
Not a BGZF file: 23andMe_called.vcf.gz
tbx_index_build failed: 23andMe_called.vcf.gz
SNP calling completed. Starting annotation...
[E::hts_open_format] fail to open file '-c'
./extract23.sh: line 77: 13202 Segmentation fault      (core dumped) bcftools annotate -O z -a ${REF_23ANDME} -c CHROM,POS,ID 23andMe_called.vcf.gz > 23andMe_annotated.vcf.gz
Not a BGZF file: 23andMe_annotated.vcf.gz
tbx_index_build failed: 23andMe_annotated.vcf.gz
Annotation completed. Starting extraction from VCF ...
Failed to open 23andMe_annotated.vcf.gz: unknown file type
Extraction from VCF completed. Sorting by chromosome and position ...
23andMe_V3_hg19.txt was created. Compressing ...
  adding: 23andMe_V3_hg19.txt (deflated 46%)
extract23: Output file 23andMe_V3_hg19.txt.zip was created.
griz@griz-laptop-linux:~/extract23-master$ 

Do I need a GRCh37 fasta file if I go that direction, or am I missing the boat altogether?

Griz054 commented 7 years ago

More info: my BAM file is a 30X whole genome from FGC.

tkrahn commented 7 years ago

If the 23andMe zip file is 847 bytes, it most likely contains just the header. You can verify this by unzipping the zip file and opening it in a text editor.
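
The same check can be done from the shell by counting the lines that do not start with `#`, since 23andMe-style raw data files keep their header in `#` comment lines. This is only a sketch: `count_data_lines` is a hypothetical helper, and the zip name in the usage comment is the one from this thread.

```shell
#!/bin/sh
# Count the non-header lines arriving on standard input.
# Assumes header lines start with '#', as in 23andMe-style raw data files.
count_data_lines() {
    # grep exits non-zero when the count is 0; ignore that and keep the printed count.
    grep -vc '^#' || true
}

# Usage (prints 0 if the zip holds only the header):
#   unzip -p 23andMe_V3_hg19.txt.zip | count_data_lines
```

A result of 0 would confirm that the pipeline produced no genotype lines at all.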

You actually get an error:

samtools mpileup: Could not read file "-f": No such file or directory

This means that mpileup can't find your reference sequence ucsc.hg19.fasta. Is it located in your working directory? Try giving the full path to your reference sequence. Also make sure that your ucsc.hg19.fasta is indexed with samtools faidx. You should have a file named ucsc.hg19.fasta.fai in the same directory as your ucsc.hg19.fasta file.
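
A quick pre-flight check along these lines can rule out a missing reference or index before starting the long mpileup run. It is a sketch, not part of extract23.sh: `check_reference` is a hypothetical helper, and it relies on `samtools faidx` writing its index next to the FASTA with a `.fai` suffix.

```shell
#!/bin/sh
# Report whether a FASTA reference and its faidx index are readable.
# check_reference is a hypothetical helper, not part of extract23.sh.
check_reference() {
    ref=$1
    if [ ! -r "$ref" ]; then
        echo "missing reference: $ref"
        return 1
    fi
    if [ ! -r "$ref.fai" ]; then
        echo "missing index: $ref.fai (create it with: samtools faidx $ref)"
        return 1
    fi
    echo "ok: $ref"
}

# Usage:
#   check_reference /full/path/to/ucsc.hg19.fasta
```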

At a later point in the script you have a segmentation fault for bcftools annotate. Please verify if bcftools is correctly installed.

I hope this helps,

Thomas

tkrahn commented 7 years ago

If you use the original WGS BAM file supplied by FGC, then you need the GRCh37 human reference genome. This is the same reference that you use for viewing your BAM file with samtools tview or with IGV. In other words, the FASTA reference file begins with

>1

as opposed to the hg19 version that starts with

>chr1
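
To see which naming style a given FASTA uses, it is enough to look at its first header line. A minimal sketch, assuming the first sequence is representative of the whole file; `naming_style` is a hypothetical helper:

```shell
#!/bin/sh
# Print the naming style of the first sequence in a FASTA read from stdin.
# naming_style is a hypothetical helper for illustration only.
naming_style() {
    first=$(head -n 1)
    case $first in
        '>chr'*) echo "chr prefix (hg19 style)" ;;
        '>'*)    echo "no chr prefix (GRCh37 style)" ;;
        *)       echo "not a FASTA header: $first" ;;
    esac
}

# Usage:
#   naming_style < human_g1k_v37.fasta
# Compare with the sequence names in the BAM header:
#   samtools view -H your.bam | grep '^@SQ' | head -n 3
```

The reference, the BAM header, and the 23andMe template file should all use the same style.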

Griz054 commented 7 years ago

I switched over to a GRCh37 fasta file. It does begin with >1. I used samtools to do a dict with it and then indexed it to generate the fai file. I'm using full paths with everything. Here's what I did.

griz@griz-laptop-linux:~/extract23-master$ samtools dict -a GRCh37 -s "Homo sapiens" /home/griz/extract23-master/human_g1k_v37.fasta
@HD VN:1.0  SO:unsorted
@SQ SN:1    LN:249250621    M5:1b22b98cdeb4a9304cb5d48026a85128 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:2    LN:243199373    M5:a0d9851da00400dec1098a9255ac712e UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:3    LN:198022430    M5:fdfd811849cc2fadebc929bb925902e5 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:4    LN:191154276    M5:23dccd106897542ad87d2765d28a19a1 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:5    LN:180915260    M5:0740173db9ffd264d728f32784845cd7 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:6    LN:171115067    M5:1d3a93a248d92a729ee764823acbbc6b UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:7    LN:159138663    M5:618366e953d6aaad97dbe4777c29375e UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:8    LN:146364022    M5:96f514a9929e410c6651697bded59aec UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:9    LN:141213431    M5:3e273117f15e0a400f01055d9f393768 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:10   LN:135534747    M5:988c28e000e84c26d552359af1ea2e1d UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:11   LN:135006516    M5:98c59049a2df285c76ffb1c6db8f8b96 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:12   LN:133851895    M5:51851ac0e1a115847ad36449b0015864 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:13   LN:115169878    M5:283f8d7892baa81b510a015719ca7b0b UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:14   LN:107349540    M5:98f3cae32b2a2e9524bc19813927542e UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:15   LN:102531392    M5:e5645a794a8238215b2cd77acb95a078 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:16   LN:90354753 M5:fc9b1a7b42b97a864f56b348b06095e6 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:17   LN:81195210 M5:351f64d4f4f9ddd45b35336ad97aa6de UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:18   LN:78077248 M5:b15d4b2d29dde9d3e4f93d1d0f2cbc9c UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:19   LN:59128983 M5:1aacd71f30db8e561810913e0b72636d UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:20   LN:63025520 M5:0dec9660ec1efaaf33281c0d5ea2560f UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:21   LN:48129895 M5:2979a6085bfe28e3ad6f552f361ed74d UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:22   LN:51304566 M5:a718acaa6135fdca8357d5bfe94211dd UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:X    LN:155270560    M5:7e0e2e580297b7764e31dbc80c2540dd UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:Y    LN:59373566 M5:1fa3474750af0948bdf97d5a0ee52e51 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:MT   LN:16569    M5:c68f52674c9fb33aef52dcf399755519 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000207.1   LN:4262 M5:f3814841f1939d3ca19072d9e89f3fd7 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000226.1   LN:15008    M5:1c1b2cd1fccbc0a99b6a447fa24d1504 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000229.1   LN:19913    M5:d0f40ec87de311d8e715b52e4c7062e1 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000231.1   LN:27386    M5:ba8882ce3a1efa2080e5d29b956568a4 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000210.1   LN:27682    M5:851106a74238044126131ce2a8e5847c UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000239.1   LN:33824    M5:99795f15702caec4fa1c4e15f8a29c07 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000235.1   LN:34474    M5:118a25ca210cfbcdfb6c2ebb249f9680 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000201.1   LN:36148    M5:dfb7e7ec60ffdcb85cb359ea28454ee9 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000247.1   LN:36422    M5:7de00226bb7df1c57276ca6baabafd15 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000245.1   LN:36651    M5:89bc61960f37d94abf0df2d481ada0ec UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000197.1   LN:37175    M5:6f5efdd36643a9b8c8ccad6f2f1edc7b UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000203.1   LN:37498    M5:96358c325fe0e70bee73436e8bb14dbd UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000246.1   LN:38154    M5:e4afcd31912af9d9c2546acf1cb23af2 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000249.1   LN:38502    M5:1d78abec37c15fe29a275eb08d5af236 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000196.1   LN:38914    M5:d92206d1bb4c3b4019c43c0875c06dc0 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000248.1   LN:39786    M5:5a8e43bec9be36c7b49c84d585107776 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000244.1   LN:39929    M5:0996b4475f353ca98bacb756ac479140 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000238.1   LN:39939    M5:131b1efc3270cc838686b54e7c34b17b UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000202.1   LN:40103    M5:06cbf126247d89664a4faebad130fe9c UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000234.1   LN:40531    M5:93f998536b61a56fd0ff47322a911d4b UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000232.1   LN:40652    M5:3e06b6741061ad93a8587531307057d8 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000206.1   LN:41001    M5:43f69e423533e948bfae5ce1d45bd3f1 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000240.1   LN:41933    M5:445a86173da9f237d7bcf41c6cb8cc62 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000236.1   LN:41934    M5:fdcd739913efa1fdc64b6c0cd7016779 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000241.1   LN:42152    M5:ef4258cdc5a45c206cea8fc3e1d858cf UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000243.1   LN:43341    M5:cc34279a7e353136741c9fce79bc4396 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000242.1   LN:43523    M5:2f8694fc47576bc81b5fe9e7de0ba49e UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000230.1   LN:43691    M5:b4eb71ee878d3706246b7c1dbef69299 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000237.1   LN:45867    M5:e0c82e7751df73f4f6d0ed30cdc853c0 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000233.1   LN:45941    M5:7fed60298a8d62ff808b74b6ce820001 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000204.1   LN:81310    M5:efc49c871536fa8d79cb0a06fa739722 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000198.1   LN:90085    M5:868e7784040da90d900d2d1b667a1383 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000208.1   LN:92689    M5:aa81be49bf3fe63a79bdc6a6f279abf6 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000191.1   LN:106433   M5:d75b436f50a8214ee9c2a51d30b2c2cc UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000227.1   LN:128374   M5:a4aead23f8053f2655e468bcc6ecdceb UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000228.1   LN:129120   M5:c5a17c97e2c1a0b6a9cc5a6b064b714f UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000214.1   LN:137718   M5:46c2032c37f2ed899eb41c0473319a69 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000221.1   LN:155397   M5:3238fb74ea87ae857f9c7508d315babb UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000209.1   LN:159169   M5:f40598e2a5a6b26e84a3775e0d1e2c81 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000218.1   LN:161147   M5:1d708b54644c26c7e01c2dad5426d38c UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000220.1   LN:161802   M5:fc35de963c57bf7648429e6454f1c9db UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000213.1   LN:164239   M5:9d424fdcc98866650b58f004080a992a UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000211.1   LN:166566   M5:7daaa45c66b288847b9b32b964e623d3 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000199.1   LN:169874   M5:569af3b73522fab4b40995ae4944e78e UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000217.1   LN:172149   M5:6d243e18dea1945fb7f2517615b8f52e UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000216.1   LN:172294   M5:642a232d91c486ac339263820aef7fe0 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000215.1   LN:172545   M5:5eb3b418480ae67a997957c909375a73 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000205.1   LN:174588   M5:d22441398d99caf673e9afb9a1908ec5 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000219.1   LN:179198   M5:f977edd13bac459cb2ed4a5457dba1b3 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000224.1   LN:179693   M5:d5b2fc04f6b41b212a4198a07f450e20 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000223.1   LN:180455   M5:399dfa03bf32022ab52a846f7ca35b30 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000195.1   LN:182896   M5:5d9ec007868d517e73543b005ba48535 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000212.1   LN:186858   M5:563531689f3dbd691331fd6c5730a88b UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000222.1   LN:186861   M5:6fe9abac455169f50470f5a6b01d0f59 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000200.1   LN:187035   M5:75e4c8d17cd4addf3917d1703cacaf25 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000193.1   LN:189789   M5:dbb6e8ece0b5de29da56601613007c2a UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000194.1   LN:191469   M5:6ac8f815bf8e845bb3031b73f812c012 UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000225.1   LN:211173   M5:63945c3e6962f28ffd469719a747e73c UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
@SQ SN:GL000192.1   LN:547496   M5:325ba9e808f669dfeee210fdd7b470ac UR:file:///home/griz/extract23-master/human_g1k_v37.fasta   AS:GRCh37   SP:Homo sapiens
griz@griz-laptop-linux:~/extract23-master$ samtools faidx /home/griz/extract23-master/human_g1k_v37.fasta
griz@griz-laptop-linux:~/extract23-master$ ls
0                                 23andMe_V3_hg19.txt.zip                 human_g1k_v37.fasta
23andMe_V3_GRCh37_ref.tab         2BWV9_JAMESLADAMSSR_FULLGENOME.bam      human_g1k_v37.fasta.fai
23andMe_V3_GRCh37_ref.tab.gz      2BWV9_JAMESLADAMSSR_FULLGENOME.bam.bai  LICENSE
23andMe_V3_GRCh37_ref.tab.gz.tbi  2BWV9.zip                               README.md
23andMe_V3_hg19_ref.tab.gz        extract23.sh                            ucsc.hg19.fasta
23andMe_V3_hg19_ref.tab.gz.tbi    fa_files                                ucsc.hg19.fasta.fai
griz@griz-laptop-linux:~/extract23-master$ ./extract23.sh -b /home/griz/extract23-master/2BWV9_JAMESLADAMSSR_FULLGENOME.bam -r /home/griz/extract23-master/human_g1K_v37.fasta -t /home/griz/extract23-master/23andMe_V3_GRCh37_ref.tab.gz -o /home/griz/extract23-master/23andme_v3_GRCHR37.txt
Starting mpileup... Please be patient!
samtools mpileup: Could not read file "-f": No such file or directory
Not a BGZF file: 23andMe_raw.vcf.gz
tbx_index_build failed: 23andMe_raw.vcf.gz
Mpileup completed. Starting SNP calling...
Note: Neither --ploidy nor --ploidy-file given, assuming all sites are diploid
Failed to open 23andMe_raw.vcf.gz: unknown file type
Not a BGZF file: 23andMe_called.vcf.gz
tbx_index_build failed: 23andMe_called.vcf.gz
SNP calling completed. Starting annotation...
[E::hts_open_format] fail to open file '-c'
./extract23.sh: line 77:  3081 Segmentation fault      (core dumped) bcftools annotate -O z -a ${REF_23ANDME} -c CHROM,POS,ID 23andMe_called.vcf.gz > 23andMe_annotated.vcf.gz
Not a BGZF file: 23andMe_annotated.vcf.gz
tbx_index_build failed: 23andMe_annotated.vcf.gz
Annotation completed. Starting extraction from VCF ...
Failed to open 23andMe_annotated.vcf.gz: unknown file type
Extraction from VCF completed. Sorting by chromosome and position ...
23andMe_V3_hg19.txt was created. Compressing ...
updating: 23andMe_V3_hg19.txt (deflated 46%)
extract23: Output file 23andMe_V3_hg19.txt.zip was created.
griz@griz-laptop-linux:~/extract23-master$ ls
0                                 23andMe_V3_hg19.txt.zip                 human_g1k_v37.fasta
23andMe_V3_GRCh37_ref.tab         2BWV9_JAMESLADAMSSR_FULLGENOME.bam      human_g1k_v37.fasta.fai
23andMe_V3_GRCh37_ref.tab.gz      2BWV9_JAMESLADAMSSR_FULLGENOME.bam.bai  LICENSE
23andMe_V3_GRCh37_ref.tab.gz.tbi  2BWV9.zip                               README.md
23andMe_V3_hg19_ref.tab.gz        extract23.sh                            ucsc.hg19.fasta
23andMe_V3_hg19_ref.tab.gz.tbi    fa_files                                ucsc.hg19.fasta.fai
griz@griz-laptop-linux:~/extract23-master$

Griz054 commented 7 years ago

I tried using the gz version of the 1000 genomes Fasta as well....dict went ok but when I tried to index it -

griz@griz-laptop-linux:~/extract23-master$ samtools faidx /home/griz/extract23-master/human_g1k_v37.fasta.gz
Cannot index files compressed with gzip, please use bgzip
Could not build fai index /home/griz/extract23-master/human_g1k_v37.fasta.gz.fai
griz@griz-laptop-linux:~/extract23-master$

Griz054 commented 7 years ago

Thanks for your help BTW.

tkrahn commented 7 years ago

It seems like your samtools mpileup version doesn't understand the -f parameter. Which version of samtools are you using? You definitely need an htslib version of samtools from http://www.htslib.org/. The old versions of samtools had a slightly different syntax and are missing some features that are needed for the extract23.sh script. If you just type samtools (without parameters) at the shell prompt and press Enter, you should see:

thomas@streymoy:/genomes/1/2448$ samtools

Program: samtools (Tools for alignments in the SAM format)
Version: 1.3.1-42-g0a15035 (using htslib 1.3.2-176-gafd9b56)

Usage:   samtools <command> [options]
(...)

"using htslib" is what you need
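
The check can also be scripted. The sketch below only looks for the phrase in whatever samtools prints; `has_htslib` is a hypothetical helper, and the match is case-insensitive because the exact wording of the banner may differ between versions.

```shell
#!/bin/sh
# Succeed if the samtools banner on standard input mentions htslib.
# has_htslib is a hypothetical helper for illustration only.
has_htslib() {
    grep -qi 'using htslib'
}

# Usage (stderr is merged in case the banner is printed there):
#   if samtools 2>&1 | has_htslib; then
#       echo "htslib-based samtools"
#   else
#       echo "old samtools, please upgrade"
#   fi
```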

Griz054 commented 7 years ago

Thanks, I'll look when I get home.

Jim "Griz" Adams

Griz054 commented 7 years ago

BTW, are you the same Thomas Kahn over at YSEQ?

Griz054 commented 7 years ago

Krahn... My phone wants to autocorrect that. I may just dump this Ubuntu build and put on the latest Debian build. The Ubuntu repository had older versions that I used first. I removed them and used the newer versions you had listed, but who knows what I left behind.

tkrahn commented 7 years ago

Yes, I work at YSEQ. Changing to Debian is certainly a useful step if you're an experienced Linux user. However, it won't help you get the htslib version of samtools. In both distributions you still need to compile samtools from source. If you're familiar with Ubuntu, I recommend staying with this distribution for now and just getting your samtools environment up to date. Try this. First, get your Ubuntu updated and create a source directory:

sudo apt-get update && sudo apt-get upgrade
mkdir ~/src
cd ~/src

Then clone the development versions of samtools / bcftools / htslib from GitHub

git clone https://github.com/samtools/samtools.git
git clone https://github.com/samtools/bcftools.git
git clone https://github.com/samtools/htslib.git

and follow the build instructions. (You may need to install several build dependencies but this is all explained in the instructions).

The installation will by default install /usr/local/bin/samtools, which normally takes priority over the old version installed from the Ubuntu distribution because /usr/local/bin comes before /usr/bin in the default PATH. So this won't conflict with the packages installed through apt/aptitude.
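
If an old copy still seems to be picked up after the build, it can help to list every samtools on the PATH in lookup order. A minimal sketch; `list_on_path` is a hypothetical helper, and the first line it prints is the copy the shell will run:

```shell
#!/bin/sh
# List each executable with the given name found on PATH, in lookup order.
# list_on_path is a hypothetical helper for illustration only.
list_on_path() {
    name=$1
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        if [ -x "$dir/$name" ]; then
            echo "$dir/$name"
        fi
    done
    IFS=$old_ifs
}

# Usage:
#   list_on_path samtools
#   list_on_path bcftools
```

If the freshly built copy is listed first but the old one still runs, the shell may have cached the old location; `hash -r` or a new terminal session clears that.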

Let me know how far you get and ask for help if you hit a brick wall.

Griz054 commented 7 years ago

Did all that and it still didn't fix it. But I fixed it for myself. I didn't go through your entire script, but I hard-coded the file names for the GRCh37 fasta file and the output file at the beginning of your extract23.sh, dropped the -t and -o from my command line when I called the script, and it worked just fine. I now have my dad's DNA on Gedmatch. He passed away in 2014, so I'm grateful this worked. Thank you Thomas.

Griz054 commented 7 years ago

One more question. I used the 1000 genomes fasta for the GRCh37 reference file. Is there a better one to use? If so, where would I find it? Thanks so much for this script. I may tinker with it a bit to add different templates. If I do that and get it working, I'll let you know. I'm also glad to see YSEQ adding all the new products. I recommend you guys all the time.

tkrahn commented 7 years ago

Griz054: The GRCh37 reference sequence from the 1000 genomes webserver should be fine: http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz

You could also download it from Ensembl: ftp://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/dna/ But then you need to concatenate the single chromosomes into one big file.
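
The concatenation could look roughly like the sketch below. `concat_chromosomes` is a hypothetical helper, and the file naming (`<prefix><chromosome>.fa.gz`) is an assumption; check the actual names in the directory you downloaded from before relying on it.

```shell
#!/bin/sh
# Concatenate per-chromosome gzipped FASTA files into one uncompressed FASTA.
# Assumption: the files are named <prefix><chrom>.fa.gz.
concat_chromosomes() {
    prefix=$1
    out=$2
    : > "$out"    # start with an empty output file
    for chrom in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y MT; do
        gunzip -c "${prefix}${chrom}.fa.gz" >> "$out" || return 1
    done
}

# Usage (the prefix shown is an assumed example, not verified against the server):
#   concat_chromosomes Homo_sapiens.GRCh37.75.dna.chromosome. GRCh37.fasta
#   samtools faidx GRCh37.fasta
```

The output is left uncompressed on purpose, because samtools faidx cannot index a plain gzip file (only bgzip), as seen earlier in this thread.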

The hg19 reference sequence is available from UCSC: http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz

tkrahn commented 7 years ago

napobo3: What version of samtools do you use? Try at the shell prompt:

samtools

Program: samtools (Tools for alignments in the SAM format)
Version: 1.2-2-gf8a6274 (using htslib 1.2.1-6-g94d13ce)

Usage:   samtools <command> [options]

Make sure the "using htslib" is really there.