Closed senzhaocode closed 5 years ago
Greetings,
I ran your command line and successfully got the input variants in the output vcf:
#CHROM POS ID REF ALT QUAL FILTER INFO
1 1254057 . CA C . PASS WP=0/1
1 1254136 . G A . PASS WP=1/1
1 1254436 . A G . PASS WP=1/1
1 1254443 . G A . PASS WP=1/1
1 1254841 . C G . PASS WP=1/1
1 1255028 . G A . PASS WP=1/0
1 1255287 . C T . PASS WP=1/1
1 1255403 . TG T . PASS WP=1/1
1 1256608 . G A . PASS WP=0/1
1 1257593 . A G . PASS WP=1/1
1 1258246 . A C . PASS WP=1/1
1 1259429 . TG T . PASS WP=0/1
1 1259529 . G A . PASS WP=1/1
1 1259537 . G A . PASS WP=1/1
1 1259546 . G A . PASS WP=1/1
1 1259551 . A G . PASS WP=1/1
1 1259552 . G GGAGCTGGGGGGCTGGGGGGCTGAGGGGCTCGGGGGCTGGGGGGCTGCTGGGCTGAGAGGCTGGGAGACTGGA . PASS WP=1/1
1 1260168 . C A . PASS WP=1/1
1 1260733 . A C . PASS WP=1/1
1 1261282 . AC A . PASS WP=0/1
1 1261684 . TC T . PASS WP=0/1
1 1261824 . G C . PASS WP=1/1
The first 6 got skipped, because genReads doesn't like dots in the GT field. (I'm open to changing this behavior such that '.' gets treated at '0', though).
edit: I ran it without the --job parameter. I'll run it again to make sure that works too.
Hi Zstephens:
Thank you for the reply. I figured out why there was no output in Gold_VCF file, because chromosome names in reference fasta file and these in -v vcf files do not match each other. Now I used the input VCF file (with around 67000 variant records) for generating simulation reads, unfortunately there is an error report which causes the running process crashed:
**-------------------------------- reading input VCF... found 67298 valid variants in input vcf.
reading 22... 21.654 (sec) found 66870 valid variants for 22 in input VCF... 428 variants skipped...
sampling reads...
1% 2% Traceback (most recent call last):
File "genReads.py", line 743, in
It seems something related to SequenceContainer.py. I attached the reference fasta and input vcf file. Can you could try and see whether to recover the same problem. http://folk.uio.no/senz/chr22.fasta http://folk.uio.no/senz/chr22.vcf The command used for simulating reads: python genReads.py -r chr22.fasta -R 150 -c 10 -M 0 -v chr22.vcf -o ERR1341793_chr22 --bam --vcf --pe 354 73
Hi Zstephens:
I found the reason why genReads.py was crashed during reads simulation. Because the user-defined VCF input file was assembled from PicBio Technology. It contains some VERY long indel variants (>150 bp), which does not make sense for generating illumina paired-end reads. After I removed these long indel variants from original vcf file, the simulation process went well. Zstephens, I suggest that you add some codes in the script vcfFunc.py and make control of potential presence of long indel variants. That will avoid some filtering work done by users.
Again, neat-genreads is very nice simulator even if it has such a small shortage.
Hei Zstephens,
I am using neat-genreads to create some synthetic WGS dataset. Here, I apply the parameter -v to generate user-defined mutations via a vcf file, see my command below: _python genReads.py -r /usit/abel/u1/senz/CHM-eval.kit/human_g1k_v37_decoy.fasta -R 150 -c 41 -M 0 -v /usit/abel/u1/senz/neat-genreads/tmp.vcf -o /usit/abel/u1/senz/CHM1_CMH13/ERR1341793simulate --bam --vcf --no-fastq --pe 354 73 --job 1 20
Very strange! No output in the Gold_VCF file, only BAM file. There is no error report in log file. I attach my input VCF file, not sure the format is OK? Can you help me to figure out.
_##fileformat=VCFv4.2
contig=
contig=
contig=
contig=
contig=
contig=
contig=
contig=
contig=
contig=
contig=
contig=
contig=
contig=
contig=
contig=
contig=
contig=
contig=
contig=
contig=
contig=
contig=
FORMAT=
FILTER=
FILTER=
FILTER=
FILTER=
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT syndip
1 10927 . A G 30 GAP13 . GT 1/. 1 10931 . C T 30 GAP13 . GT 1/. 1 579576 . A AG 30 HET1;GAP13 . GT ./. 1 579732 . CA C 30 HET1;GAP13 . GT ./. 1 579846 . GA G 30 HET1;GAP13 . GT ./. 1 580069 . GA G 30 HET1;GAP13 . GT ./. 1 1254057 . CA C 30 . . GT 0/1 1 1254136 . G A 30 . . GT 1/1 1 1254436 . A G 30 . . GT 1/1 1 1254443 . G A 30 . . GT 1/1 1 1254841 . C G 30 . . GT 1/1 1 1255028 . G A 30 . . GT 1/0 1 1255287 . C T 30 . . GT 1/1 1 1255403 . TG T 30 . . GT 1/1 1 1256608 . G A 30 . . GT 0/1 1 1257593 . A G 30 . . GT 1/1 1 1258246 . A C 30 . . GT 1/1 1 1259429 . TG T 30 . . GT 0/1 1 1259529 . G A 30 . . GT 1/1 1 1259537 . G A 30 . . GT 1/1 1 1259546 . G A 30 . . GT 1/1 1 1259551 . A G 30 . . GT 1/1 1 1259552 . G GGAGCTGGGGGGCTGGGGGGCTGAGGGGCTCGGGGGCTGGGGGGCTGCTGGGCTGAGAGGCTGGGAGACTGGA 30 . . GT 1/1 1 1259559 . G T 30 . . GT 1/1 1 1259561 . A G 30 . . GT 1/1 1 1259569 . A G 30 . . GT 1/1 1 1260168 . C A 30 . . GT 1/1 1 1260733 . A C 30 . . GT 1/1 1 1261282 . AC A 30 . . GT 0/1 1 1261684 . TC T 30 . . GT 0/1 1 1261824 . G C 30 . . GT 1/1_
Thank you