skoren / triobinningScripts

Scripts to reproduce TrioBinning manuscript

Help: obtaining parental input files in the format required for trio binning assembly #2

Closed: qiuyixmm closed this issue 5 years ago

qiuyixmm commented 5 years ago

Hello @skoren: I am an absolute beginner. I am planning to try Canu (v1.8) on my own dataset for a trio binning assembly. I have paired-end short-read sequencing files for both parents (paternal_R1.fastq, paternal_R2.fastq and maternal_R1.fastq, maternal_R2.fastq). What should I do to get paternal.fasta and maternal.fasta as input files for Canu, according to your description below?

The input for k-mer counting must be FASTA formatted. If you have a FASTQ file, you can run:

zcat $fastq | awk -v name=$name 'BEGIN {num=0} {if (NR%2000000==1) {num+=1; print ">"name"."num} } {if (NR%4==2) print $1"N"}' > $name.fa

So I just need to convert the two paternal (maternal) FASTQ files to FASTA format and then merge the two resulting FASTA files into a single paternal (maternal) FASTA file, right? It might look like this:

paternal_R1.fastq  -> paternal_R1.fasta
paternal_R2.fastq  -> paternal_R2.fasta
paternal_R1.fasta  paternal_R2.fasta ->  paternal.fasta

skoren commented 5 years ago
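A minimal sketch of that plan, relevant only to this repository's scripts (Canu 1.8 reads FASTQ directly, as noted in the reply). It assumes plain, uncompressed 4-line FASTQ files, so `cat` stands in for the `zcat` of the quoted command, which expects gzipped input. Piping both mates through a single awk call, instead of converting R1 and R2 separately and merging afterwards, keeps the record numbering continuous, so no two FASTA headers share a name. The helper name `merge_fastq_to_fasta` is hypothetical; the awk logic is the quoted one, written with pattern-action rules.

```shell
#!/usr/bin/env bash
# Hypothetical helper: merge_fastq_to_fasta <name> <fastq>...
# Assumes plain (uncompressed) 4-line FASTQ input, hence cat instead of zcat.
merge_fastq_to_fasta() {
  local name=$1; shift                 # record-name prefix, e.g. "paternal"
  # Stream all input files through one awk call so numbering continues
  # across R1 and R2 and every FASTA header stays unique.
  cat "$@" | awk -v name="$name" '
    BEGIN { num = 0 }
    # Every 2,000,000 FASTQ lines (500,000 reads) start a new record <name>.<num>.
    NR % 2000000 == 1 { num += 1; print ">" name "." num }
    # Line 2 of each 4-line FASTQ entry is the sequence; append an N so any
    # k-mer that crosses a read boundary contains an N.
    NR % 4 == 2 { print $1 "N" }'
}
```

Usage, one call per parent:
merge_fastq_to_fasta paternal paternal_R1.fastq paternal_R2.fastq > paternal.fasta
merge_fastq_to_fasta maternal maternal_R1.fastq maternal_R2.fastq > maternal.fasta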

If you're using Canu 1.8, you don't need to do any reformatting; that step is only needed for this version of the scripts. See the docs for an example: https://canu.readthedocs.io/en/latest/quick-start.html#trio-binning-assembly

You can just provide the fastq files, something like:

-haplotypePaternal paternal_R*.fastq -haplotypeMaternal maternal_R*.fastq

to canu.
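A dry-run sketch of a fuller Canu 1.8 command built around the two options in the reply. Only `-haplotypePaternal` and `-haplotypeMaternal` come from the reply; the other options (`-p`, `-d`, `genomeSize`, `-pacbio-raw`) follow the style of the Canu quick-start linked above, and their values (`asm`, `asm-trio`, `3g`, `child_pacbio.fastq.gz`) are hypothetical placeholders. The function `print_canu_trio_cmd` is also hypothetical: it prints the command rather than running it, and shows that the shell, not Canu, expands `paternal_R*.fastq` into both mate files.

```shell
#!/usr/bin/env bash
# Hypothetical dry run: prints a Canu trio-binning command instead of running it.
# Prefix, directory, genomeSize and the long-read file are placeholders.
print_canu_trio_cmd() {
  shopt -s nullglob                    # unmatched globs expand to nothing
  local -a paternal maternal cmd
  paternal=(paternal_R*.fastq)         # e.g. paternal_R1.fastq paternal_R2.fastq
  maternal=(maternal_R*.fastq)
  if (( ${#paternal[@]} == 0 || ${#maternal[@]} == 0 )); then
    echo "no parental FASTQ files found" >&2
    return 1
  fi
  cmd=(canu -p asm -d asm-trio genomeSize=3g
       -haplotypePaternal "${paternal[@]}"
       -haplotypeMaternal "${maternal[@]}"
       -pacbio-raw child_pacbio.fastq.gz)
  echo "${cmd[*]}"                     # words joined by single spaces
}
```

Run it from the directory holding the FASTQ files; once the printed line looks right, the same arguments can be passed to canu. Note that a misspelled file name (e.g. parternal_R1.fastq) would not match the paternal_R*.fastq glob.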