Open DayTimeMouse opened 1 month ago
Hi @DayTimeMouse,
One fundamental question is, how confident you are with your hap1.fa and hap2.fa. Looks like your hap1.meryl and hap2.meryl are reflecting what's unique in each assembly, not the biologically inherited parental specific kmers.
Anyways, you could try meryl-lookup -existence -sequence hifi_reads.fq.gz -mers hap1.meryl hap2.meryl > hifi_reads.hapmer.count
.
This will provide count of kmers shared (found) in each read sequence.
Small explanation from the help message:
...
The output file has format:
<sequence_name> <tab> <kmers_in_sequence> <tab> <kmers_in_db> <tab> <kmers_shared> [...]
With one input database:
sequence1 <tab> 8415 <tab> 12856825 <tab> 8145
With two input databases:
sequence1 <tab> 8415 <tab> 12856825 <tab> 8145 <tab> 575757256 <tab> 8354
You could bin the reads based on the hap1 and hap2 mers found in each fastq sequence. Usually, it's expected to find kmers from one haplotype and a few from the other haplotype, due to possible noises in the reads. There may be reads with no hapmers if the two haplotypes share a large homozygous block. If there are any switch errors in the assembly, you'd see reads with substantial kmers with both hap kmers.
Alternate option is to use trio-binning implemented in Canu, but this might fit better if you are actually using hapmers inferred from the parental genomes as it normalizes based on the kmers found in the dbs.
/path/to/canu/build/bin/splitHaplotype \
-cl 1000 -memory 48 -threads 24 \
-H ../hap1.meryl 1 hap1.fasta.gz \
-H ../hap2.meryl 1 hap2.fasta.gz \
-A ./unk.fasta.gz \
hifi_reads.fq.gz
-cl 1000
will filter out reads < 1kb.
But I do not have real parental genomes, I used hifiasm(HiFi + ONT +HiC) to get hap1.fa and hap2.fa. In this case, do you still recommand me to use Canu(trio-binning splitHaplotype)? Canu(trio-binning) and count hapmers to partition reads via meryl-lookup, which method is fit better for me?
Then using meryl-lookup
might work the best. Use meryl v1.4.1 by the way, there were some improvements / fixes made particularly in meryl-lookup.
Many thanks!
Hi,
I have another question. How to get haplotype-specific HiC reads(paired-end)? Using haplotype-specific kmers?
Best regards.
Hi,
Thanks for developing this awesome tools. I'd like to get haplotype-specific HiFi reads via haplotype-specific kmers.
I run script to get haplotype-specific kmers, meryl version is 1.3:
meryl count k=21 hap1.fa output hap1.meryl
meryl count k=21 hap2.fa output hap2.meryl
meryl difference hap1.meryl hap2.meryl output hap1.only.meryl
meryl difference hap2.meryl hap1.meryl output hap2.only.meryl
meryl intersect hap1.meryl hap2.meryl output hap1_hap2_intersect.meryl
get read-meryl:
meryl count k=21 hifi_reads.fq.gz output hifi.meryl
I want to partition HiFi reads(each read) into two haplotypes. Now, I do not know how to do next? meryl can do this? Or, do you have some ideas for this?
Thanks in advance.