Open cheninouc opened 1 day ago
In addition, P19_clean.fq.gz and P48_clean.fq.gz are direct concatenations of each parent's paired-end reads:
zcat P19_clean_1.fq.gz P19_clean_2.fq.gz | gzip - > P19_clean.fq.gz
zcat P48_clean_1.fq.gz P48_clean_2.fq.gz | gzip - > P48_clean.fq.gz
The unknown reads are those without any marker k-mers. It's possible your parental sequencing is too sparse or the haplotypes are too closely related. Have you looked at the F1 stats in GenomeScope and the parental marker counts with Merqury? Post the *.out and *.log files from your run.
The contents of the two files are as follows:
splitHaplotype.000001.out:
Found perl:
/public/home/chcg/anaconda3/envs/mamba/envs/canu/bin/perl
This is perl 5, version 32, subversion 1 (v5.32.1) built for x86_64-linux-thread-multi
Found java:
/public/home/chcg/anaconda3/envs/mamba/envs/canu/bin/java
openjdk version "11.0.13" 2021-10-19
Found canu:
/public/home/chcg/anaconda3/envs/mamba/envs/canu/bin/canu
canu 2.2
Running job 1 based on command line options.
--
-- Loading haplotype data, using up to 6 GB memory for each.
--
For 626 distinct 20-mers (with 6 bits used for indexing and 34 bits for tags):
0.000 GB memory for kmer indices - 64 elements 64 bits wide)
0.000 GB memory for kmer tags - 626 elements 34 bits wide)
0.000 GB memory for kmer values - 626 elements 12 bits wide)
0.000 GB memory
Will load 626 kmers. Skipping 256996778 (too low) and 0 (too high) kmers.
Allocating space for 16754 suffixes of 34 bits each -> 569636 bits (0.000 GB) in blocks of 32.000 MB
16754 values of 12 bits each -> 201048 bits (0.000 GB) in blocks of 32.000 MB
Loaded 626 kmers. Skipped 256996778 (too low) and 0 (too high) kmers.
-- loaded 626 kmers.
For 1687 distinct 20-mers (with 6 bits used for indexing and 34 bits for tags):
0.000 GB memory for kmer indices - 64 elements 64 bits wide)
0.000 GB memory for kmer tags - 1687 elements 34 bits wide)
0.000 GB memory for kmer values - 1687 elements 14 bits wide)
0.000 GB memory
Will load 1687 kmers. Skipping 388173722 (too low) and 0 (too high) kmers.
Allocating space for 17815 suffixes of 34 bits each -> 605710 bits (0.000 GB) in blocks of 32.000 MB
17815 values of 14 bits each -> 249410 bits (0.000 GB) in blocks of 32.000 MB
Loaded 1687 kmers. Skipped 388173722 (too low) and 0 (too high) kmers.
-- loaded 1687 kmers.
-- Data loaded.
--
-- Processing reads in batches of 100 reads each.
--
-- Bye.
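To put the "too low" counts in perspective, the summary lines of splitHaplotype.000001.out can be tallied. The following is a minimal Python sketch, not part of Canu; the function names `kmer_load_stats` and `retained_fraction` are hypothetical, and the regular expression assumes the log wording shown above.

```python
import re

# Matches the per-haplotype summary line as it appears in the log above.
LOADED_RE = re.compile(
    r"Loaded (\d+) kmers\.\s+Skipped (\d+) \(too low\) and (\d+) \(too high\) kmers\."
)

def kmer_load_stats(log_text):
    """Return one (loaded, too_low, too_high) tuple per haplotype database."""
    return [tuple(int(g) for g in m.groups()) for m in LOADED_RE.finditer(log_text)]

def retained_fraction(loaded, too_low, too_high):
    """Fraction of distinct k-mers that survived the frequency filter."""
    total = loaded + too_low + too_high
    return loaded / total if total else 0.0

# The two summary lines copied from splitHaplotype.000001.out.
sample = """\
Loaded 626 kmers. Skipped 256996778 (too low) and 0 (too high) kmers.
Loaded 1687 kmers. Skipped 388173722 (too low) and 0 (too high) kmers.
"""

for loaded, low, high in kmer_load_stats(sample):
    frac = retained_fraction(loaded, low, high)
    print(f"loaded={loaded} too_low={low} retained={frac:.2e}")
```

In both parental databases, only a few k-mers per million pass the filter, so almost no markers are available to classify reads.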
haplotype.log:
-- Haplotype './0-kmers/haplotype-P19.meryl':
-- use kmers with frequency at least 1009.
-- Haplotype './0-kmers/haplotype-P48.meryl':
-- use kmers with frequency at least 998.
-- Begin processing file /public/home/chcg/dowload/BC202408553/BC202408553-ONT-ul-1samples/kw1-1M/pass.all.fq.gz
-- Finished processing file /public/home/chcg/dowload/BC202408553/BC202408553-ONT-ul-1samples/kw1-1M/pass.all.fq.gz with 458589 records
--
-- 1907 reads 85737412 bases written to haplotype file ./haplotype-P19.fasta.gz.
-- 3060 reads 137291711 bases written to haplotype file ./haplotype-P48.fasta.gz.
-- 441892 reads 12043284403 bases written to haplotype file ./haplotype-unknown.fasta.gz.
--
-- 11730 reads 9895948 bases filtered for being too short.
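The read counts in haplotype.log can be turned into per-bin fractions to show how lopsided the split is. This is a small Python sketch using only the numbers printed above; the dictionary layout and the helper name `bin_fractions` are illustrative choices, not Canu output.

```python
# (reads, bases) per output file, copied from haplotype.log above.
bins = {
    "P19": (1907, 85737412),
    "P48": (3060, 137291711),
    "unknown": (441892, 12043284403),
}
filtered_short = 11730   # reads dropped for being too short
total_records = 458589   # records reported for pass.all.fq.gz

def bin_fractions(bins):
    """Return {name: (read_fraction, base_fraction)} over all binned reads."""
    total_reads = sum(reads for reads, _ in bins.values())
    total_bases = sum(bases for _, bases in bins.values())
    return {
        name: (reads / total_reads, bases / total_bases)
        for name, (reads, bases) in bins.items()
    }

fractions = bin_fractions(bins)
for name, (read_frac, base_frac) in fractions.items():
    print(f"{name}: {read_frac:.2%} of reads, {base_frac:.2%} of bases")

# Sanity check: binned reads plus filtered reads account for every record.
binned = sum(reads for reads, _ in bins.values())
print("all records accounted for:", binned + filtered_short == total_records)
```

With these numbers, roughly 99% of the binned reads end up in haplotype-unknown.fasta.gz.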
Hi, thanks for developing this tool. I have resequencing data for two parents (P19_clean_1.fq.gz, P19_clean_2.fq.gz, P48_clean_1.fq.gz, P48_clean_2.fq.gz) and ONT (R10.4.1) data for the offspring (F1.fq.gz). I want to use TrioCanu for haplotype assembly:
I got the result of splitHaplotype:
Why can so many reads not be assigned to either haplotype? My species has relatively high heterozygosity, but the parental resequencing data is only 10-15X. Could the low parental coverage be the reason?
Thanks in advance.
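A rough back-of-the-envelope check on the coverage question: for reads of length L at base coverage C, the expected count of a single-copy k-mer is about C × (L − k + 1) / L. The Python sketch below compares that figure with the minimum frequencies reported in haplotype.log (1009 for P19, 998 for P48). The 150 bp read length is an assumption, since the thread does not state it; k = 20 comes from the "20-mers" in the log, and the function name is hypothetical.

```python
def expected_kmer_coverage(base_coverage, read_length, k):
    """Expected count of a single-copy k-mer, ignoring sequencing errors."""
    return base_coverage * (read_length - k + 1) / read_length

K = 20                 # k-mer size reported in splitHaplotype.000001.out
READ_LENGTH = 150      # ASSUMPTION: typical short-read length, not from the thread
thresholds = {"P19": 1009, "P48": 998}   # "frequency at least" values from haplotype.log

for coverage in (10, 15):
    kmer_cov = expected_kmer_coverage(coverage, READ_LENGTH, K)
    for parent, min_freq in thresholds.items():
        verdict = "passes" if kmer_cov >= min_freq else "falls below"
        print(f"{coverage}X -> ~{kmer_cov:.1f} per k-mer, {verdict} {parent} threshold {min_freq}")
```

Under these assumptions a single-copy k-mer would be seen roughly 9 to 13 times, far below thresholds near 1000, which fits the very large "too low" counts in the log. This only illustrates the mismatch; it does not explain why the thresholds were set that high.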