Open HaoqingDu opened 1 month ago
Hi @HaoqingDu ,
/home/hdu/miniconda3/envs/mitoz_env/lib/python3.8/site-packages/mitoz/assemble/script/mitoAssemble/mitoAssemble all -K 51 -o PhoSp1_trimmed.mitoAssemble.K51 -s PhoSp1_trimmed.mitoAssemble.K51.mitoAssemble.lib -p 8
Could you please have a look at the file: PhoSp1_trimmed.mitoAssemble.K51.mitoAssemble.lib
and check if everything inside is okay?
Hi @linzhi2013
Thanks for reply. Here is the lines in the lib file
max_rd_len=151
[LIB]
avg_ins=250
reverse_seq=0
asm_flags=3
map_len=32
q=/home/hdu/Documents/firefly_genomes/results/mitoz/clean_data/PhoSp1_trimmed.clean_R1.fq.gz
I also compared it with the lib file for the test data, below is the test data lib file
max_rd_len=150
[LIB]
avg_ins=250
reverse_seq=0
asm_flags=3
map_len=32
q=/home/hdu/Documents/firefly_genomes/data/mitoz_test/test.R1.fq.gz
They look alike, so I assume this may not be the reason?
And the difference between my run and the successful test run is, at this step, inside the K51 directory /home/hdu/Documents/firefly_genomes/results/mitoz/mt_assembly/mitoAssemble/K51
, there is only this lib file, no other files. Is this normal?
check the content of /home/hdu/Documents/firefly_genomes/results/mitoz/clean_data/PhoSp1_trimmed.clean_R1.fq.gz
and check if it is broken:
gzip -t /home/hdu/Documents/firefly_genomes/results/mitoz/clean_data/PhoSp1_trimmed.clean_R1.fq.gz
You can also run the following command directly and check what will happen:
/home/hdu/miniconda3/envs/mitoz_env/lib/python3.8/site-packages/mitoz/assemble/script/mitoAssemble/mitoAssemble all -K 51 -o PhoSp1_trimmed.mitoAssemble.K51 -s PhoSp1_trimmed.mitoAssemble.K51.mitoAssemble.lib -p 8
Hi,
I just tried both.
First, gzip -t
returns nothing, which means the file is complete, right?
Second, when I run mitoAssemble, same thing happens
(mitoz_env) hdu@wla110:~/Documents/firefly_genomes/results/mitoz/mt_assembly/mitoAssemble/K51> /home/hdu/miniconda3/envs/mitoz_env/lib/python3.8/site-packages/mitoz/assemble/script/mitoAssemble/mitoAssemble all -K 51 -o PhoSp1_trimmed.mitoAssemble.K51 -s PhoSp1_trimmed.mitoAssemble.K51.mitoAssemble.lib -p 8
mitoAssemble is based on SoapTrans v1.03 and modified to fit mitogenome assembly by BGI-ENVers
pregraph -s PhoSp1_trimmed.mitoAssemble.K51.mitoAssemble.lib -K 51 -p 8 -o PhoSp1_trimmed.mitoAssemble.K51
In PhoSp1_trimmed.mitoAssemble.K51.mitoAssemble.lib, 1 libs, max seq len 151, max name len 256
8 thread created
read from file:
/home/hdu/Documents/firefly_genomes/results/mitoz/clean_data/PhoSp1_trimmed.clean_R1.fq.gz
--- 100000000th reads
--- 200000000th reads
--- 300000000th reads
--- 400000000th reads
--- 500000000th reads
--- 600000000th reads
--- 700000000th reads
--- 800000000th reads
--- 900000000th reads
--- 1000000000th reads
--- 1100000000th reads
--- 1200000000th reads
--- 1300000000th reads
--- 1400000000th reads
--- 1500000000th reads
--- 1600000000th reads
--- 1700000000th reads
--- 1800000000th reads
1800000000 * 150 = 270,000,000,000 bp
You are using 270 Gbp for mitochondrial genome assembly, which is too much, the program cannot handle such a big data.
Try to use 3-5 Gbp data instead.
The raw reads data (fq.gz format) is 30 Gb. I trimmed it with trimmomatic. So the input file fed to Mitoz is only 13 Mb large and contains around 37k reads.
I also tried to run mitoz all with --data_size_for_mt_assembly 3,0
to subsample the data if it were too large. But the same thing happened in read from file
.
The raw reads data (fq.gz format) is 30 Gb. I trimmed it with trimmomatic. So the input file fed to Mitoz is only 13 Mb large and contains around 37k reads.
But the log says
read from file:
/home/hdu/Documents/firefly_genomes/results/mitoz/clean_data/PhoSp1_trimmed.clean_R1.fq.gz
....
--- 1,800,000,000th reads
which is 1,800 M reads.
I also tried to run mitoz all with --data_size_for_mt_assembly 3,0 to subsample the data if it were too large. But the same thing happened in read from file.
I have no idea what happened there ...
I tried to run with megahit.
mitoz all \
--outprefix PhoSp1_trimmed \
--clade Arthropoda \
--genetic_code 5 \
--species_name "Photinus sp1" \
--fq1 PhoSp1_trimmed.fastq.gz \
--fastq_read_length 151 \
--data_size_for_mt_assembly 3,0 \
--assembler megahit \
--kmers_megahit 51 \
--requiring_taxa Arthropoda \
--min_abundance 0
The good thing is mitoz generated the mitogenome. So I guess the issue may result from mitoAssembler.
But Another thing is the generated genome is only 2450bp, super short. So my data might be not good enough either. However, it seems to be contradictory, because from the short genome I would say my data doesn't have enough reads, while mitoAssembler was loading infinite number of reads from my data.
For megahit, please make use of its multi-kmer feature, i.e., to use several kmers at the same time, which might be helpful.
In your code, you used only a single kmer, --kmers_megahit 51
.
Thanks for the advice. Using multiple kmers does help! However, the generated mito-genome is still less than 5k bp. So I really need to improve my data.
Perhaps try small kmer sizes like 31, 35, 39 etc
I installed mitoz v3.6 using conda. The test run was ok. I wanted to generate mitogenome from a SE data. Below is my command,
In the assemble part, read_from_file seems to never stop unless I kill it. Below is the related log info,
I don't know how to solve it.