ndierckx / NOVOPlasty

NOVOPlasty - The organelle assembler and heteroplasmy caller

Retrieve Seed killed #189

Closed wyy0945 closed 1 year ago

wyy0945 commented 2 years ago

Hello, I'm trying to assemble an animal's mitochondrial genome (Macrothele yani). There is no reference sequence. I downloaded its COI gene from NCBI as the seed sequence, but the run shows "Retrieve Seed...killed". Can you help me check what the problem is? This is my log:

Input parameters from the configuration file: Verify if everything is correct

Project:

Project name = Y1001
Type = mito
Genome range = 12000-20000
K-mer = 3
Max memory =
Extended log = 0
Save assembled reads = no
Seed Input = /home/gxl/Desktop/wyy/ok/seed.fasta
Extend seed directly = no
Reference sequence =
Variance detection =
Chloroplast sequence =

Dataset 1:

Read Length = 150
Insert size = 400
Platform = illumina
Single/Paired = PE
Combined reads =
Forward reads = /home/gxl/Desktop/wyy/GPL202202743/1_RawData/1001_R1.fq.gz
Reverse reads = /home/gxl/Desktop/wyy/GPL202202743/1_RawData/1001_R2.fq.gz
Store Hash =

Heteroplasmy:

Heteroplasmy =
HP exclude list =
PCR-free =

Optional:

Insert size auto = yes
Use Quality Scores =
Output path = /home/gxl/Desktop/wyy/ok

Reading Input......OK

Building Hash Table......OK

Subsampled fraction: 100.00 %

Retrieve Seed...killed

And this is my seed:

I (COX1)
ATTTGATTTTTGGAGTGTGATCCGCGATAGTAGGAACTGCTATAAGAGTAATTATCCGGATTGAATTAGG
TCAAGTAGGAAGATTGTTAGGGGATGACCATCTTTATAATGTAATTGTAACAGCTCATGCTCTTGTAATA
ATTTTTTTTATGGTGATGCCTATTTTGATTGGTGGATTTGGAAATTGGTTAGTTCCTTTAATATTAGGGG
CTCCTGACATGGCTTTTCCTCGTATAAATAATTTAAGATTTTGGTTATTACCTCCTTCTTTATTTTTGCT
TGTTCTATCTTCAATAACTGATAGAGGAGTTGGGGCTGGATGGACTATTTATCCCCCTCTCTCTTCAGGT
CTTGGACATAGCGGGGGGGGAATGGATTTTGCTATTTTTTCTTTGCATTTAGCGGGAGCATCTTCAATTA
TGGGTGCTGTAAATTTTATTTCTACAATTATTAATATGCGGGGAAAGGGAATAGTTATAGAACGGGTTCC
TTTATTTGTGTGATCAGTGTTGATTACTGCAATTTTATTGTTGCTTTCTTTACCAGTGTTAGCCGGGGCT
ATTACTATACTTTTAACGGATCGAAATTTTAATACTTCTTTTTTCGATCCTGCTGGGGGTGGAGATCCTA
TTTTGTTTCAGCATTTATTTTGATTTTTTGGTCATCCGGAGGTCTATATTTTGATTTTACCAGGGTTTGG
TATAATCTCTCATATTATTAGTTCGTCGGTAGGAAAGCGGGAACCATTTGGAACGTTAGGTATAATTTAT
GCAATAGCTGGAATTGGAG


ndierckx commented 2 years ago

Hi,

If it gets killed, it usually means your system ran out of memory, although it is weird that it happens during this step. But I saw that you used a k-mer of 3, which is way too low (I will add a warning for that). Maybe pick 20 as the k-mer and try again.
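A pre-flight check along the lines of the warning mentioned above could look like the sketch below. It is not NOVOPlasty code: `check_kmer` and the `MIN_KMER` threshold are hypothetical names and values chosen for illustration, with 20 taken only from the suggestion in this thread.

```python
import re

# Hypothetical threshold for illustration; 20 is the value suggested in this
# thread, not a limit taken from the NOVOPlasty source.
MIN_KMER = 20


def check_kmer(config_text, min_kmer=MIN_KMER):
    """Return a warning string if the K-mer setting looks too low, else None."""
    match = re.search(r"\bK-mer\s*=\s*(\d+)", config_text)
    if match is None:
        return "K-mer setting not found in config"
    kmer = int(match.group(1))
    if kmer < min_kmer:
        return f"K-mer = {kmer} is very low; consider {min_kmer} or higher"
    return None
```

Running `check_kmer` on the text of the config file before starting the assembler would flag a value such as 3 up front.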

wyy0945 commented 2 years ago

Hello, I changed the k-mer to 20, and now the run stops at "Building Hash Table……killed":

Project:

Project name = Y1001
Type = mito
Genome range = 12000-20000
K-mer = 20
Max memory =
Extended log = 0
Save assembled reads = no
Seed Input = /home/gxl/Downloads/sequence1.fasta
Extend seed directly = no
Reference sequence =
Variance detection =
Chloroplast sequence =

Dataset 1:

Read Length = 150
Insert size = 400
Platform = illumina
Single/Paired = PE
Combined reads =
Forward reads = /home/gxl/Desktop/wyy/GPL202202743/1_RawData/1001_R1.fq.gz
Reverse reads = /home/gxl/Desktop/wyy/GPL202202743/1_RawData/1001_R2.fq.gz
Store Hash =

Heteroplasmy:

Heteroplasmy =
HP exclude list =
PCR-free =

Optional:

Insert size auto = yes
Use Quality Scores =
Output path = /home/gxl/Desktop/wyy/ok

Reading Input......OK

Building Hash Table……killed

ndierckx commented 2 years ago

If it gets killed while building the hash table, it is definitely a shortage of memory. There is a Max memory option; I would use that. It will subsample your dataset to fit the amount of memory you give in the config file. If your assembly is then unsuccessful because of low coverage, you will need a machine with more memory.
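To pick a value for Max memory, one option is to read the memory currently available on a Linux machine and leave some headroom. The sketch below is only an illustration: `suggest_max_memory_gb` and the 1 GB headroom are hypothetical, and it assumes the config value is a plain number of GB without a unit (please check the NOVOPlasty README to confirm).

```python
def suggest_max_memory_gb(meminfo_text, headroom_gb=1.0):
    """Suggest a whole-GB 'Max memory' value from /proc/meminfo text.

    Assumption: the config expects a plain number of GB. The headroom left
    for the rest of the system is an arbitrary illustrative choice.
    """
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kib = int(line.split()[1])      # /proc/meminfo reports values in kB (KiB)
            gib = kib / (1024 * 1024)
            return max(int(gib - headroom_gb), 1)
    raise ValueError("MemAvailable not found")


# Usage on Linux:
#   with open("/proc/meminfo") as fh:
#       print("Max memory =", suggest_max_memory_gb(fh.read()))
```

The printed value can then be entered on the `Max memory =` line of the config file, which was left empty in both runs above.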