lbcb-sci / racon

Ultrafast consensus module for raw de novo genome assembly of long uncorrected reads
MIT License

Polishing with racon using multiple input files? #70

Open BioLaFu opened 2 years ago

BioLaFu commented 2 years ago

Hi, I have a reference genome, generated using PacBio, that I want to polish with more than one sequence file. I also have 50 whole-genome data sets generated using Illumina, and I want to polish the PacBio reference genome with this Illumina data. I managed to polish the reference genome once, with a single Illumina data set. But when I try to polish the resulting file with the next Illumina data set, I get the following error: [racon::Window::add_layer] error: layer begin and end positions are invalid!

Is there a way to do what I want or is it simply not possible using racon?

I also thought about combining all of the Illumina sequences into one file, but that doesn't seem sensible, given that I am working on snail genomes of about 1 Gb each.

Thanks in advance! Laura

rvaser commented 2 years ago

Hello Laura, unfortunately the current API does not allow multiple files. You would need to combine them and run Racon in one command. Also, if you have paired-end reads, their names need to be unique up to the first whitespace (you can preprocess the file before mapping with https://github.com/lbcb-sci/racon/blob/master/scripts/racon_preprocess.py).
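To illustrate what "unique names up to the first whitespace" means for paired-end reads, here is a minimal Python sketch. It is not the code of racon_preprocess.py, whose exact renaming scheme may differ; the function name `make_mate_name_unique` and the `/1`, `/2` suffix convention are assumptions made for this example only.

```python
def make_mate_name_unique(header: str, mate: int) -> str:
    """Return a FASTQ header whose first whitespace-delimited token
    carries the mate number (hypothetical /1 and /2 suffix scheme)."""
    if not header.startswith("@"):
        raise ValueError(f"not a FASTQ header: {header!r}")
    # Split the read name from the optional description at the first whitespace.
    parts = header.strip().split(None, 1)
    name = parts[0]
    # Drop an existing /1 or /2 so the suffix is not doubled.
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    new_name = f"{name}/{mate}"
    return f"{new_name} {parts[1]}" if len(parts) == 2 else new_name


# Both mates share the token "@read42" before the first space,
# so their names collide until the mate number is moved into the name.
h1 = make_mate_name_unique("@read42 1:N:0:ATCACG", 1)
h2 = make_mate_name_unique("@read42 2:N:0:ATCACG", 2)
print(h1)
print(h2)
```

In typical Illumina headers the mate number sits after the space, which is why the two mates look identical to any tool that truncates names at the first whitespace.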

Best regards, Robert

BioLaFu commented 2 years ago

Hello Robert,

Thanks so much for your reply! I am aware of the preprocessing step to get unique names for paired-end data. So I will go ahead and try to combine several FASTQ files into one, preprocess that, and then run racon as usual. Thanks again for your help. Laura
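The combining step described above could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not part of racon: the helper names `open_text` and `combine_fastq` and the `s<index>_` name prefix are hypothetical, and the code assumes plain 4-line FASTQ records (no wrapped sequence lines). It streams line by line, so large data sets are never held in memory.

```python
import gzip
import tempfile
from pathlib import Path
from typing import Iterable


def open_text(path: Path):
    """Open a plain or gzip-compressed file in text mode."""
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "r")


def combine_fastq(inputs: Iterable[Path], output: Path) -> int:
    """Concatenate 4-line FASTQ records from several files into one.

    Each read name gets a hypothetical "s<file index>_" prefix so that
    names stay unique across data sets. Returns the number of records.
    """
    records = 0
    with open(output, "w") as out:
        for idx, path in enumerate(inputs):
            with open_text(path) as handle:
                pos = 0  # position of the current line within its record
                for line in handle:
                    if pos == 0 and not line.strip():
                        continue  # tolerate blank lines between records
                    if pos == 0:
                        if not line.startswith("@"):
                            raise ValueError(f"{path}: expected '@' header")
                        line = f"@s{idx}_{line[1:]}"
                        records += 1
                    if not line.endswith("\n"):
                        line += "\n"  # file lacked a final newline
                    out.write(line)
                    pos = (pos + 1) % 4
    return records


# Tiny demonstration with two temporary files that share a read name.
with tempfile.TemporaryDirectory() as tmp:
    a = Path(tmp) / "a.fastq"
    b = Path(tmp) / "b.fastq"
    a.write_text("@r1\nACGT\n+\nIIII\n")
    b.write_text("@r1\nTTGA\n+\nIIII")  # no trailing newline
    merged = Path(tmp) / "merged.fastq"
    count = combine_fastq([a, b], merged)
    merged_lines = merged.read_text().splitlines()

print(count)
print(merged_lines[0], merged_lines[4])
```

Prefixing by file index is only one way to guard against name collisions between data sets; if the read names already differ across runs, plain concatenation would serve as well. The paired-end preprocessing mentioned earlier in the thread would still be applied separately.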