schmeing / gapless

Gapless provides combined scaffolding, gap-closing and assembly correction with long reads
MIT License

gapless on a small region of a mammalian genome #15

Status: Open. avilella opened this issue 11 months ago.


Hi,

We've curated a region of about 2 Mb of a mammalian genome, but it still contains a gap of unknown length, possibly 200 kb or longer.

The current input assembly is a single FASTA sequence with a run of Ns marking the gap.
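To confirm where the N placeholder sits and how long it is, a minimal sketch can list every run of N in the assembly. It assumes bash, GNU gzip and awk; the function name `find_n_runs` is hypothetical and not part of gapless.

```shell
# Sketch: report each run of N/n (a gap placeholder) in a FASTA file as
# tab-separated: sequence name, 0-based start, end (exclusive), run length.
# Plain and gzip-compressed input both work: gzip -cdf passes plain text through.
find_n_runs() {
  gzip -cdf -- "$1" | awk '
    # Print the current N run (if any) and reset it.
    function flush() {
      if (run_len > 0) print name "\t" run_start "\t" (run_start + run_len) "\t" run_len
      run_len = 0
    }
    # New record: close any open run, keep the ID up to the first space.
    /^>/ { flush(); name = substr($1, 2); pos = 0; next }
    {
      n = length($0)
      for (i = 1; i <= n; i++) {
        c = substr($0, i, 1)
        if (c == "N" || c == "n") {
          if (run_len == 0) run_start = pos
          run_len++
        } else {
          flush()
        }
        pos++
      }
    }
    END { flush() }
  '
}
```

For example, `find_n_runs Cat_IgH_20230613_AM.fa.gz` prints one line per N run. The reported length is only the size of the placeholder, not the true size of the gap, which is unknown. The character-by-character loop is slow on whole genomes but fine for a region of about 2 Mb.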

We have PacBio HiFi reads for the genome, and I am trying to use gapless to fill this gap.

I have 10 fastq.gz files, so I created a named pipe that presents them to gapless as a single FASTQ file:

mkfifo input.fastq
# Decompress all files into the pipe in the background; gapless reads from the other end
zcat file1.gz file2.gz ... fileN.gz > input.fastq &
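Data written to a FIFO can be read only once, while the trace below shows `input.fastq` being read by two `seqtk subseq` processes at the same time. A regular file does not have that limitation, so the following minimal sketch writes the decompressed reads to a single file instead of a pipe. It assumes bash and GNU gzip; the function name `merge_fastq` is hypothetical.

```shell
# Sketch: decompress several FASTQ.gz files into ONE regular file.
# Unlike a FIFO, a regular file can be read repeatedly and by several
# processes at once.
merge_fastq() {
  local out=$1
  shift
  if (( $# == 0 )); then
    echo "usage: merge_fastq OUT.fastq IN1.fastq.gz [IN2.fastq.gz ...]" >&2
    return 1
  fi
  gzip -cd -- "$@" > "$out" || return 1

  # Heuristic check: a 4-line-per-record FASTQ has a line count divisible by 4.
  local lines
  lines=$(wc -l < "$out")
  if (( lines % 4 != 0 )); then
    echo "warning: $out has $lines lines, not a multiple of 4" >&2
    return 1
  fi
}
```

For example, `merge_fastq input.fastq /path/to/hifi/*.fastq.gz` (a hypothetical path), after which `gapless.sh` can be pointed at `input.fastq` exactly as before. The line-count check is only a heuristic for 4-line FASTQ records.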

I then run gapless like this:

bash -x gapless.sh -j 30 -i Cat_IgH_20230613_AM.fa.gz -t pb_hifi input.fastq

It fails at the extend step:

++ seqtk subseq /data2/assembly_work/input.fastq pass1/gapless_extending_reads.lst
+ minimap2 -t 30 -x asm20 --min-occ-floor=0 -X -m100 -g10000 --max-chain-skip 25 /dev/fd/63 /dev/fd/62
++ seqtk subseq /data2/assembly_work/input.fastq pass1/gapless_extending_reads.lst
+ gapless.py extend -p pass1/gapless pass1/gapless_extending_reads.paf
+ '[' -f pass1/gapless_extending_reads.paf ']'
+ rm -f pass1/gapless_extended_scaffold_paths.csv
+ '[' '!' -f pass1/gapless_extended_scaffold_paths.csv ']'
+ echo 'pipeline crashed: extend'
pipeline crashed: extend
+ exit 1
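To see which input of the failing `extend` step looks wrong, a small diagnostic sketch can check the files named in the trace above. It assumes bash and the default `pass1/gapless` prefix shown there; the function name `check_extend_inputs` is hypothetical, and a missing or empty file is a hint rather than proof of the cause.

```shell
# Sketch: check the inputs of the "extend" step named in the trace.
# Usage: check_extend_inputs READS [PREFIX]   (PREFIX defaults to pass1/gapless)
# Prints one line per input and returns 1 if any of them looks wrong.
check_extend_inputs() {
  local reads=$1 prefix=${2:-pass1/gapless}
  local lst="${prefix}_extending_reads.lst"
  local paf="${prefix}_extending_reads.paf"
  local status=0

  # The two seqtk subseq calls read the reads file concurrently;
  # a FIFO would not give each of them a full copy.
  if [[ -p $reads ]]; then
    echo "reads: $reads is a named pipe (FIFO)"; status=1
  elif [[ ! -s $reads ]]; then
    echo "reads: $reads is missing or empty"; status=1
  else
    echo "reads: $reads is a regular file"
  fi

  # List of read names given to seqtk subseq.
  if [[ -s $lst ]]; then
    echo "lst: $(wc -l < "$lst") read names in $lst"
  else
    echo "lst: $lst is missing or empty"; status=1
  fi

  # Alignments passed to gapless.py extend.
  if [[ -s $paf ]]; then
    echo "paf: $(wc -l < "$paf") alignments in $paf"
  else
    echo "paf: $paf is missing or empty"; status=1
  fi

  return "$status"
}
```

For example, `check_extend_inputs /data2/assembly_work/input.fastq`, run from the directory where `gapless.sh` was started, since the trace uses paths relative to it.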

What parameters could I tune to obtain a decent result? Thanks.