wtsi-hpag / Scaff10X

Pipeline for scaffolding and breaking a genome assembly using 10x genomics linked-reads
MIT License

"Numbers of contigs are difference," error in "scaff_bwa-barcode" #22

Open DustinSokolowski opened 3 years ago

DustinSokolowski commented 3 years ago

Hey!

Thank you for your great tool.

I am trying to use some 10X linked reads to improve the contig assembly of a _de novo_ genome assembled from Oxford Nanopore long reads. I first aligned the 10X reads with longranger and then continued from there.

Command:

/hpf/tools/centos7/Scaff10X/4.1/src/scaff10x -nodes 25 -bam /hpf/largeprojects/mdwilson/dustin/new_genome/phase_link/SUB_2626M1/outs/possorted_bam.bam genome.fa male_output.fasta

The "genome.fa" is the genome FASTA file from the "refdata-assembly/fasta/" directory created by longranger mkref.

The error specifically was: Error running command: /hpf/tools/centos7/Scaff10X/4.1/src/scaff-bin/scaff_bwa-barcode tarseq.tag align0.dat align.dat > try.out

The try.out file contained: 2751 409880285 Numbers of contigs: 2750 2751 Numbers of contigs are difference, please check reference assembly! 2750 2751
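The message reports two contig counts that disagree (2750 and 2751). One side of that comparison can be checked independently by counting the sequences in the reference FASTA. The following is a minimal sketch, assuming a plain uncompressed FASTA; `fasta_contig_lengths` is a hypothetical helper and not part of Scaff10X:

```python
def fasta_contig_lengths(lines):
    """Return {contig_name: sequence_length} from an iterable of FASTA lines.

    The contig name is the first whitespace-delimited token after '>'.
    """
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)  # sequence may span several lines
    return lengths
```

Calling it as `with open("genome.fa") as fh: print(len(fasta_contig_lengths(fh)))` would print the number of contigs in the reference, which can then be compared against the two numbers in try.out.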

While not entirely sure what this meant, I did some digging and noticed that one scaffold had zero 10X reads aligning to it. Below is the summary of reads aligned per contig and the contig lacking reads. (image)
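A per-contig read summary like this can be reproduced from the tab-separated output of `samtools idxstats` (columns: reference name, sequence length, mapped reads, unmapped reads). The sketch below assumes that output has been saved as text; `contigs_without_reads` is a hypothetical helper name, and whether a contig with zero reads actually triggers the Scaff10X error is not confirmed here:

```python
def contigs_without_reads(idxstats_text):
    """Return [(contig_name, length)] for contigs with zero mapped reads.

    Expects the tab-separated text produced by `samtools idxstats`.
    """
    zero = []
    for line in idxstats_text.splitlines():
        fields = line.split("\t")
        # Skip malformed lines and the trailing '*' row for unplaced reads.
        if len(fields) < 4 or fields[0] == "*":
            continue
        name, length, mapped = fields[0], int(fields[1]), int(fields[2])
        if mapped == 0:
            zero.append((name, length))
    return zero
```

Returning the length alongside the name makes it easy to see whether the contigs lacking reads are also the short ones.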

I also noticed that the contig itself is on the shorter side.

Together, I have the following questions:

1. Could the lack of alignment to a contig be responsible for this error?
2. Should there be a contig length cutoff for the input assembly?
3. If this error is coming from elsewhere, do you happen to know the source?

Thanks so much! Dustin