lpryszcz / pyScaf

Genome assembly scaffolding using information from paired-end/mate-pair libraries, long reads, and synteny to closely related species.
GNU General Public License v3.0

Less BUSCO genes after scaffolding. #5

Open a-velt opened 6 years ago

a-velt commented 6 years ago

Hi,

I would like to give some feedback on scaffolding my assembly (Sanger technology) with PacBio reads (30x coverage) using pyScaf.

pyScaf is fast and, at first glance, generates interesting results. I went from 2,059 scaffolds to 1,344 scaffolds, which was encouraging. Then I ran BUSCO on both assemblies and got the following results:

95.6% complete BUSCO genes for my assembly before pyScaf, and 78.7% complete BUSCO genes after pyScaf. Before scaffolding I had 37 missing genes; after pyScaf I have 284 missing genes.

I launched pyScaf with these parameters:

pyScaf.py -f Scaffolds.fasta --identity 0.80 -o Scaffolds.pyScaf.fasta -t 10 --log pyScaf_run.log --longreads all_raw_reads.Pacbio.fasta

Maybe I need to change them? Do you have any advice for me?

hgdarras commented 6 years ago

Hi, this is probably the same problem as the one mentioned in issue #3:

"Additionally, there might be some over-scaffolding that many contigs seemed with large overlap were linked directly (without any check such as whether the contigs overlapped actually)."

In this example (from the .tsv output of a long-read scaffolding run), a 2.4 Mb scaffold and a 3.3 Mb scaffold are merged into a single 3.3 Mb scaffold, so about 2.4 Mb of non-redundant sequence is lost in the process:

scaffold00018 3324699 2 scaffold31_size2472606 scaffold20_size3324684 1 0 -3065490 0
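A row like the one above can be checked automatically. The following is only a minimal sketch, not part of pyScaf. It assumes the columns are: scaffold name, scaffold size, number of contigs, then the ordered contig names, their orientations, and the gap sizes (one per contig, the last one trailing), with a negative gap meaning the neighbouring contigs were treated as overlapping. It also assumes contig names end in `_size<length>`, as they do in this example. The function names and the `max_overlap` threshold are hypothetical.

```python
import re


def parse_scaffold_row(row):
    """Parse one whitespace-separated scaffold row (column layout is an assumption)."""
    fields = row.split()
    name, size, n = fields[0], int(fields[1]), int(fields[2])
    contigs = fields[3:3 + n]
    orientations = [int(x) for x in fields[3 + n:3 + 2 * n]]
    gaps = [int(x) for x in fields[3 + 2 * n:3 + 3 * n]]
    return {"name": name, "size": size, "contigs": contigs,
            "orientations": orientations, "gaps": gaps}


def contig_size(contig_name):
    """Read the length from a '_size<length>' suffix; None if the suffix is absent."""
    match = re.search(r"_size(\d+)$", contig_name)
    return int(match.group(1)) if match else None


def suspicious_joins(record, max_overlap=10000):
    """Return (left contig, right contig, overlap) for joins overlapping by more than max_overlap."""
    flagged = []
    contigs = record["contigs"]
    # The last gap is assumed to follow the final contig, so it is skipped.
    for i, gap in enumerate(record["gaps"][:-1]):
        if gap < -max_overlap:
            flagged.append((contigs[i], contigs[i + 1], -gap))
    return flagged


def lost_sequence(record):
    """Input contig bases minus scaffold size; None if any contig length is unknown."""
    sizes = [contig_size(c) for c in record["contigs"]]
    if any(s is None for s in sizes):
        return None
    return sum(sizes) - record["size"]


row = ("scaffold00018 3324699 2 scaffold31_size2472606 "
       "scaffold20_size3324684 1 0 -3065490 0")
record = parse_scaffold_row(row)
print(suspicious_joins(record))
print(lost_sequence(record))
```

For the row quoted here, the check flags the single join with an overlap of 3,065,490 bp and reports 2,472,591 bp missing from the merged scaffold, which matches the roughly 2.4 Mb loss described above.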

a-velt commented 6 years ago

Hi!

Yes, I found the problem! I used OPERA to scaffold my Sanger assembly with PacBio reads, and I saw that OPERA merged some contigs, which caused this problem with BUSCO. Since OPERA generates a file with the scaffolding information, I wrote a script to perform "manual" scaffolding without merging my contigs, and it works perfectly! The BUSCO results are very good after that. If anyone encounters such problems with OPERA, contact me and I will provide my script.
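The script itself is not shown in this thread, so the following is only a hypothetical sketch of the general idea of scaffolding without merging: contigs are placed in the reported order and orientation, and every join gets a run of Ns instead of an overlap merge. The `layout` structure, the function names, and the `min_gap` default are assumptions for illustration; they do not reflect OPERA's actual output format.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def build_scaffold(contigs, layout, min_gap=100):
    """Join contigs with N gaps, never merging overlapping ends.

    contigs: dict mapping contig name to sequence.
    layout:  list of (contig_name, is_reversed, gap_after) tuples in scaffold order
             (hypothetical structure; gap_after of the last entry is ignored).
    min_gap: number of Ns used when the estimated gap is smaller than this,
             including negative (overlap) estimates.
    """
    parts = []
    for i, (name, is_reversed, gap_after) in enumerate(layout):
        seq = contigs[name]
        parts.append(reverse_complement(seq) if is_reversed else seq)
        if i < len(layout) - 1:
            # A negative estimate means "overlap"; insert Ns rather than merge.
            parts.append("N" * max(gap_after, min_gap))
    return "".join(parts)


contigs = {"a": "ACGT", "b": "AAGG"}
layout = [("a", False, -3), ("b", True, 0)]
print(build_scaffold(contigs, layout, min_gap=5))
```

With this approach no input sequence is dropped, so gene content cannot decrease, at the cost of possibly keeping truly redundant overlapping ends in the scaffold.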

Thank you,
Amandine

liguangshuo commented 2 years ago

Hi Amandine (@a-velt),

I am facing the same problem now. Could you share your script with me?

Thanks in advance,
Guangshuo