mapleforest / HaploMerger2


Loss of sequences after Haplomerger2. #27

Open a-velt opened 5 years ago

a-velt commented 5 years ago

Hi mapleforest,

First of all, thank you for developing this great tool! I have a question about my run. Some of my sequences are flagged as xfSc because they are unpaired, but as I understand it, step B4 should put them back into the assembly. This is what happens for most of the large sequences. My impression is that the small sequences are not put back, which is fine for me, since it cleans up the assembly. My problem is that one sequence of more than 1 megabase does not come back into the assembly, although it exists in my reference assembly and I want to keep it. What are the possible reasons why HM2 does not recover it?

Thank you for your help, Amandine

mapleforest commented 4 years ago

All scaffolds must end up in either hm.new_scaffolds or hm.unpaired. But if you ran hm.batchB4.refine_unpaired_sequences, it created hm.unpaired_updated, and some scaffolds could have been thrown away because of various criteria (such as high similarity, too many Ns, repeats, etc.). You can always modify these criteria in the B4 script.
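
To see which scaffolds were dropped, one option is to compare the sequence IDs of the input assembly against the IDs in the output files. The following is only a rough sketch and not part of HaploMerger2: `read_fasta_lengths` and `missing_sequences` are hypothetical helper names, and it assumes the original scaffold ID is still the first token of each FASTA header in the outputs. If the IDs are renamed during the run, this comparison will not work as written.

```python
import gzip


def read_fasta_lengths(path):
    """Return {sequence_id: length} for a FASTA file (plain text or .gz)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    lengths = {}
    current = None
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Assumption: the first whitespace-delimited token is the ID.
                tokens = line[1:].split()
                current = tokens[0] if tokens else ""
                lengths[current] = 0
            elif current is not None:
                lengths[current] += len(line)
    return lengths


def missing_sequences(input_fasta, output_fastas, min_length=0):
    """List (id, length) of input sequences absent from every output file.

    Results are sorted by length, longest first, so that large lost
    scaffolds show up at the top.
    """
    kept = set()
    for path in output_fastas:
        kept.update(read_fasta_lengths(path))
    source = read_fasta_lengths(input_fasta)
    missing = [
        (seq_id, length)
        for seq_id, length in source.items()
        if seq_id not in kept and length >= min_length
    ]
    return sorted(missing, key=lambda item: item[1], reverse=True)
```

Calling `missing_sequences` with the input assembly, the FASTA files corresponding to hm.new_scaffolds and hm.unpaired_updated (whatever their actual paths are in your run), and `min_length=1_000_000` would list only the lost sequences of at least 1 Mb.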