jtlovell / GENESPACE

Question about filtering Diamond results #128

Closed Hannah1746 closed 9 months ago

Hannah1746 commented 10 months ago

Hello,

I am making a GENESPACE plot across 5 genomes that all share a common whole-genome duplication. I am getting some artifacts of synteny between homoeologous chromosomes across the genomes:

[Image: Screen Shot 2023-10-19 at 12 20 53 PM]

One of the best examples in this image is from Myx14 and My13 to Cac14 and Cac11. I am hoping there is a way to filter the Diamond results to remove these artifacts. I understand that GENESPACE already does some internal filtering, but I can't seem to isolate where that step is in the process.

Either way, I would love to hear your thoughts on how to deal with this without just brute-forcing it.

jtlovell commented 10 months ago

To be clear, are you using the default specifications with ploidy = 1 and no outgroup? If so, I am surprised that you get so much subgenome pollution.

Hannah1746 commented 10 months ago

Yes. I am running:

gpar <- init_genespace(
  wd = wd,
  path2mcscanx = path2mcscanx,
  genomeIDs = c("MX", "RBS", "HN1", "MO1", "CC5"),
  ploidy = 1
)

and then:

out <- run_genespace(gpar, overwrite = TRUE)

jtlovell commented 10 months ago

Interesting. The only time I have seen this type of pattern is when the WGD happened just before the genomes diverged, which can confuse things. I would try increasing the block size: init_genespace(..., blkSize = 20). If that doesn't do it, reach out to me directly (bluesky = @jotlovell, email, etc.) and we'll get it figured out.
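
For reference, applying the blkSize suggestion to the call posted earlier in this thread might look like the sketch below. The two paths are placeholders (assumptions, not values from this thread), and blkSize = 20 is only the starting value suggested here; whether it removes the homoeologous artifacts will depend on the data.

```r
library(GENESPACE)

# Placeholder paths (assumptions): replace with your own locations.
wd <- "/path/to/genespace_workdir"
path2mcscanx <- "/path/to/MCScanX"

# Same call as posted above, with the block size raised to the value
# suggested in this thread so that short syntenic blocks are less
# likely to be retained.
gpar <- init_genespace(
  wd = wd,
  path2mcscanx = path2mcscanx,
  genomeIDs = c("MX", "RBS", "HN1", "MO1", "CC5"),
  ploidy = 1,
  blkSize = 20
)

# Re-run the pipeline, overwriting results from the earlier run.
out <- run_genespace(gpar, overwrite = TRUE)
```

Comparing the new plot against the earlier one shows whether the remaining cross-subgenome links are short blocks that a larger blkSize would remove.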

Hannah1746 commented 10 months ago

This definitely helped, but there are still some artifacts.

[Image: Screen Shot 2023-10-23 at 10 17 27 AM]

jtlovell commented 9 months ago

Yeah, I think the duplication is likely close to the root of the tree.