zhangrengang / SubPhaser

Phase, partition and visualize subgenomes of a neoallopolyploid or hybrid based on the subgenome-specific repetitive kmers.
https://doi.org/10.1111/nph.18173
GNU General Public License v3.0

Suggestion for specific settings to improve subphasing #32

Open dabitz opened 5 months ago

dabitz commented 5 months ago

Hi, thanks a lot for this awesome tool!

I am trying to phase an allopentaploid genome that we expect to have 4 subgenomes. Although the clustering works very well, I am having trouble adjusting the settings to get the four subgenomes correctly identified. SubPhaser normally identifies 3 subgenomes, but if I set -nsg 4 it does not correctly identify the 4th subgenome seen in the clustering; instead, it wrongly splits one subgenome. Please see below.

Using -nsg 3: [image]

Using -nsg 4: [image]

Using only the chromosomes of S1/2 and s3, the two subgenomes that should be split: [image]

Ideally, I would like to have all 4 subgenomes correctly identified and split in a single run. Any suggestions are welcome! Best, André

zhangrengang commented 5 months ago

Since you already have a good split of the 4 subgenomes, you can input the assignments of the 4 subgenomes directly via -sg_assigned. This option uses the given assignments as they are instead of re-assigning subgenomes. The input has the same format as the previous output file *.chrom-subgenome.tsv, for example:

#chrom  subgenome
3_R4 R4
...
7_R5 R5
...
2_s3 s3
...
3_s1 s1/2
...
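A minimal Python sketch of how such a file could be assembled when, as in this thread, one run gave good assignments for most chromosomes and a second run on a chromosome subset split the remaining ones. It merges the two, letting the second run take precedence. The helper names, the placeholder label SG3 and the whitespace-based parsing are assumptions for illustration, not part of SubPhaser.

```python
def parse_assignments(text: str) -> dict[str, str]:
    """Parse '#chrom subgenome' text into {chrom: subgenome}.

    Splits on any whitespace, so tab- or space-separated input both work.
    Blank lines and lines starting with '#' are skipped."""
    assignments = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        if len(fields) < 2:
            raise ValueError(f"expected 'chrom subgenome', got: {line!r}")
        assignments[fields[0]] = fields[1]
    return assignments


def merge_assignments(base: dict[str, str], refined: dict[str, str]) -> dict[str, str]:
    """Overlay a refined run (e.g. on a chromosome subset) onto a base run."""
    merged = dict(base)
    merged.update(refined)  # refined labels win for chromosomes in both
    return merged


def format_assignments(assignments: dict[str, str]) -> str:
    """Render assignments as tab-separated text with a '#chrom' header."""
    lines = ["#chrom\tsubgenome"]
    lines += [f"{chrom}\t{sg}" for chrom, sg in assignments.items()]
    return "\n".join(lines) + "\n"


# Hypothetical example: the base run lumped 2_s3 and 3_s1 into "SG3".
base = parse_assignments("3_R4\tR4\n7_R5\tR5\n2_s3\tSG3\n3_s1\tSG3\n")
refined = parse_assignments("2_s3 s3\n3_s1 s1/2\n")
merged = merge_assignments(base, refined)
```

Writing format_assignments(merged) to a file such as assigned.txt should give the two-column layout shown above, ready to pass via -sg_assigned.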
dabitz commented 5 months ago

Thanks a lot! It just worked perfectly! Is there a way I can specify the color of each subgenome? You should receive a Nobel Prize for this tool. So awesome!

dabitz commented 5 months ago

Just found the right options; please ignore the message above, but the Nobel thing still holds true :-)

dabitz commented 4 months ago

After using the hex color and assigned-subgenome options, the circos plot changed a lot. Is there any way to fix or improve this? In particular, the purple subgenome is no longer evident after using the hex color (-colors) and -sg_assigned parameters.

With the command: subphaser -i $REF -c config_chr.txt -pre ref -p 20 -cleanup
[image]

With the command: subphaser -i $REF -c config_chr.txt -pre 4sg_hex -p 20 -cleanup -sg_assigned assigned.txt -colors "#f37d38,#f9a64a,#452f91,#9097cb"
[image]
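As a small sanity check before a long run, the -colors value can be validated in Python. This sketch assumes, without confirmation from the thread, that -colors takes comma-separated #RRGGBB hex codes with one color per subgenome; which color maps to which subgenome is not stated here. The helper names are hypothetical.

```python
import re

# Assumed format: '#' followed by exactly six hex digits.
HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


def parse_colors(colors: str) -> list[str]:
    """Split a comma-separated colors string and validate each entry."""
    parts = [c.strip() for c in colors.split(",")]
    bad = [c for c in parts if not HEX_COLOR.fullmatch(c)]
    if bad:
        raise ValueError(f"not #RRGGBB hex colors: {bad}")
    return parts


def check_colors(colors: str, subgenomes) -> list[str]:
    """Assumption: one color is needed per distinct subgenome label."""
    parsed = parse_colors(colors)
    n_subgenomes = len(set(subgenomes))
    if len(parsed) != n_subgenomes:
        raise ValueError(f"{len(parsed)} colors for {n_subgenomes} subgenomes")
    return parsed
```

Passing the subgenome column of the assignment file as `subgenomes` catches a missing or malformed color, though it cannot tell whether the plot looks right.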

zhangrengang commented 4 months ago

This is probably because S1/S2 have no specific k-mers, and the specific k-mers of s3 are distributed unevenly. At present, there may be no simple option that works well. You could try again by inputting one combination of two or three subgenomes at a time, and check whether the results are reliable.
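One way to script the suggestion of testing two or three subgenomes at a time is sketched below. It assumes the -c config file lists one homoeologous chromosome set per line with IDs separated by whitespace, and it reuses a chromosome-to-subgenome mapping such as the one in the *.chrom-subgenome.tsv output. subset_config and subgenome_combinations are hypothetical helpers, not SubPhaser functions.

```python
from itertools import combinations
from typing import Iterable, Iterator


def subset_config(config_text: str, assignments: dict[str, str],
                  keep: set[str]) -> str:
    """Keep only chromosomes whose subgenome label is in `keep`.

    Assumes one homoeologous set per line, IDs separated by whitespace.
    Sets left with fewer than two chromosomes are dropped (a design
    choice: a lone chromosome has no homoeolog to contrast with)."""
    lines = []
    for line in config_text.splitlines():
        chroms = [c for c in line.split() if assignments.get(c) in keep]
        if len(chroms) >= 2:
            lines.append("\t".join(chroms))
    return "\n".join(lines) + "\n"


def subgenome_combinations(labels: Iterable[str],
                           sizes: tuple[int, ...] = (2, 3)) -> Iterator[tuple[str, ...]]:
    """Yield every combination of 2 or 3 distinct subgenome labels."""
    unique = sorted(set(labels))
    for k in sizes:
        yield from combinations(unique, k)
```

Each reduced config could then be passed to a separate run via -c with its own -pre prefix, and the resulting plots compared to judge which combinations give reliable splits.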

dabitz commented 4 months ago

Thanks, indeed removing the S2 chromosomes improved things a lot. In fact, the S2 chromosomes are only homologs of S1 and not truly a different subgenome...

[image]