Open dabitz opened 5 months ago
Since you now have a clean split into 4 subgenomes, you can pass the assignments of the 4 subgenomes directly via -sg_assigned. This option uses the given assignments as they are instead of re-assigning subgenomes. The input has the same format as the previous output *.chrom-subgenome.tsv, like:
#chrom subgenome
3_R4 R4
...
7_R5 R5
...
2_s3 s3
...
3_s1 s1/2
...
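A minimal sketch of how such an assignment file could be assembled by merging the chromosome/subgenome columns from two earlier runs. It assumes that only the first two whitespace-separated columns matter and that lines starting with # are headers. The function names and the merge step are illustrative and not part of subphaser; the chromosome names are the ones quoted above.

```python
def read_assignments(text):
    """Parse chrom/subgenome lines; skip blank lines and '#' headers."""
    assignments = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Only the first two columns are used (assumption).
        chrom, subgenome = line.split()[:2]
        assignments[chrom] = subgenome
    return assignments


def format_assignments(assignments):
    """Render the mapping as tab-separated text with a header line."""
    lines = ["#chrom\tsubgenome"]
    lines += [f"{chrom}\t{sg}" for chrom, sg in assignments.items()]
    return "\n".join(lines) + "\n"


# Two hypothetical partial results, e.g. from separate runs.
run_a = "#chrom subgenome\n3_R4 R4\n7_R5 R5\n"
run_b = "#chrom\tsubgenome\n2_s3\ts3\n3_s1\ts1/2\n"

# Later entries override earlier ones if a chromosome appears twice.
merged = {**read_assignments(run_a), **read_assignments(run_b)}
assigned_text = format_assignments(merged)
```

Writing assigned_text to a file such as assigned.txt would give something that could then be passed with -sg_assigned, provided the labels are the ones you want in the final plot.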
Thanks a lot! It worked perfectly! Is there a way I can specify the color for each subgenome? You should receive a Nobel for that tool. So awesome!
Just found the right commands, please ignore the message above, but the Nobel thing still holds true :-)
After using the hex color parameters and the assigned-subgenome option, the circos plot changed a lot. Is there any way to fix or improve this? In particular, the purple SG is no longer evident after using the hex colors and -sg_assigned.
with command: subphaser -i $REF -c config_chr.txt -pre ref -p 20 -cleanup
with command: subphaser -i $REF -c config_chr.txt -pre 4sg_hex -p 20 -cleanup -sg_assigned assigned.txt -colors "#f37d38,#f9a64a,#452f91,#9097cb"
This is probably because S1/S2 have no specific k-mers, and for s3 the specific k-mers are distributed unevenly. At present, there may be no simple option that works well. You could try again with one combination of two or three subgenomes at a time, and check whether the results are reliable.
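The "one combination at a time" idea can be sketched as follows. This is only a rough illustration, assuming the assignments are held as a chromosome-to-subgenome mapping; the helper name subgenome_subsets is hypothetical, and turning each chromosome subset into a subphaser config file is left out because it depends on your homologous groups.

```python
from itertools import combinations


def subgenome_subsets(assignments, sizes=(2, 3)):
    """Yield (labels, chromosomes) for every combination of subgenomes."""
    labels = sorted(set(assignments.values()))
    for k in sizes:
        for combo in combinations(labels, k):
            # Keep only chromosomes assigned to one of the chosen labels.
            chroms = [c for c, sg in assignments.items() if sg in combo]
            yield combo, chroms


# Example mapping using the names from the thread (one chromosome each).
assignments = {"3_R4": "R4", "7_R5": "R5", "2_s3": "s3", "3_s1": "s1/2"}

subsets = list(subgenome_subsets(assignments))
for combo, chroms in subsets:
    print(",".join(combo), "->", " ".join(chroms))
```

Each subset would then be run separately and its clustering and circos plot inspected before deciding which labels to trust.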
Thanks, indeed removing the S2 chromosomes improves things a lot. In fact, the S2 chromosomes are only homologs of S1 and not truly a different subgenome...
Hi, thanks a lot for this awesome tool!
I am trying to phase an allopentaploid genome which we expect to have 4 subgenomes. Although the clustering works very well, I am having trouble adjusting the settings to get the four subgenomes correctly identified. SubPhaser normally identifies 3 subgenomes, but if I set -nsg 4 it does not correctly identify the 4th subgenome based on the clustering; instead it wrongly splits one subgenome. Please see below.
Using -nsg 3:
Using -nsg 4:
Using only the set of chromosomes from S1/2 and s3, the two subgenomes that should be split:
Ideally, I would like to have in one run the 4 subgenomes correctly identified and split. Any suggestions are welcome! Best André