Open mrmrwinter opened 2 years ago
Yes, it is because preceding subgenome assignment is failed. Can you post the clustering heatmap?
@mrmrwinter The immediate cause should be that there is no subgenome-specific LTR for building tree. You can skip this step by -disable_ltrtree
. And check the results according to the Supplementary Material:
On the clustering heatmap (Fig. 2B) and PCA plot (Fig. 2C), a subgenome is defined as well-phased if it has clearly distinguishable patterns of both differential k-mers and homeologous chromosomes, indicating that each subgenome shares subgenome-specific features as expected. If the subgenomes are not well-phased, the following analyses are meaningless and should be ignored.
Hi @zhangrengang, thanks for the fast reply. I had a look at the subgenome assignment results and only one scaffold was assigned to subgenome 2.
I double checked the content of this scaffold and it is definitely not a contaminant or particularly difficult region - it is around 120 Kbp and contains a ribosomal subunit. When i remove this scaffold from the assembly and from the configureation file i get the following error:
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/531734/.conda/envs/SubPhaser/lib/python3.8/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/home/531734/.conda/envs/SubPhaser/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
return list(map(*args))
File "/home/531734/.conda/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.5-py3.8.egg/subphaser/Jellyfish.py", line 641, in _filter_kmer
return kmer, [c/l for c,l in zip(counts, lengths)], tot
File "/home/531734/.conda/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.5-py3.8.egg/subphaser/Jellyfish.py", line 641, in <listcomp>
return kmer, [c/l for c,l in zip(counts, lengths)], tot
ZeroDivisionError: division by zero
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/531734/.conda/envs/SubPhaser/bin/subphaser", line 33, in <module>
sys.exit(load_entry_point('subphaser==1.2.5', 'console_scripts', 'subphaser')())
File "/home/531734/.conda/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.5-py3.8.egg/subphaser/__main__.py", line 779, in main
pipeline.run()
File "/home/531734/.conda/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.5-py3.8.egg/subphaser/__main__.py", line 412, in run
d_mat = dumps.filter(d_mat, lengths, self.sgs, outfig=histfig, #d_targets=d_targets,
File "/home/531734/.conda/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.5-py3.8.egg/subphaser/Jellyfish.py", line 487, in filter
for kmer, freqs, tot_freq in pool_func(_filter_kmer, args, self.ncpu,
File "/home/531734/.conda/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.5-py3.8.egg/subphaser/RunCmdsMP.py", line 346, in pool_func
for returned in pool_map(func, iterable, **kargs):
File "/home/531734/.conda/envs/SubPhaser/lib/python3.8/multiprocessing/pool.py", line 364, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/home/531734/.conda/envs/SubPhaser/lib/python3.8/multiprocessing/pool.py", line 771, in get
raise self._value
ZeroDivisionError: division by zero
I really hope i can get this working as the software looks very promising.
Many thanks
@mrmrwinter How about the plot *.kmer.mat.pdf
? Your genome may be not a neoallopolyploid. Do you have a dot plot to show the homoeologs?
By the way, you do not need to remove the seqence from the assembly.
@zhangrengang Here is the plot you requested:
Though definitely allopolyploid, im not sure how recently the hybridisation would have been, but certainly nearer the half a million year mark than last month. This species is also a mitotic parthenogen, with potential for homeologous recombination. I do not have a dotplot showing the dotplots but i have CDS homology data shown in the circos plot below, and links in the chromatin contact map between suspected pairings.
Hope this helps.
@mrmrwinter According to the heatmap and PCA plot, Subphaser is failed to your assembly. From the Hi-C map, there are many assembly errors. Some contigs are mis-joined into one chromosome and some contigs that belong to the same chromosome are placed into different chromosomes. These massive errors will disrupt SubPhaser. You need to reassembly your chromosomes firstly. And the homoelogous relationships in the circos plot are not clear (maybe due to the assembly errors). After reassembly, if it is still not clear, you may need to make a dot plot (to show clear relationships) and Ks plot (to show divergent time) using MCSCAN and/or wgdi. According to the Hi-C map, it is probablely a neopolyploid.
Thankyou for your feedback. The HiC scaffolding was performed by a third party company so i had little control over the process. I will get in touch with them and see what has happened. What things indicate assembly errors? This is their second attempt at Hi-c scaffolding.
@mrmrwinter Three cases as shown in arrows: the green indicates clear boundary between chromosoms that should be split; the yellow indicates contacts that should be joined; and the blue indicates signals that may be homoelogs.
@zhangrengang thankyou for taking the time out to address this for me. I had put my faith in the third party but some things have clearly been overlooked. I will fix these things and rerun my analyses. Hopefully it will clear up the multiple hits in the CDS pairings, and allow SubPhaser to run correctly. Please could you recommend any resources for learning to interpret contact maps? I read all i could but still did not feel i could confidently callout features like misassemblies etc.
Again thankyou @zhangrengang!
@mrmrwinter You can have a look on this paper:
Dudchenko O, Batra S S, Omer A D et. al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds [J]. Science, 2017, 356 (6333): 92
And if possible, make a practice on Hi-C assembly.
Many thanks @zhangrengang. I will hopefully be in touch when ive fixed these things and have SUbPhaser running!
@mrmrwinter OK. Looking forward to your feedback.
Hi, when running SubPhaser i get the following error:
I believe this could be because only one scaffold was identified as a subgenome. Does this sound possible?
Many thanks
Mike