zhangrengang / SubPhaser

To phase, partition and visualize subgenomes of a neoallopolyploid or hybrid based on the subgenome-specific repetitive kmers.
https://doi.org/10.1111/nph.18173
GNU General Public License v3.0
55 stars 12 forks source link

Division by zero when trying to build trees? #5

Open mrmrwinter opened 2 years ago

mrmrwinter commented 2 years ago

Hi, when running SubPhaser i get the following error:

Traceback (most recent call last):
  File "/home/531734/.conda/envs/SubPhaser/bin/subphaser", line 33, in <module>
    sys.exit(load_entry_point('subphaser==1.2.5', 'console_scripts', 'subphaser')())
  File "/home/531734/.conda/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.5-py3.8.egg/subphaser/__main__.py", line 779, in main
    pipeline.run()
  File "/home/531734/.conda/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.5-py3.8.egg/subphaser/__main__.py", line 516, in run
    ltr_bedlines, enrich_ltr_bedlines = self.step_ltr(d_kmers) if not self.disable_ltr else ([],[])
  File "/home/531734/.conda/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.5-py3.8.egg/subphaser/__main__.py", line 615, in step_ltr
    d_files = tree.build(job_args=job_args)
  File "/home/531734/.conda/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.5-py3.8.egg/subphaser/LTR.py", line 210, in build
    ncpus = [max(1, int(self.ncpu*v/tprop)) for v in prop]
  File "/home/531734/.conda/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.5-py3.8.egg/subphaser/LTR.py", line 210, in <listcomp>
    ncpus = [max(1, int(self.ncpu*v/tprop)) for v in prop]
ZeroDivisionError: division by zero

I believe this could be because only one scaffold was identified as a subgenome. Does this sound possible?

Many thanks

Mike

zhangrengang commented 2 years ago

Yes, it is because preceding subgenome assignment is failed. Can you post the clustering heatmap?

zhangrengang commented 2 years ago

@mrmrwinter The immediate cause should be that there is no subgenome-specific LTR for building tree. You can skip this step by -disable_ltrtree. And check the results according to the Supplementary Material:

On the clustering heatmap (Fig. 2B) and PCA plot (Fig. 2C), a subgenome is defined as well-phased if it has clearly distinguishable patterns of both differential k-mers and homeologous chromosomes, indicating that each subgenome shares subgenome-specific features as expected. If the subgenomes are not well-phased, the following analyses are meaningless and should be ignored.

mrmrwinter commented 2 years ago

Hi @zhangrengang, thanks for the fast reply. I had a look at the subgenome assignment results and only one scaffold was assigned to subgenome 2.

image

I double checked the content of this scaffold and it is definitely not a contaminant or particularly difficult region - it is around 120 Kbp and contains a ribosomal subunit. When i remove this scaffold from the assembly and from the configureation file i get the following error:

multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/531734/.conda/envs/SubPhaser/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/531734/.conda/envs/SubPhaser/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/home/531734/.conda/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.5-py3.8.egg/subphaser/Jellyfish.py", line 641, in _filter_kmer
    return kmer, [c/l for c,l in zip(counts, lengths)], tot
  File "/home/531734/.conda/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.5-py3.8.egg/subphaser/Jellyfish.py", line 641, in <listcomp>
    return kmer, [c/l for c,l in zip(counts, lengths)], tot
ZeroDivisionError: division by zero
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/531734/.conda/envs/SubPhaser/bin/subphaser", line 33, in <module>
    sys.exit(load_entry_point('subphaser==1.2.5', 'console_scripts', 'subphaser')())
  File "/home/531734/.conda/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.5-py3.8.egg/subphaser/__main__.py", line 779, in main
    pipeline.run()
  File "/home/531734/.conda/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.5-py3.8.egg/subphaser/__main__.py", line 412, in run
    d_mat = dumps.filter(d_mat, lengths, self.sgs, outfig=histfig, #d_targets=d_targets, 
  File "/home/531734/.conda/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.5-py3.8.egg/subphaser/Jellyfish.py", line 487, in filter
    for kmer, freqs, tot_freq in pool_func(_filter_kmer, args, self.ncpu, 
  File "/home/531734/.conda/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.5-py3.8.egg/subphaser/RunCmdsMP.py", line 346, in pool_func
    for returned in pool_map(func, iterable, **kargs):
  File "/home/531734/.conda/envs/SubPhaser/lib/python3.8/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/531734/.conda/envs/SubPhaser/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
ZeroDivisionError: division by zero

I really hope i can get this working as the software looks very promising.

Many thanks

zhangrengang commented 2 years ago

@mrmrwinter How about the plot *.kmer.mat.pdf? Your genome may be not a neoallopolyploid. Do you have a dot plot to show the homoeologs? By the way, you do not need to remove the seqence from the assembly.

mrmrwinter commented 2 years ago

@zhangrengang Here is the plot you requested: image

Though definitely allopolyploid, im not sure how recently the hybridisation would have been, but certainly nearer the half a million year mark than last month. This species is also a mitotic parthenogen, with potential for homeologous recombination. I do not have a dotplot showing the dotplots but i have CDS homology data shown in the circos plot below, and links in the chromatin contact map between suspected pairings.

image

image

Hope this helps.

zhangrengang commented 2 years ago

@mrmrwinter According to the heatmap and PCA plot, Subphaser is failed to your assembly. From the Hi-C map, there are many assembly errors. Some contigs are mis-joined into one chromosome and some contigs that belong to the same chromosome are placed into different chromosomes. These massive errors will disrupt SubPhaser. You need to reassembly your chromosomes firstly. And the homoelogous relationships in the circos plot are not clear (maybe due to the assembly errors). After reassembly, if it is still not clear, you may need to make a dot plot (to show clear relationships) and Ks plot (to show divergent time) using MCSCAN and/or wgdi. According to the Hi-C map, it is probablely a neopolyploid.

mrmrwinter commented 2 years ago

Thankyou for your feedback. The HiC scaffolding was performed by a third party company so i had little control over the process. I will get in touch with them and see what has happened. What things indicate assembly errors? This is their second attempt at Hi-c scaffolding.

zhangrengang commented 2 years ago

360截图17060223645074 @mrmrwinter Three cases as shown in arrows: the green indicates clear boundary between chromosoms that should be split; the yellow indicates contacts that should be joined; and the blue indicates signals that may be homoelogs.

mrmrwinter commented 2 years ago

@zhangrengang thankyou for taking the time out to address this for me. I had put my faith in the third party but some things have clearly been overlooked. I will fix these things and rerun my analyses. Hopefully it will clear up the multiple hits in the CDS pairings, and allow SubPhaser to run correctly. Please could you recommend any resources for learning to interpret contact maps? I read all i could but still did not feel i could confidently callout features like misassemblies etc.

Again thankyou @zhangrengang!

zhangrengang commented 2 years ago

@mrmrwinter You can have a look on this paper:

Dudchenko O, Batra S S, Omer A D et. al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds [J]. Science, 2017, 356 (6333): 92

And if possible, make a practice on Hi-C assembly.

mrmrwinter commented 2 years ago

Many thanks @zhangrengang. I will hopefully be in touch when ive fixed these things and have SUbPhaser running!

zhangrengang commented 2 years ago

@mrmrwinter OK. Looking forward to your feedback.