Closed kfuku52 closed 1 year ago
Are your final assignments correct? (>With a smaller number of assignments, SubPhaser successfully completed in the same genome with a smaller number of homologous chromosome assignments)
Sorry, my writing was a little strange because I wrote the above comment just before I went to bed last night. SubPhaser ran to completion, but the final assignments didn't seem to be correct (given our data on the synteny and subgenome dominance of this decaploid plant). On the other hand, we have no evidence that this species is an allopolyploid, so the fact that SubPhaser does not phase well may indicate autopolyploidy, or that the ancestral species were very similar to each other.
A decaploid is hard to phase because it may actually have 2-5 subgenomes, and the situation may be very complicated in nature. I have successfully phased genomes up to an octoploid with 4 subgenomes. As there are no further priors, you may have to try several settings. Your config file could be set up like this:
scaffold2 scaffold1 scaffold8 scaffold11 scaffold12
scaffold3 scaffold17 scaffold23 scaffold24 scaffold40
.....
In addition, -nsg can be set to control the number of subgenomes, and -baseline can be set to choose which chromosome is compared against when identifying differential k-mers. For example, for a k-mer whose per-chromosome counts are sorted in decreasing order (chr0 chr1 chr2 ... chr4), -baseline 1 compares chr0 with chr1, -baseline 2 compares chr0 with chr2, -baseline -1 compares chr0 with chr4, and so on.
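The indexing described for -baseline can be illustrated with a small Python sketch. This is not SubPhaser's source code: the function name pick_comparison and the example counts are hypothetical, and the sketch only assumes that the chromosomes are ranked by count in decreasing order and that the baseline value is used as a list index, as the comment above describes.

```python
def pick_comparison(ranked, baseline=1):
    """Return the pair of chromosome names that would be compared.

    ranked   -- list of (chromosome, count) tuples, sorted by count, descending
    baseline -- index into the ranked list, mirroring the described
                -baseline option (assumption: plain list indexing)
    """
    top_chrom = ranked[0][0]            # chromosome with the highest count
    base_chrom = ranked[baseline][0]    # chromosome selected by the baseline index
    return top_chrom, base_chrom


# Hypothetical counts of one k-mer across five homologous chromosomes.
counts = {"chr0": 50, "chr1": 40, "chr2": 30, "chr3": 20, "chr4": 10}
ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)

for b in (1, 2, -1):
    print(b, pick_comparison(ranked, b))
```

Under this reading, a negative baseline counts from the end of the ranked list, so -1 always selects the chromosome with the lowest count regardless of how many chromosomes are in the group.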
Thank you, I have started trying it. I thought the above error occurred because SubPhaser assigned all scaffolds to SG1, but I got the same error with -nsg 2 (to phase 1 dominant subgenome versus 4 recessive subgenomes), with which some scaffolds were assigned to SG2.
22-12-23 20:20:37 [INFO] Subgenome assignments: OrderedDict([('scaffold2', 'SG1'), ('scaffold1', 'SG1'), ('scaffold8', 'SG1'), ('scaffold11', 'SG1'), ('scaffold12', 'SG1'), ('scaffold3', 'SG1'), ('scaffold17', 'SG1'), ('scaffold23', 'SG1'), ('scaffold24', 'SG1'), ('scaffold40', 'SG1'), ('scaffold4', 'SG1'), ('scaffold22', 'SG1'), ('scaffold30', 'SG2'), ('scaffold33', 'SG2'), ('scaffold39', 'SG1'), ('scaffold5', 'SG1'), ('scaffold13', 'SG1'), ('scaffold16', 'SG1'), ('scaffold18', 'SG1'), ('scaffold26', 'SG1'), ('scaffold6', 'SG1'), ('scaffold15', 'SG1'), ('scaffold20', 'SG1'), ('scaffold32', 'SG1'), ('scaffold38', 'SG1'), ('scaffold7', 'SG1'), ('scaffold14', 'SG1'), ('scaffold27', 'SG1'), ('scaffold28', 'SG1'), ('scaffold29', 'SG1'), ('scaffold9', 'SG1'), ('scaffold19', 'SG1'), ('scaffold21', 'SG1'), ('scaffold34', 'SG1'), ('scaffold36', 'SG1'), ('scaffold10', 'SG1'), ('scaffold25', 'SG1'), ('scaffold31', 'SG1'), ('scaffold35', 'SG1'), ('scaffold37', 'SG1')])
.
.
.
22-12-23 20:43:39 [INFO] Output: /gfe_data/tmp/14_Nepenthes_gracilis/Nepenthes_gracilis.subphaser/Nepenthes_gracilis.k15_q200_f2.ltr.enrich
22-12-23 20:43:39 [INFO] 0 significant subgenome-specific LTR-RTs
22-12-23 20:43:39 [INFO] Summary of overall LTR insertion age (million years):
/opt/conda/envs/biotools/lib/python3.9/site-packages/numpy/core/fromnumeric.py:3432: RuntimeWarning: Mean of empty slice.
return _methods._mean(a, axis=axis, dtype=dtype,
/opt/conda/envs/biotools/lib/python3.9/site-packages/numpy/core/_methods.py:190: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
Traceback (most recent call last):
File "/opt/conda/envs/biotools/bin/subphaser", line 33, in <module>
sys.exit(load_entry_point('subphaser==1.2.5', 'console_scripts', 'subphaser')())
File "/opt/conda/envs/biotools/lib/python3.9/site-packages/subphaser-1.2.5-py3.9.egg/subphaser/__main__.py", line 784, in main
pipeline.run()
File "/opt/conda/envs/biotools/lib/python3.9/site-packages/subphaser-1.2.5-py3.9.egg/subphaser/__main__.py", line 518, in run
ltr_bedlines, enrich_ltr_bedlines = self.step_ltr(d_kmers) if not self.disable_ltr else ([],[])
File "/opt/conda/envs/biotools/lib/python3.9/site-packages/subphaser-1.2.5-py3.9.egg/subphaser/__main__.py", line 602, in step_ltr
enrich_ltrs = LTR.plot_insert_age(ltrs, d_enriched, prefix,
File "/opt/conda/envs/biotools/lib/python3.9/site-packages/subphaser-1.2.5-py3.9.egg/subphaser/LTR.py", line 515, in plot_insert_age
d_info = summary_ltr_time(d_data, fout)
File "/opt/conda/envs/biotools/lib/python3.9/site-packages/subphaser-1.2.5-py3.9.egg/subphaser/LTR.py", line 601, in summary_ltr_time
np.median(xages), abs(np.percentile(xages, 2.5)), np.percentile(xages, 97.5)))
File "<__array_function__ internals>", line 180, in percentile
File "/opt/conda/envs/biotools/lib/python3.9/site-packages/numpy/lib/function_base.py", line 4166, in percentile
return _quantile_unchecked(
File "/opt/conda/envs/biotools/lib/python3.9/site-packages/numpy/lib/function_base.py", line 4424, in _quantile_unchecked
r, k = _ureduce(a,
File "/opt/conda/envs/biotools/lib/python3.9/site-packages/numpy/lib/function_base.py", line 3725, in _ureduce
r = func(a, **kwargs)
File "/opt/conda/envs/biotools/lib/python3.9/site-packages/numpy/lib/function_base.py", line 4593, in _quantile_ureduce_func
result = _quantile(arr,
File "/opt/conda/envs/biotools/lib/python3.9/site-packages/numpy/lib/function_base.py", line 4699, in _quantile
take(arr, indices=-1, axis=DATA_AXIS)
File "<__array_function__ internals>", line 180, in take
File "/opt/conda/envs/biotools/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 190, in take
return _wrapfunc(a, 'take', indices, axis=axis, out=out, mode=mode)
File "/opt/conda/envs/biotools/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 57, in _wrapfunc
return bound(*args, **kwds)
IndexError: cannot do a non-empty take from an empty axes.
The error occurs because no subgenome-specific LTR-RTs were identified (the phasing step may have failed). -disable_ltr skips this step, but the result would not be very meaningful. You may try assuming -nsg 5.
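The traceback ends in np.percentile being called on an empty array of insertion ages. A minimal sketch of a guard that warns instead of raising is shown below. It is an illustration only, not the actual fix applied in SubPhaser: the function name summarize_ages and the returned dictionary layout are hypothetical.

```python
import warnings

import numpy as np


def summarize_ages(ages):
    """Summarize LTR insertion ages, or return None if there are none.

    Hypothetical helper: emits a warning for empty input instead of letting
    np.percentile fail on an empty array.
    """
    ages = np.asarray(ages, dtype=float)
    if ages.size == 0:
        warnings.warn("no subgenome-specific LTR-RTs; skipping insertion-age summary")
        return None
    return {
        "mean": float(np.mean(ages)),
        "median": float(np.median(ages)),
        "lower": float(np.percentile(ages, 2.5)),   # 2.5th percentile
        "upper": float(np.percentile(ages, 97.5)),  # 97.5th percentile
    }
```

Checking the array size before computing any statistic also avoids the "Mean of empty slice" RuntimeWarning that appears earlier in the log.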
Thank you. I tried -nsg 5, but the phasing wasn't successful.
Yes, there are too few differential k-mers.
Should I close this issue if the IndexError is intended?
OK. I will change it to "warning".
I am closing this issue as the bug seems to have been fixed. Thank you!
Hi, I got the following error with my dataset when I was trying to pre-assign all 40 chromosomes to 2 subgenomes. Apparently, SubPhaser re-assigned all chromosomes to SG1. With a smaller number of assignments, SubPhaser successfully completed in the same genome with a smaller number of homologous chromosome assignments, as you suggested in #7.