Closed EmilyPhelps closed 1 month ago
Hi, thanks for your interest in using wgd v2
. You usage with wgd dmd
and wgd ksd
is correct. The log information also looks normal. The warning for the family with stripped alignment length as 0 is not an error but a common feature for large families (for instance, >200 gene members) that after removing all the gap-containing columns (any column with '-') there is no column left and it's thus impossible to calculate Ks. If you want to calculate the Ks values for these large families anyhow, you may add the flag option --pairwise
in wgd ksd
command to calculate Ks values based on local pairwise alignment of each pair rather than of the whole family, which is expected to be less gappy. If your run with wgd ksd
stopped unexpectedly, the first thing I suggest you to check is whether you gave sufficient memory to each thread you assigned.
Thanks for the response- How would I go about checking if the memory is limiting the process? Even when I use very few threads it stops. I have 126G of memory and 64 cores, but the most I've used for this is 40, the fewest is 4 and it stops each time.
Hi - Just to provide more information here. I've been trying to run a subset of ~3K sequences on 4 nodes now for about 16 hours and it seem to get stuck.
It looks like its still running but when I check the htop there is nothing running (apart from angsd which I just started for another project). This time i tried using the pairwise flag.
Hi, it seems that the wgd
program was not occupying the memory properly in your system. Did you add such command in your submitting script to set the memory for each thread? Like below I distributed 1 core/thread and 20G for the job.
#!/bin/bash
#$ -pe serial 1
#$ -l h_vmem=20G
Hi- I included that in the script and tried to run it again, but it still seems stuck in the same place and is not actually running when you look at htop
It could be indeed a problem. Could you please try to use only 1 thread and distribute for instance 40Gb memory for that thread to see if it got stuck again?
Hi I am running only the one core now, with memory set to 40G by including #$ -l h_vmem=40G in the script. It does not look like it is using the 40G.
The memory allocation seems incorrect - saying there are only 6G? Perhaps this is the issue?
Hi, Thanks for developing wgd2 an important tool for tracing down the ancient polyploidization events. I also ran into same problem like @EmilyPhelps wgd ksd running endless without any new files being generated in the temp folder or error been thrown onto the terminal.
Hi, a quick test, could you please download the small size test file https://github.com/heche-psb/wgd/blob/phylodating/test/data/egu1000.fasta and do wgd dmd egu1000.fasta; wgd ksd wgd_dmd/egu1000.fasta.tsv egu1000.fasta
and see if you can make it run successfully?
Hi- Just ran the test and it completed without issue!
Hi, Same the test ran successfully.
The one which u suggest wgd dmd egu1000.fasta; wgd ksd wgd_dmd/egu1000.fasta.tsv egu1000.fasta
is a whole paranome approach for ksd. What I am following is through orthologous approach first then ksd wgd dmd cds1.fasta cds2.fasta cds3.fasta --globalmrbh
followed by wgd ksd global_MRBH.tsv *cds.fasta --pairwise -o wgd_globalmrbh_ks -n 36
.
Regards Praveen
Hi, I think the problem is with the datafile you used, perhaps too large. Could you also randomly select only one thousand sequences of your own datafile to repeat what you have done with egu1000.fasta
and see whether it got stuck?
Hi, I successfully ran the tool as per your suggestion on individual species file on full data (No random sequence selection was done). Also I was able to visualize the wgd events in my species attaching the results. Following are the commands I used.
Four species were used for the analysis
Please Let me know how to troubleshoot the problem we are facing in analyzing in orthologous mode. As I wanted to perform that and get the final output with WGD dating and finally placed in a tree.
-Regards Praveen
Hi, it's nice to see the two prominent WGD peaks given by the ELMM result. To achieve rate correction, we need to first infer the globalMRBH (or less preferred localMRBH) families, based on which we infer orthologous Ks values and subsequently perform the rate correction. The command to realize such job can be set as below. May I know a bit more details of which step didn't work in your dataset and could you share the error log please, which will help us debug further.
wgd dmd --globalmrbh *allspecies -o wgd_globalmrbh
wgd ksd wgd_globalmrbh/global_MRBH.tsv *allspecies -o wgd_globalmrbh_ks
wgd viz -d wgd_globalmrbh_ks/global_MRBH.tsv.ks.tsv -fa focal_species -epk wgd_ksd/focal_species.tsv.ks.tsv -ap wgd_syn/iadhore-out/anchorpoints.txt -sp speciestree.nw -o wgd_viz_mixed_Ks --plotelmm --plotapgmm --reweight
2024-07-24 16:36:17 INFO Analysing family GF00000059 core.py:3064 2024-07-24 16:36:17 INFO Analysing family GF00000060 core.py:3064
I'm encountering the same issue as Emily with the 'ksd' process. Despite allocating 200G of memory and 60 threads, it is still taking an excessive amount of time. I'm processing a whole genome, and after 'dmd', the total number of gene families is 13,166.
Hi, I guess it's a memory-insufficient issue caused by some ultra-large families (size>200). A quick validation, could you remove the first 200 large paralogous families and conduct the ksd
command again on the reduced gene family file to see if it got stuck? Besides, are you using the --pairwise
mode which entails much more calculations per family and what is the distribution of your gene family sizes?
2024-07-24 16:36:17 INFO Analysing family GF00000059 core.py:3064 2024-07-24 16:36:17 INFO Analysing family GF00000060 core.py:3064
I'm encountering the same issue as Emily with the 'ksd' process. Despite allocating 200G of memory and 60 threads, it is still taking an excessive amount of time. I'm processing a whole genome, and after 'dmd', the total number of gene families is 13,166.
Hello,
I removed the first 200 paralogous families and ran the ksd analysis again. However, the process now gets stuck at family 618. I observed that the process starts off quickly but then gradually slows down. Could this be a memory issue?
Before removing the 200 families, the maximum size of a gene family was 126 genes, and there were a total of 70,849 rows. After the removal, the maximum size of a gene family is now 7 genes, and there are 70,649 rows remaining. This dataset consists only of CDS sequences from a molluscan species, following the ab initio annotation of its genome.
Let me know if you have any suggestions or if there's anything else I should check.
Hi, a quick question, is family 618 a singleton family? I notice that sometimes the ksd
run stops when it goes through singleton family.
No, family 618 is not a singleton; it contains three genes. Singleton families are skipped at the start of the ksd analysis. Out of the 70,649 gene families, I'm left with 2,548 gene families for the analysis. Here’s a snippet from my log for your reference:
INFO Skipping singleton family GF00070846['cds.faa_46294'] INFO Skipping singleton family GF00070847['cds.faa_00464'] INFO Skipping singleton family GF00070848['cds.faa_27309'] INFO 60 threads are used for 2,548 gene families Note that adding threads can significantly accelerate the Ks estimation process 2024-07-25 15:32:08 INFO Analysing family GF00000201 2024-07-25 15:32:08 INFO Analysing family GF00000202 2024-07-25 15:32:08 INFO Analysing family GF00000203
Hi, the occasion that I mention ksd
stops at singleton family is due to memory overflow when concatenating all the individual .csv
files in the end, which seems irrelevant in your case. Another check about the family 618, could you cd
into the tmp
folder for this family and see whether the .csv
file has been produced already?
I was reviewing the temporary files and found that GF00000617 has been executed successfully. However, GF00000618 appears to be stuck at a specific step. Below is the current content of the directory for your reference:
GF00000617 total 36K -rw-rw-r-- 1 workspace2 workspace2 1.6K Jul 25 16:06 pro.fasta -rw-rw-r-- 1 workspace2 workspace2 2.3K Jul 25 16:06 pro.aln -rw-rw-r-- 1 workspace2 workspace2 399 Jul 25 16:06 GF00000617.ctrl -rw-rw-r-- 1 workspace2 workspace2 6.5K Jul 25 16:06 GF00000617.cdsaln -rw-rw-r-- 1 workspace2 workspace2 5.6K Jul 25 16:06 GF00000617.codeml -rw-rw-r-- 1 workspace2 workspace2 81 Jul 25 16:06 pro.aln.nw -rw-rw-r-- 1 workspace2 workspace2 698 Jul 25 16:06 GF00000617_ks.csv
GF00000618 total 20K -rw-rw-r-- 1 workspace2 workspace2 1.7K Jul 25 16:06 pro.fasta -rw-rw-r-- 1 workspace2 workspace2 2.3K Jul 25 16:06 pro.aln -rw-rw-r-- 1 workspace2 workspace2 399 Jul 25 16:06 GF00000618.ctrl -rw-rw-r-- 1 workspace2 workspace2 6.7K Jul 25 16:06 GF00000618.cdsaln -rw-rw-r-- 1 workspace2 workspace2 0 Jul 25 16:06 rub -rw-rw-r-- 1 workspace2 workspace2 0 Jul 25 16:06 rst1 -rw-rw-r-- 1 workspace2 workspace2 0 Jul 25 16:06 rst -rw-rw-r-- 1 workspace2 workspace2 0 Jul 25 16:06 GF00000618.codeml -rw-rw-r-- 1 workspace2 workspace2 0 Jul 25 16:06 2NG.t -rw-rw-r-- 1 workspace2 workspace2 0 Jul 25 16:06 2NG.dS -rw-rw-r-- 1 workspace2 workspace2 0 Jul 25 16:06 2NG.dN -rw-rw-r-- 1 workspace2 workspace2 0 Jul 25 16:06 2ML.t -rw-rw-r-- 1 workspace2 workspace2 0 Jul 25 16:06 2ML.dS -rw-rw-r-- 1 workspace2 workspace2 0 Jul 25 16:06 2ML.dN
Hi, could you try to conduct codeml GF00000618.ctrl
and see what you get?
Yeah, I tried codeml GF00000618.ctrl
and got the following results:
total 76K lrwxrwxrwx 1 workspace2 workspace2 29 May 29 2020 codeml.ctl -> /usr/lib/paml/data/codeml.ctl -rw-rw-r-- 1 workspace2 workspace2 1.7K Jul 25 16:06 pro.fasta -rw-rw-r-- 1 workspace2 workspace2 2.3K Jul 25 16:06 pro.aln -rw-rw-r-- 1 workspace2 workspace2 399 Jul 25 16:06 GF00000618.ctrl -rw-rw-r-- 1 workspace2 workspace2 6.7K Jul 25 16:06 GF00000618.cdsaln -rw-rw-r-- 1 workspace2 workspace2 5.9K Jul 25 16:52 mlc -rw-rw-r-- 1 workspace2 workspace2 6.5K Jul 25 16:52 lnf -rw-rw-r-- 1 workspace2 workspace2 0 Jul 25 16:53 rub -rw-rw-r-- 1 workspace2 workspace2 161 Jul 25 16:53 rst1 -rw-rw-r-- 1 workspace2 workspace2 444 Jul 25 16:53 rst -rw-rw-r-- 1 workspace2 workspace2 5.6K Jul 25 16:53 GF00000618.codeml -rw-rw-r-- 1 workspace2 workspace2 97 Jul 25 16:53 2NG.t -rw-rw-r-- 1 workspace2 workspace2 97 Jul 25 16:53 2NG.dS -rw-rw-r-- 1 workspace2 workspace2 97 Jul 25 16:53 2NG.dN -rw-rw-r-- 1 workspace2 workspace2 82 Jul 25 16:53 2ML.t -rw-rw-r-- 1 workspace2 workspace2 82 Jul 25 16:53 2ML.dS -rw-rw-r-- 1 workspace2 workspace2 82 Jul 25 16:53 2ML.dN
If codeml GF00000618.ctrl
works fine, which indicates no issue with this specific gene family, I would attribute the unexpected stopping of ksd
still to the memory issue for some overflowed threads. I would suggest reducing the number of threads and increase the memory per thread, for instance 20 threads and per thread 10Gb.
Thank you, I'll give it a try and keep you updated.
Yes, I believe it is due to a memory issue. I've reduced the number of threads and increased the memory per thread. However, it now gets stuck at gene family 1056. Are there any other parameters that might help reduce the memory usage per thread?
Hi, the default parameter of ksd
should cost the minimum memory already. Normally I would assign 4-8 threads and 15-20Gb per thread. You may also try the same. Let me know if it still got stuck. Besides, are you using paml v4.9j
?
Yeah, I'm using paml v4.9j. I'll keep you updated.
I’m encountering an issue with running the wgd syn command. I processed a genome file using Augustus for annotation and extracted CDS sequences using grep -v "#" | grep "CDS"
followed by extracting the sequences, resulting in cds.faa. Subsequently, I ran the following commands:
wgd dmd cds.faa -e 1e-50 -I 2.0 -c 0.9 --orthoinfer -o wgd_dmd -n 60
wgd ksd wgd_dmd/cds.faa.tsv cds.faa --cds --strip_gaps -o wgd_ksd -n 8
and got the cds.faa.tsv.ks.tsv
file. However, now I'm running
wgd syn -f CDS -a ID wgd_dmd/cds.faa.tsv cds.gff -ks wgd_ksd/cds.faa.tsv.ks.tsv -o wgd_syn -n 40
but is facing an issue:
ERROR No genes from families file `wgd_dmd/cds.faa.tsv` found in the GFF file for `feature=CDS` and `attribute=ID`, please double check command settings. cli.py:696
This is my cds.gff
, cds.faa.tsv.ks.tsv
and cds.faa.tsv
file:
file.zip
Could you please help identify the potential cause of this error and suggest any steps to resolve it?
Thank you.
Hi, gene ids at the "CDS" rows after "ID=" should be the same as the gene ids in your cds.faa.tsv
file, which is apparently not the case in your dataset. In you gene family file cds.faa.tsv
gene ids are like "WURV02015968.1:2079-3951" while in your gff3 file gene ids are like "g53714.t1.cds". You need to synchronize the gene ids in all your datafiles.
Thank you for your help. I have successfully modified the ID and gene name in the GFF file, and it now looks like:
WURV02000002.1:6299-7889 AUGUSTUS CDS 6300 7889 0.19 - 0 ID=WURV02000002.1:6299-7889;Parent=g1.t1 WURV02000002.1:8334-8365 AUGUSTUS CDS 8335 8365 0.66 + 0 ID=WURV02000002.1:8334-8365;Parent=g2.t1 WURV02000002.1:8489-8929 AUGUSTUS CDS 8490 8929 0.22 + 2 ID=WURV02000002.1:8489-8929;Parent=g2.t1 WURV02000002.1:8968-9208 AUGUSTUS CDS 8969 9208 1 + 0 ID=WURV02000002.1:8968-9208;Parent=g2.t1 WURV02000002.1:52499-52774 AUGUSTUS CDS 52500 52774 0.77 - 2 ID=WURV02000002.1:52499-52774;Parent=g3.t1 WURV02000002.1:56763-56785 AUGUSTUS CDS 56764 56785 0.24 - 0 ID=WURV02000002.1:56763-56785;Parent=g3.t1 WURV02000003.1:11572-11785 AUGUSTUS CDS 11573 11785 1 + 0 ID=WURV02000003.1:11572-11785;Parent=g4.t1 WURV02000003.1:11806-12016 AUGUSTUS CDS 11807 12016 0.38 + 0 ID=WURV02000003.1:11806-12016;Parent=g5.t1 WURV02000006.1:19188-19659 AUGUSTUS CDS 19189 19659 0.18 + 0 ID=WURV02000006.1:19188-19659;Parent=g6.t1 WURV02000006.1:37479-38097 AUGUSTUS CDS 37480 38097 0.59 - 0 ID=WURV02000006.1:37479-38097;Parent=g7.t1
I also changed the gene name and added the coordinates to the name due to a duplicate gene error.
However, I’m encountering an issue after downloading, compiling, and successfully testing the i-adhore
tool.
INFO Processing I-ADHoRe output cli.py:711
Traceback (most recent call last):
File "/home/workspace2/anaconda3/envs/wgd/bin/wgd", line 10, in
What could be the potential reason?
Hi, a quick question. Could you show me the full log of your command and the content of your wgd_syn
folder?
Hello,
I'm attaching my cds.faa.tsv
, cds.clean.gff
, cds.faa.tsv.ks.tsv
files, along with the log.txt
file and the wgd_syn
directory, for your reference. Thank you for your patience. I greatly appreciate your assistance. NOTE: In the gff3 file, I changed the ID
and the gene name in the first column (due to a duplicate gene name error). I believe it has something to do with the format of my files. Please let me know if you need any more files for your reference.
Hi, the first column of cds.clean.gff
should be the chromosome or scaffold IDs, rather than gene ids.
Hello, thank you for your clarification. I'll work on it and get the proper format.
I changed my gff
file into this particular format
WURV02000002.1 AUGUSTUS CDS 6300 7889 0.19 - 0 ID=WURV02000002.1:6299-7889;Parent=g1.t1 WURV02000002.1 AUGUSTUS CDS 8335 8365 0.66 + 0 ID=WURV02000002.1:8334-8365;Parent=g2.t1 WURV02000002.1 AUGUSTUS CDS 8490 8929 0.22 + 2 ID=WURV02000002.1:8489-8929;Parent=g2.t1 WURV02000002.1 AUGUSTUS CDS 8969 9208 1 + 0 ID=WURV02000002.1:8968-9208;Parent=g2.t1 WURV02000002.1 AUGUSTUS CDS 52500 52774 0.77 - 2 ID=WURV02000002.1:52499-52774;Parent=g3.t1 WURV02000002.1 AUGUSTUS CDS 56764 56785 0.24 - 0 ID=WURV02000002.1:56763-56785;Parent=g3.t1 WURV02000003.1 AUGUSTUS CDS 11573 11785 1 + 0 ID=WURV02000003.1:11572-11785;Parent=g4.t1 WURV02000003.1 AUGUSTUS CDS 11807 12016 0.38 + 0 ID=WURV02000003.1:11806-12016;Parent=g5.t1 WURV02000006.1 AUGUSTUS CDS 19189 19659 0.18 + 0 ID=WURV02000006.1:19188-19659;Parent=g6.t1 WURV02000006.1 AUGUSTUS CDS 37480 38097 0.59 - 0 ID=WURV02000006.1:37479-38097;Parent=g7.t1
Yet I'm getting the following error:
2024-07-26 15:29:02 INFO This is wgd v2.0.38 cli.py:34 INFO Checking cores and threads... core.py:35 INFO The number of logical CPUs/Hyper Threading in the system: 80 core.py:36 INFO The number of physical cores in the system: 20 core.py:37 INFO The number of actually usable CPUs in the system: 80 core.py:38 INFO Checking memory... core.py:40 INFO Total physical memory: 251.5079 GB core.py:41 INFO Available memory: 215.7972 GB core.py:42 INFO Free memory: 52.1904 GB core.py:43 2024-07-26 15:29:03 INFO Note: detected 80 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable. utils.py:144 INFO Note: NumExpr detected 80 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8. utils.py:147 INFO NumExpr defaulting to 8 threads. utils.py:159 INFO Configuring I-ADHoRe co-linearity search cli.py:703 INFO Writing families file syn.py:96 INFO Writing gene lists syn.py:98 2024-07-26 15:29:09 ERROR There are duplicated gene IDs for given feature and attribute syn.py:121
Does that mean I have duplicates in the 9th column of my gff file? Which duplicates should I remove?
Hi, each row with CDS and ID tag should have unique gene ids. Can you make sure it's the case in your data?
Hello, I'm still trying to locate the duplicate gene IDs in my cds.clean.gff
file. Does the duplication lie in the 9th column? I've tested multiple times but still can't find any duplicate rows. I even converted the gene IDs and replaced ":" and "-" to convert the IDs into the format WURV02000018.1_22566_22857
in both the gff and fasta file. Could you please suggest any workarounds for this problem? The cds.clean.gff
file is attached to the zip file above.
Hi, another quick notation. You should not use all the raw CDS per gene for collinearity search, but use one main transcript per gene, same for constructing whole paranome and Ks distribution.
Hi, thank you for the quick solution. I'll try it and keep you updated.
The direct reason that caused the error log 2024-07-26 15:29:09 ERROR There are duplicated gene IDs for given feature and attribute syn.py:121
is that you have duplicated gene ids in your gene family file, for instance "WURV02226005.1:440-1065", which indicates something wrong with your gene id renaming process.
I understand, I was using bedtools getfasta
to fetch the CDS from the genome. Maybe I should switch to gffread
to find a solution. Thank you for your suggestion.
Hi, I used gffread
to solve the problem. However, I'm facing yet another issue. Could you please look into it?
What does remove_unused_levels
mean?
My gff3 file is in this format:
If you're using short CDS for collinearity search, you may set the option --mingenenum
as 1 and --minseglen
as 1 (please refer to README
for the explanation of these parameters).
Thanks! I'll go through the manual carefully.
Hello, I hope you're doing well. I've managed to run the entire pipeline. However, I noticed that a few of the plots are missing from the directories, like the dupstackplot
and dotplot
. These are the commands which I ran:
wgd dmd cds.faa --eval 1e-5 --inflation 1.5 -nthreads 50 --keepfasta --cscore 0.7 --orthoinfer
wgd ksd wgd_dmd/cds.faa.tsv cds.faa --cds --strip_gaps --outdir wgd_ksd -nthreads 8
wgd syn --feature CDS --attribute ID wgd_dmd/cds.faa.tsv cds.clean.gff --ks_distribution wgd_ksd/cds.faa.tsv.ks.tsv --outdir wgd_syn --nthreads 50 --gistrb --showrealtick --minseglen 1 --mingenenum 1 --maxsize 10
wgd viz --datafile wgd_ksd/cds.faa.tsv.ks.tsv --outdir wgd_ELMM --plotkde --plotsyn --segments wgd_syn/iadhore-out/segments.txt --maxsize 20 --anchorpoints wgd_syn/iadhore-out/anchorpoints.txt --minseglen 1 --mingenenum 1 --plotapgmm --plotelmm --showrealtick --adjustortho --classic --toparrow --gistrb --genetable wgd_syn/gene-table.csv --nthreads 50 --multiplicon wgd_syn/iadhore-out/multiplicons.txt
This is the content of my wgd_syn
directory:
total 12M drwxrwxr-x 2 workspace2 workspace2 80K Jul 29 14:02 cds.faa.lists drwxrwxr-x 2 workspace2 workspace2 226 Jul 29 14:03 iadhore-out -rw-rw-r-- 1 workspace2 workspace2 1.1M Jul 29 16:36 families.tsv -rw-rw-r-- 1 workspace2 workspace2 226K Jul 29 16:36 iadhore.conf -rw-rw-r-- 1 workspace2 workspace2 7.6M Jul 29 16:38 cds.faa_gene_order_perchrom.tsv -rw-rw-r-- 1 workspace2 workspace2 2.6M Jul 29 16:38 gene-table.csv -rw-rw-r-- 1 workspace2 workspace2 6.6K Jul 29 16:38 anchors.csv -rw-rw-r-- 1 workspace2 workspace2 85 Jul 29 16:38 segments_coordinates.tsv
and this is the content of the iadhore-out
directory:
total 828K -rw-rw-r-- 1 workspace2 workspace2 0 Jul 29 16:38 alignment.txt -rw-rw-r-- 1 workspace2 workspace2 763K Jul 29 16:38 genes.txt -rw-rw-r-- 1 workspace2 workspace2 5.4K Jul 29 16:38 multiplicons.txt -rw-rw-r-- 1 workspace2 workspace2 1.9K Jul 29 16:38 baseclusters.txt -rw-rw-r-- 1 workspace2 workspace2 8.1K Jul 29 16:38 segments.txt -rw-rw-r-- 1 workspace2 workspace2 8.5K Jul 29 16:38 multiplicon_pairs.txt -rw-rw-r-- 1 workspace2 workspace2 16K Jul 29 16:38 list_elements.txt -rw-rw-r-- 1 workspace2 workspace2 11K Jul 29 16:38 anchorpoints.txt
Am I missing any parameter? How can I get the dupstack plot
and the dotplot
? Please let me know and thank you for your support.
Hi, I suggest the parameter "--maxsize 10" stick to the default "--maxsize 200" because "--maxsize 10" means you only consider gene families whose size is smaller than or equal to 10.
Thank you for your suggestions. All of the pipelines are now running smoothly after several rounds of iterations and trials. So, my basic project includes identifying the presence of WGD events in mollusca. These are the Ks plots I obtained.
As per my understanding:
In case of my data:
Are my inferences correct? Please let me know.
Thank you for your patience!
Hi, I suggest conducting ELMM analysis using wgd viz
to discern prominent Ks peaks by WGDs before you delve any deeper. Besides, how does your anchor Ks distribution look like?
Hello, I've conducted the ELMM analysis using wgd viz
and examined the prominent Ks peaks and the anchor Ks distribution.
According to my inferences:
The analysis identified a distinct peak in the log-transformed Ks distribution. The model selection process was guided by minimizing the BIC.
I've attached both the wgd_ELMM
and the wgd_peak
files for your reference.
Do you have any recommendations on further refining the model or additional analyses to perform? Are these and my inference correct?
Hello,
I've been trying to use wgd V2 to calculate a Ks peak
I've taken a subset of my CDS (~3K sequences)
When I run
I get: But it finishes. So if I run:
It will start off okay and just slow to a stop. I also get this error, for pretty much each gene family: