heche-psb / wgd

wgd v2: a suite of tools to uncover and date ancient polyploidy and whole-genome duplication
https://wgdv2.readthedocs.io/en/latest/
GNU General Public License v3.0

--globalmrbh #38

Closed ilaydagulmez closed 3 weeks ago

ilaydagulmez commented 4 weeks ago

Hi, I tried wgd dmd --globalmrbh on my 5 individual species and got this error:

Screenshot 2024-06-05 at 14 58 09

My commands are like this:

wgd dmd --globalmrbh 20628_8.cds.fasta 20628_9.cds.fasta 20628_1.cds.fasta 20896_1.cds.fasta 20896_2.cds.fasta -o global

Any suggestion for this error? Thanks for your time.

heche-psb commented 4 weeks ago

Hi, there seems to be something unusual in your sequence file 20628_9.cds.fasta. Could you check the temporary directory for that file and look at the diamond results?

ilaydagulmez commented 4 weeks ago

Do you mean I should run diamond on each fasta file separately before running dmd?

heche-psb commented 4 weeks ago

Hi, I meant: could you check the temporary directory for 20628_9.cds.fasta, given the error message? There might be something unusual there.
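For readers who want to do this check, a minimal shell sketch follows. It assumes the temporary directories sit in the working directory and are named wgdtmp_<id>, as in the log output posted later in this thread. Which files wgd writes into them is not described here, so the hypothetical helper check_tmpdirs only lists every file with its size and flags empty ones, which can hint at a diamond step that did not finish.

```shell
# Assumption: tmp directories are named wgdtmp_* under the given base directory.
check_tmpdirs() {
  base="${1:-.}"
  for d in "$base"/wgdtmp_*/; do
    [ -d "$d" ] || continue              # skip if the glob matched nothing
    echo "== $d"
    ls -l "$d"                           # file names and sizes
    # flag zero-byte files, which may point to a step that did not complete
    find "$d" -type f -size 0 | sed 's/^/EMPTY: /'
  done
}
```

Calling check_tmpdirs . in the directory where wgd dmd was run prints one section per temporary directory.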

ilaydagulmez commented 4 weeks ago

Every tmp directory has the same files for each fasta file.

Screenshot 2024-06-05 at 16 38 25

heche-psb commented 4 weeks ago

The fact that your run didn't call diamond properly is probably due to the same cause as your last issue with i-adhore. Could you run wgd dmd 20628_9.cds.fasta successfully?

ilaydagulmez commented 4 weeks ago

Yes, wgd dmd 20628_9.cds.fasta ran successfully before.

Screenshot 2024-06-06 at 08 59 55 Screenshot 2024-06-06 at 09 00 20

heche-psb commented 4 weeks ago

Hi, could you try your original command calling --globalmrbh but without 20628_9.cds.fasta?

ilaydagulmez commented 4 weeks ago

The same error occurred with another fasta file, even though it had previously worked fine on its own.

Screenshot 2024-06-06 at 11 38 12

heche-psb commented 4 weeks ago

I see. Could you share your data with me? I will try to reproduce the error.

ilaydagulmez commented 4 weeks ago

Thanks for your kindness and help. Here are the CDS files for my five individuals. Another unresolved question: wgd syn succeeded before this pipeline, but its output still does not work with wgd peak. Again, without the syn output files, peak does work.

https://transfer.adttemp.com.br/rGm9q/transfersh-58772.zip

Ks_gmm_3components_prediction tsv_Ks

heche-psb commented 3 weeks ago

Hi, yes, wgd peak works with the whole paranome too. I guess your failed run with globalmrbh is due to the large size of the genome files being used. A quick suggestion: could you increase the memory for the globalmrbh job and run it again?

ilaydagulmez commented 3 weeks ago

Hi, should I increase the memory by adding a parameter to dmd? I ask because I didn't see such an option.

heche-psb commented 3 weeks ago

Hi, it should be set in your job script when you submit the job to a compute node of your HPC. wgd doesn't have an option to set the memory for jobs.
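As an illustration of setting memory in the job script, here is a sketch that assumes the cluster uses the SLURM scheduler (the thread does not say which scheduler is in use). The file name globalmrbh.sbatch, the job name, and the memory, CPU and time values are placeholders, not recommendations from the wgd authors; only the wgd dmd command line comes from this thread. The snippet writes the script to a file so it can be inspected before submission.

```shell
# Hypothetical SLURM job script; every #SBATCH value below is a placeholder.
cat > globalmrbh.sbatch <<'EOF'
#!/bin/bash
#SBATCH --job-name=wgd_globalmrbh
#SBATCH --cpus-per-task=1
# Memory is requested from the scheduler here, not through a wgd option.
#SBATCH --mem=64G
#SBATCH --time=24:00:00

wgd dmd --globalmrbh 20628_8.cds.fasta 20628_9.cds.fasta 20628_1.cds.fasta 20896_1.cds.fasta 20896_2.cds.fasta -o global
EOF

# On a SLURM cluster, submit with: sbatch globalmrbh.sbatch
```

Other schedulers (PBS, SGE, LSF) use different directives for the same request, so the header lines would need to be adapted.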

ilaydagulmez commented 3 weeks ago

Hi, I tried with a high core count and got the same error.

heche-psb commented 3 weeks ago

How much memory did you give it?

ilaydagulmez commented 3 weeks ago

The partition has 28 cores and 128 GB memory. I intended to run globalmrbh because the wgd peak output couldn't be generated with the syn output. Perhaps there's no need to run a global analysis after all if the peak problem could be solved. (https://github.com/heche-psb/wgd/issues/37)

heche-psb commented 3 weeks ago

This is what I got with your data, using 20 GB of memory and 1 thread.

2024-06-08 20:44:15 INFO     This is wgd v2.0.37                       cli.py:34
                    INFO     Checking cores and threads...            core.py:35
                    INFO     The number of logical CPUs/Hyper         core.py:36
                             Threading in the system: 24
                    INFO     The number of physical cores in the      core.py:37
                             system: 6
                    INFO     The number of actually usable CPUs in    core.py:38
                             the system: 2
                    INFO     Checking memory...                       core.py:40
                    INFO     Total physical memory: 251.3222 GB       core.py:41
                    INFO     Available memory: 221.4534 GB            core.py:42
                    INFO     Free memory: 40.6646 GB                  core.py:43
2024-06-08 20:52:52 INFO     tmpdir =                                 cli.py:125
                             wgdtmp_0e5b0d7b-a558-4f9c-9f65-177b6da58
                             07f for 20628-1.cds.fasta
                    INFO     tmpdir =                                 cli.py:125
                             wgdtmp_92941af3-a3ca-485d-b9a5-03e0c2c4a
                             e11 for 20628-8.cds.fasta
                    INFO     tmpdir =                                 cli.py:125
                             wgdtmp_cd78a9ae-b303-4d37-9da8-b2be38bec
                             235 for 20628-9.cds.fasta
                    INFO     tmpdir =                                 cli.py:125
                             wgdtmp_707bf0ec-6937-42b9-9afa-2de3139d6
                             2b9 for 20896-1.cds.fasta
                    INFO     tmpdir =                                 cli.py:125
                             wgdtmp_950f4072-9cd2-4465-84cd-34e558e1d
                             787 for 20896-2.cds.fasta
                    INFO     Multiple cds files: will compute        core.py:875
                             globalMRBH orthologs or cscore-defined
                             homologs regardless of focal species
                    INFO     Note that setting the number of threads core.py:879
                             as 10 is the most efficient
                    INFO     20628-1.cds.fasta vs. 20628-8.cds.fasta core.py:848
2024-06-08 21:14:27 INFO     Normalization between 20628-1.cds.fasta core.py:406
                             & 20628-8.cds.fasta
                    INFO     100 bins & upper 5% hits in linear      core.py:407
                             regression
2024-06-08 21:37:38 INFO     20628-1.cds.fasta vs. 20628-9.cds.fasta core.py:848
2024-06-08 21:56:40 INFO     Normalization between 20628-1.cds.fasta core.py:406
                             & 20628-9.cds.fasta
                    INFO     100 bins & upper 5% hits in linear      core.py:407
                             regression
2024-06-08 22:19:20 INFO     20628-1.cds.fasta vs. 20896-1.cds.fasta core.py:848
2024-06-08 22:49:05 INFO     Normalization between 20628-1.cds.fasta core.py:406
                             & 20896-1.cds.fasta
                    INFO     100 bins & upper 5% hits in linear      core.py:407
                             regression
2024-06-08 23:41:54 INFO     20628-1.cds.fasta vs. 20896-2.cds.fasta core.py:848
2024-06-09 00:08:09 INFO     Normalization between 20628-1.cds.fasta core.py:406
                             & 20896-2.cds.fasta
                    INFO     100 bins & upper 5% hits in linear      core.py:407
                             regression
2024-06-09 00:37:19 INFO     20628-8.cds.fasta vs. 20628-9.cds.fasta core.py:848
2024-06-09 00:55:26 INFO     Normalization between 20628-8.cds.fasta core.py:406
                             & 20628-9.cds.fasta
                    INFO     100 bins & upper 5% hits in linear      core.py:407
                             regression
2024-06-09 01:17:44 INFO     20628-8.cds.fasta vs. 20896-1.cds.fasta core.py:848
2024-06-09 01:47:32 INFO     Normalization between 20628-8.cds.fasta core.py:406
                             & 20896-1.cds.fasta
                    INFO     100 bins & upper 5% hits in linear      core.py:407
                             regression
2024-06-09 02:43:37 INFO     20628-8.cds.fasta vs. 20896-2.cds.fasta core.py:848
2024-06-09 03:10:32 INFO     Normalization between 20628-8.cds.fasta core.py:406
                             & 20896-2.cds.fasta
                    INFO     100 bins & upper 5% hits in linear      core.py:407
                             regression
2024-06-09 03:42:17 INFO     20628-9.cds.fasta vs. 20896-1.cds.fasta core.py:848
2024-06-09 04:09:34 INFO     Normalization between 20628-9.cds.fasta core.py:406
                             & 20896-1.cds.fasta
                    INFO     100 bins & upper 5% hits in linear      core.py:407
                             regression
2024-06-09 05:03:57 INFO     20628-9.cds.fasta vs. 20896-2.cds.fasta core.py:848
2024-06-09 05:28:34 INFO     Normalization between 20628-9.cds.fasta core.py:406
                             & 20896-2.cds.fasta
                    INFO     100 bins & upper 5% hits in linear      core.py:407
                             regression
2024-06-09 05:58:17 INFO     20896-1.cds.fasta vs. 20896-2.cds.fasta core.py:848
2024-06-09 06:32:42 INFO     Normalization between 20896-1.cds.fasta core.py:406
                             & 20896-2.cds.fasta
                    INFO     100 bins & upper 5% hits in linear      core.py:407
                             regression
2024-06-09 07:05:10 INFO     Total run time: 620.91 minutes         core.py:1637
                    INFO     Done                                   core.py:1638

The command is:

wgd dmd --globalmrbh -o wgd_globalmrbh_2 20628-1.cds.fasta 20628-8.cds.fasta 20628-9.cds.fasta 20896-1.cds.fasta 20896-2.cds.fasta -n 1
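The log above contains one "vs." step per unordered pair of input files. A small sketch that enumerates those pairs for the five files makes the count explicit: 5 files give 10 pairs, which may be why the log suggests 10 threads (that link is an inference, not something stated in this thread). The function name list_pairs is made up for illustration.

```shell
# Enumerate unordered pairs of input files, in the order seen in the log.
list_pairs() {
  while [ "$#" -gt 1 ]; do
    first="$1"
    shift
    for other in "$@"; do
      echo "$first vs. $other"
    done
  done
}

list_pairs 20628-1.cds.fasta 20628-8.cds.fasta 20628-9.cds.fasta \
           20896-1.cds.fasta 20896-2.cds.fasta
```

With the reported total of 620.91 minutes on one thread, that works out to roughly an hour per pair.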

ilaydagulmez commented 3 weeks ago

Hi, thanks for your help. I tried to get a better gene prediction and CDS file and ran dmd from the beginning on just that one file, but got this error:

Screenshot 2024-06-10 at 09 55 34

File: helixer.fasta.txt

Thanks.

ilaydagulmez commented 3 weeks ago

Okay, I got it: it happens when headers are the same. Solved!
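Since the cause turned out to be identical headers, a quick way to look for them before running wgd dmd is sketched below. It assumes the sequence identifier is the first whitespace-delimited word after ">" (a common FASTA convention, not confirmed for wgd in this thread); the function name dup_ids is made up for illustration.

```shell
# Print every sequence ID that appears more than once in a FASTA file.
dup_ids() {
  grep '^>' "$1" |      # keep header lines only
    sed 's/^>//' |      # drop the leading ">"
    awk '{print $1}' |  # keep the first word as the ID (assumption)
    sort | uniq -d      # report IDs seen at least twice
}
```

Running dup_ids helixer.fasta.txt prints nothing when all IDs are unique, and one line per repeated ID otherwise.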