heche-psb / wgd

wgd v2: a suite of tools to uncover and date ancient polyploidy and whole-genome duplication
https://wgdv2.readthedocs.io/en/latest/
GNU General Public License v3.0
syn -gff3 file #35

Closed ilaydagulmez closed 1 month ago

ilaydagulmez commented 1 month ago

Hi, I'm writing about another step. I finished the others successfully, so thank you for your help!

I have a small question about the syn step. I have a gff3 file from Augustus, and I ran the command:

wgd syn /wgd_dmd/sample.cds.fasta.tsv /Augustus/sample.gff

The gff3 file looks like this:

Screenshot 2024-05-28 12 56 50

I got this error:

Screenshot 2024-05-28 12 54 05

The gff and the wgd input actually correspond to the same cds.fasta, so I don't understand the error. Do you have any suggestions for getting the right gff3 file?

Many thanks for your time and help!

İlayda

heche-psb commented 1 month ago

Just a quick check: is there a gene id in your sample.cds.fasta.tsv that is exactly "g1"? Did you also generate sample.cds.fasta.tsv using wgd v2? And can you make sure that each line of your gff3 file (except those starting with "#") is tab-separated and has 9 columns in total?
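
The column check asked for above can be scripted. This is a minimal sketch, not part of wgd: the function name and the three sample lines are made up for illustration, and it only tests that every non-comment line splits into 9 tab-separated columns.

```python
def find_malformed_gff3_lines(lines):
    """Return (line_number, column_count) for every data line that does
    not consist of exactly 9 tab-separated columns."""
    bad = []
    for number, line in enumerate(lines, start=1):
        stripped = line.rstrip("\n")
        # Skip blank lines and comment/directive lines starting with "#".
        if not stripped or stripped.startswith("#"):
            continue
        n_cols = len(stripped.split("\t"))
        if n_cols != 9:
            bad.append((number, n_cols))
    return bad


# Illustrative sample (hypothetical content): line 3 uses spaces, not tabs.
sample = [
    "##gff-version 3\n",
    "scaffold_1\tAUGUSTUS\tgene\t1\t900\t.\t+\t.\tID=g1\n",
    "scaffold_1 AUGUSTUS transcript 1 900 . + . ID=g1.t1\n",
]
print(find_malformed_gff3_lines(sample))
```

For a real file, the same function can be called on an open file handle, for example `find_malformed_gff3_lines(open("sample.gff"))`.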

ilaydagulmez commented 1 month ago

Yes, I generated sample.cds.fasta.tsv using wgd v2, and my gff3 file has 9 columns, as I shared. The file generated by the dmd command doesn't start with "g1"; here is the file:

20628_8.cds.fasta.tsv.txt

But as I wrote, those inputs are the same, so I don't understand why there is a problem like this.

Thanks for your time

heche-psb commented 1 month ago

Hi, the gene ids in your family file need to be the same as those in your gff3 file, which is apparently not the case in your dataset. The gene ids in your family file look like "g46456.t1" and "g23037.t1" rather than "g1". You have to make the gff3 file contain the same ids; otherwise the program can't tell which gene is at which position.

ilaydagulmez commented 1 month ago

Yes, I know and understand, but the cds file used as input for dmd was itself generated by Augustus, so I mean it comes from the same source.

Thanks

heche-psb commented 1 month ago

I don't get it. You mean that the cds file and gff3 file were both generated from Augustus but had different gene ids?

ilaydagulmez commented 1 month ago

Yes, the cds file and the gff3 file were both generated by Augustus, even though the dmd output looks as I shared.

heche-psb commented 1 month ago

I see. Could you please share the correct family file and gff3 file with me? I will try to reproduce your error.

ilaydagulmez commented 1 month ago

Okay, thank you. Here is my gff3 file:

https://transfer.adttemp.com.br/EDzFs/20628-8.gff

My family file from dmd: 20628_8.cds.fasta.tsv.txt

Thanks for your help!

heche-psb commented 1 month ago

Hi, you just need to add the options -f transcript and -a ID to make it run successfully. Could you please try again?
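
To see why the feature type and attribute matter, the ids in the family file can be compared against the ids that a given feature/attribute pair yields from the gff3 file. The following is a rough sketch and not wgd's own code: the function name, the two sample gff3 lines, and the `family_ids` set are hypothetical, and attribute parsing assumes the usual `key=value;key=value` layout of column 9.

```python
def collect_feature_ids(gff_lines, feature="transcript", attribute="ID"):
    """Collect the values of `attribute` from all rows whose feature type
    (column 3) equals `feature`."""
    ids = set()
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature:
            continue
        # Column 9 holds semicolon-separated key=value pairs.
        for field in cols[8].split(";"):
            key, sep, value = field.strip().partition("=")
            if sep and key == attribute:
                ids.add(value)
    return ids


# Hypothetical gff3 rows: the gene row carries "g1", the transcript row "g1.t1".
gff = [
    "scaffold_1\tAUGUSTUS\tgene\t1\t900\t.\t+\t.\tID=g1\n",
    "scaffold_1\tAUGUSTUS\ttranscript\t1\t900\t.\t+\t.\tID=g1.t1;Parent=g1\n",
]
family_ids = {"g1.t1"}  # stand-in for the ids found in the family file

missing_with_gene = family_ids - collect_feature_ids(gff, "gene", "ID")
missing_with_transcript = family_ids - collect_feature_ids(gff, "transcript", "ID")
print(missing_with_gene)        # ids not found among gene rows
print(missing_with_transcript)  # ids not found among transcript rows
```

An empty result for one feature/attribute pair suggests that pair is the one to pass via -f and -a.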

ilaydagulmez commented 1 month ago

What great news! Thanks for your help, I will try it and (hopefully) close the issue! 😊

ilaydagulmez commented 1 month ago

Hi again, do I need to install i-adhore separately? I think these parameters (-f transcript and -a ID) work, but now I get the error FileNotFoundError: [Errno 2] No such file or directory: 'i-adhore'. Thanks!

heche-psb commented 1 month ago

Hi, you need to install i-adhore beforehand; for that you can refer to https://github.com/VIB-PSB/i-ADHoRe.

ilaydagulmez commented 1 month ago

Can I pass the path to i-adhore as a parameter? I installed it but still got the same error. When I added the path, I got IndexError: list index out of range.

Thanks

heche-psb commented 1 month ago

If you type i-adhore and press Enter, do you get the message "Usage: i-adhore [configuration file]" in return? That means you have installed the software successfully.

ilaydagulmez commented 1 month ago

Yes, I get this.

Screenshot 2024-05-29 13 51 24

heche-psb commented 1 month ago

This is what I got using your data. You just need to make sure i-adhore v3 is in your environment path and can be properly called.

(ENV_wgd) (base) heche@HengchiChen$ wgd syn -f transcript -a ID 20628_8.cds.fasta.tsv.txt 20628-8.gff -o debug_syn
2024-05-29 14:47:29 INFO     This is wgd v2.0.38                                                                                       cli.py:34
                    INFO     Checking cores and threads...                                                                            core.py:35
                    INFO     The number of logical CPUs/Hyper Threading in the system: 8                                              core.py:36
                    INFO     The number of physical cores in the system: 4                                                            core.py:37
                    INFO     The number of actually usable CPUs in the system: 8                                                      core.py:38
                    INFO     Checking memory...                                                                                       core.py:40
                    INFO     Total physical memory: 7.6480 GB                                                                         core.py:41
                    INFO     Available memory: 1.1672 GB                                                                              core.py:42
                    INFO     Free memory: 0.9874 GB                                                                                   core.py:43
2024-05-29 14:47:32 INFO     Configuring I-ADHoRe co-linearity search                                                                 cli.py:703
                    INFO     Writing families file                                                                                     syn.py:96
                    INFO     Writing gene lists                                                                                        syn.py:98
2024-05-29 14:47:39 INFO     Writing config file                                                                                      syn.py:100
2024-05-29 14:49:08 INFO     Running I-ADHoRe                                                                                         cli.py:707
2024-05-29 14:49:10 WARNING  WARNING: Maximum allowed number of gaps in the alignment not specified.  Setting to cluster_gap.         syn.py:188
                             WARNING: Tandem gap size not correct in settings file. Using default (gap_size / 2)

                    INFO                                                                                                              syn.py:189
                             This is i-ADHoRe v3.0.
                             Copyright (c) 2002-2010, Flanders Interuniversity Institute for Biotechnology, VIB.
                             Algorithm designed by Klaas Vandepoele, Cedric Simillion, Jan Fostier, Dieter De Witte,
                             Koen Janssens, Sebastian Proost, Yvan Saeys and Yves Van de Peer.

                             Process 1/1 is alive on HengchiChen.

                             ************* i-ADHoRe parameters *************
                                     Number of genelists = 2823
                                     Blast table =
                             /mnt/c/Users/hengc/wgdating_package/ud_wgd/wgd_debug/debug_0524/debug_syn/families.tsv
                                     Output path =
                             /mnt/c/Users/hengc/wgdating_package/ud_wgd/wgd_debug/debug_0524/debug_syn/iadhore-out/
                                     Gap size = 30
                                     Cluster gap size = 35
                                     Cloud gap size = 0
                                     Cloud cluster gap size = 0
                                     Max gaps in alignment = 35
                                     Tandem gap = 15
                                     Flush output = 1000
                                     Q-value = 0.75
                                     Anchorpoints = 3
                                     Probability cutoff = 0.01
                                     Cloud filtering method = Binomial
                                     Level 2 only = false
                                     Use family = true
                                     Write statistics = false
                                     Alignment method = GreedyGraphbased4
                                     Multiple hypothesis correction = FDR
                                     Number of threads = 4
                                     Compare aligners = false
                                     Collinear searches only
                                     Visualize GHM.png = false
                                     Visualize Alignment = false
                                     Verbose output = true
                             ************ END i-AdDHoRe parameters *********

                             Creating dataset...                     done. (time: 0.434661s)
                             Mapping gene families...                done. (time: 0.0517709s)
                             Remapping tandem duplicates...  done. (time: 0.00573397s)
                             Writing genelists file...               done. (time: 0.041543s)
                             Collinear Search
                             Level 2 multiplicon detection...        done. (time: 0.929245s)
                             Profile detection...
                             1 multiplicons to evaluate - evaluating level 2 multiplicon... 0 new multiplicons found.
                             Flushing output files...done.
                             Time for Higher Level Detection: 0.003685s.

                             All Done!  Bye...

                    INFO     Processing I-ADHoRe output                                                                               cli.py:711
                    INFO     `minlen` not set, taking 10% of longest scaffold (7869.900000000001) for 20628_8.cds.fasta              viz.py:2714
                    INFO     Dropped 80189 scaffolds in 20628_8.cds.fasta because they are on scaffolds shorter than                 viz.py:2716
                             7869.900000000001
2024-05-29 14:49:27 INFO     Making Syndepth plot                                                                                    viz.py:2753
                    ERROR    No eligible multiplicon discovered in terms of segment length and/or gene number!                       viz.py:1357
                    INFO     Total run time: 1.97 minutes                                                                           core.py:1643
                    INFO     Done                                                                                                   core.py:1644
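
The log above reports that `minlen` was not set, so the cutoff was taken as 10% of the longest scaffold (7869.9), and everything shorter was dropped. Below is a minimal sketch of that rule, not wgd's actual implementation: the function name, the sample scaffold names and lengths, and the `>=` comparison are assumptions. Only the 10% default and the cutoff value, which implies a longest scaffold of 78699, are taken from the log.

```python
def drop_short_scaffolds(scaffold_lengths, minlen=None, fraction=0.1):
    """Keep scaffolds at least `minlen` long; when `minlen` is not given,
    default it to `fraction` of the longest scaffold."""
    if minlen is None:
        minlen = fraction * max(scaffold_lengths.values())
    kept = {
        name: length
        for name, length in scaffold_lengths.items()
        if length >= minlen  # assumed comparison; wgd may differ
    }
    return minlen, kept


# Hypothetical lengths; 78699 is the longest-scaffold value implied by the log.
lengths = {"scaffold_a": 78699, "scaffold_b": 12000, "scaffold_c": 5000}
minlen, kept = drop_short_scaffolds(lengths)
print(minlen, sorted(kept))
```

On a highly fragmented assembly, a cutoff derived this way can exclude most scaffolds, so passing an explicit `minlen` to the sketch shows how the kept set changes.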

ilaydagulmez commented 1 month ago

Hi again, thanks for your help! As I said, i-adhore is installed and I get Usage: i-adhore [configuration file]. But it only runs inside the i-adhore/build/src folder, so wgd syn does not find i-adhore. That's why I asked whether there is a command or parameter to add the i-adhore path. Is it possible to set the path in a .py file or on the command line?

Thanks!

heche-psb commented 1 month ago

Hi, did you try a command like export PATH="$PATH:i-adhore/build/src/i-adhore" to add the path of the binary to your environment variable? So far there is no option for users to give the path to the i-adhore binary, but it might be a good suggestion, given that i-adhore has repeatedly drawn complaints about being hard to install.
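
One thing worth checking when editing PATH: the shell looks up executables inside the directories listed in PATH, so an entry normally has to be the directory that contains i-adhore, not the binary itself, and an absolute path avoids depending on the current working directory. The sketch below illustrates this on a POSIX system with a throwaway stand-in executable. The temporary directory and the stand-in script are purely illustrative, and `shutil.which` is used here only to mimic the lookup.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as bin_dir:
    # Create a stand-in executable named "i-adhore" (not the real program).
    exe = os.path.join(bin_dir, "i-adhore")
    with open(exe, "w") as handle:
        handle.write("#!/bin/sh\necho stand-in\n")
    os.chmod(exe, 0o755)

    # PATH entry pointing at the binary itself: lookup fails.
    found_with_file = shutil.which("i-adhore", path=exe)
    # PATH entry pointing at the containing directory: lookup succeeds.
    found_with_dir = shutil.which("i-adhore", path=bin_dir)

print(found_with_file)
print(found_with_dir)

# In a real environment, this shows what the current PATH resolves to.
print(shutil.which("i-adhore"))
```

If the last call prints `None` in the shell or job environment where wgd syn runs, wgd will not find i-adhore there either.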

ilaydagulmez commented 1 month ago

Hi, thanks for your advice. I tried it and I think it works: WhatsApp Image 2024-06-03 at 11 04 56

But still got the same error:

Screenshot 2024-06-03 at 11 08 39

heche-psb commented 1 month ago

Hi, apparently the node on which your job ran didn't have i-adhore properly installed, but your local node did. I think it has something to do with your HPC system. Did you try to run wgd syn locally, by which I mean without submitting it to a compute node?

ilaydagulmez commented 1 month ago

Yes, I got the same error again :(

heche-psb commented 1 month ago

Hi, I have just added a parameter for the path to the i-adhore executable in wgd syn. You can install the latest commit from this repository and try the command below.

$ wgd syn -f transcript -a ID 20628_8.cds.fasta.tsv.txt 20628-8.gff -o test_path-iadhore --pathiadhore /i-adhore-3.0.01/build/i-adhore/bin/i-adhore

ilaydagulmez commented 1 month ago

Hi, what great news! Many thanks for your solution! I reinstalled 2.0.38; did I do something wrong?

heche-psb commented 1 month ago

Hi, you can install wgd from source using the commands below.

git clone https://github.com/heche-psb/wgd
cd wgd
virtualenv -p=python3 ENV  # or: python3 -m venv ENV
source ENV/bin/activate
pip install numpy==1.19.0
pip install -r requirements.txt
pip install .

ilaydagulmez commented 1 month ago

It's done! Thank you so much for everything and forgive me if I tired you :))

Screenshot 2024-06-03 at 12 22 51