guigolab / geneidx

Nextflow pipeline for genome annotation of protein-coding genes
GNU General Public License v3.0

How to run geneidx in local workstation via docker? #1

Closed BenAawf closed 1 year ago

BenAawf commented 1 year ago

Hi, I have a basic question, as I am new to containers. I have all the prerequisites installed.

First, I searched for the Docker container and pulled it:

docker pull ferriolcalvet/geneidx

It seems to have installed successfully. This is the log from the terminal:

latest: Pulling from ferriolcalvet/geneidx
23858da423a6: Pulling fs layer
326f452ade5c: Pulling fs layer
a42821cd14fb: Pulling fs layer
8471b75885ef: Pull complete
8ffa7aaef404: Pull complete
15132af73342: Pull complete
f4aeaae2f8da: Pull complete
5d8aa591a389: Pull complete
4645a1f8dbad: Pull complete
b8861be24c63: Pull complete
5067931dafb0: Pull complete
2af56fb3cf52: Pull complete
253e785a8a50: Pull complete
f42d46aac9d9: Pull complete
f9f8117fb535: Pull complete
e15f4b46da28: Pull complete
99bd3a5629a5: Pull complete
27043e5ccd2d: Pull complete
1292524c3142: Pull complete
f7b36a643fbe: Pull complete
860293db19cb: Pull complete
f3dafd5c9e29: Pull complete
b1325f591394: Pull complete
5065735ed845: Pull complete
ab5a5ae1904e: Pull complete
ad9050f7ac01: Pull complete
Digest: sha256:de5f906aa7ac2fccc68235d3329eb89a06b811bc6cc7515035f1bc770127a1b5
Status: Downloaded newer image for ferriolcalvet/geneidx:latest
docker.io/ferriolcalvet/geneidx:latest

Second, I ran the Nextflow command:

nextflow run guigolab/geneidx -with-docker --genome reference.fa.gz --taxid 562

and this is the message I got:

N E X T F L O W ~ version 22.10.0
Project `guigolab/geneidx` is currently stickied on revision: main -- you need to explicitly specify a revision with the option `-r` in order to use it

So how can I run geneidx?

Thank you in advance Ben

FerriolCalvet commented 1 year ago

Hi Ben,

Thank you for the question.

  1. If you have Docker and Nextflow installed, you do not need to pull the Docker container manually; Nextflow takes care of this when you start a run. (It is good that you did it, but in principle it is not necessary.)
  2. This looks like a problem with the revision of the main branch; we will check it and fix it as soon as possible. Meanwhile, an alternative way of running geneidx is to clone the repository:

     git clone https://github.com/guigolab/geneidx.git

     and then run, from inside the cloned directory:

     nextflow run main.nf -with-docker --genome reference.fa.gz --taxid 562

That said, if the genome you want to annotate is from taxid 562 (Escherichia coli), note that geneidx was developed to annotate eukaryotic genomes, so it will not work well for bacterial genomes. I am sorry for not making this clear in the description; I will add a clarification about this and about the alternative way of running geneidx.

Thank you again for your questions, and in case you or your colleagues use geneidx again let me know if you find any other issues!

Ferriol

BenAawf commented 1 year ago

Hello Ferriol, I appreciate your help,

  1. I just installed it as you mentioned and it seems installed correctly.
  2. By the way, I am annotating a eukaryotic genome (an Aves species, taxid 9079); I used E. coli only to test that geneidx works.

But I am still unable to run it. Note that the input file is compressed (.gz extension). I have this Docker problem:

Command error: docker: got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/containers/create?name=nxf-UubmZB6Odc0alivrKO4G0fpi": dial unix /var/run/docker.sock: connect: permission denied. See 'docker run --help'

On the other hand, I have a problem with --taxid 9079: it looks like that taxonomic ID doesn't have the minimum required... The run log is uploaded here:

https://drive.google.com/drive/folders/14_LN6N_oW8mhoWYl6AX5ic7wLlMr96DK

Then I used the --taxid of a closely related species (9030).

However, neither of them finished a complete run. I attached log files for both so you can figure out where the problem comes from.

I knew it was too early to use this tool, but I appreciate any help you could provide.

Kind regards

FerriolCalvet commented 1 year ago

Hi Ben, I have looked into the possible cause of this error, and it is probably that Docker cannot be run properly. (Example with Nextflow) If you have sudo permissions, you could try this and it should work. Otherwise, since you said you also have Singularity installed, you can try changing the -with-docker flag to -with-singularity. As none of the processes in the pipeline was able to run, I don't think the problem is in the code; it is more likely related to Nextflow's permissions for executing Docker. If the problem persists, let me know and I will try it myself. Thank you! Ferriol
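
For readers hitting the same `permission denied` message: it means the user running Nextflow cannot open the Docker daemon socket named in the error. A minimal sketch of a pre-flight check follows. The function name `docker_socket_writable` is made up for illustration, and the `usermod` remedy mentioned in the comment comes from Docker's Linux post-installation guide, not from this thread.

```shell
#!/usr/bin/env bash
# Pre-flight check for the Docker socket (illustrative helper, not part of geneidx or Nextflow).

docker_socket_writable() {
    # Succeeds only if the socket path exists and the current user may write to it.
    local sock="${1:-/var/run/docker.sock}"
    [ -e "$sock" ] && [ -w "$sock" ]
}

if docker_socket_writable; then
    echo "Docker socket is accessible: -with-docker should be able to start containers"
else
    echo "Docker socket is not accessible for user $(id -un)"
    # Common remedy (log out and back in afterwards):
    #   sudo usermod -aG docker "$USER"
    # Alternative suggested in this thread: use -with-singularity instead.
fi
```

If the check fails, running Nextflow itself under sudo is only one option; adding the user to the `docker` group avoids running the whole pipeline as root.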

BenAawf commented 1 year ago

Hi Ferriol, I tried using sudo with -with-docker, and at least the pipeline could run three steps of the process, but the run could not be completed. So I tried -with-singularity, but unfortunately the pipeline got stuck on the first step, UncompressFASTA. Please find the error log for both runs attached.

1- using Singularity

N E X T F L O W  ~  version 22.10.4
Launching `main.nf` [curious_linnaeus] DSL2 - revision: ccdbe1f8f8

GENEID+BLASTx - NextflowPipeline
=============================================
output          : /media/ben/Data2TB/annotation/geneidx/geneidx/output
genome          : rufa_assembly.s.n.fa.gz
taxon           : 9079

WARN: A process with name 'getFASTA2' is defined more than once in module script: /media/ben/Data2TB/annotation/geneidx/geneidx/subworkflows/CDS_estimates.nf -- Make sure to not define the same function as process
[-        ] process > UncompressFASTA                                              -
executor >  local (1)
[d6/3cd9b0] process > UncompressFASTA (rufa_assembly.s.n.fa.gz)                    [100%] 1 of 1, failed: 1 ✘
[-        ] process > fix_chr_names                                                -
[-        ] process > compress_n_indexFASTA                                        -
[-        ] process > prot_down_workflow:getProtFasta                              -
[-        ] process > prot_down_workflow:downloadProtFasta                         -
[-        ] process > build_protein_DB:UncompressFASTA                             -
[-        ] process > build_protein_DB:runDIAMOND_makedb                           -
[-        ] process > alignGenome_Proteins:runDIAMOND_getHSPs_GFF                  -
[-        ] process > matchAssessment:Index_fai                                    -
[-        ] process > matchAssessment:cds_workflow:mergeMatches                    -
[-        ] process > matchAssessment:cds_workflow:filter_by_score                 -
[-        ] process > matchAssessment:cds_workflow:getFASTA                        -
[-        ] process > matchAssessment:cds_workflow:ORF_finder                      -
[-        ] process > matchAssessment:cds_workflow:updateGFFcoords                 -
[-        ] process > matchAssessment:cds_workflow:getFASTA2                       -
[-        ] process > matchAssessment:getCDS_matrices                              -
[-        ] process > matchAssessment:intron_workflow:summarizeMatches             -
[-        ] process > matchAssessment:intron_workflow:pyComputeIntrons             -
[-        ] process > matchAssessment:intron_workflow:removeProtOverlappingIntrons -
[-        ] process > matchAssessment:intron_workflow:getFASTA                     -
[-        ] process > matchAssessment:getIntron_matrices                           -
[-        ] process > matchAssessment:CombineIni                                   -
[-        ] process > matchAssessment:CombineTrans                                 -
[-        ] process > param_selection_workflow:getParamName                        -
[-        ] process > param_selection_workflow:paramSplit                          -
[-        ] process > param_value_selection_workflow:getParamName                  -
[-        ] process > param_value_selection_workflow:paramSplitValues              -
[-        ] process > creatingParamFile_frommap                                    -
[-        ] process > geneid_WORKFLOW:Index_i                                      -
[-        ] process > geneid_WORKFLOW:runGeneid_fetching                           -
[-        ] process > prep_concat                                                  -
[-        ] process > concatenate_Outputs_once                                     -
[-        ] process > gff3addInfo:manageGff3sectionSplit                           -
[-        ] process > gff3addInfo:gff3intersectHints                               -
[-        ] process > gff3addInfo:processLabels                                    -
[-        ] process > gff3addInfo:manageGff3sectionMerge                           -
[-        ] process > gff34portal                                                  -
Pulling Singularity image docker://ferriolcalvet/geneidx [cache /media/ben/Data2TB/annotation/geneidx/geneidx/./singularity/ferriolcalvet-geneidx.img]
Execution cancelled -- Finishing pending tasks before exit
Oops ...

WARN: Access to undefined parameter `prot_file` -- Initialise it to a default value eg. `params.prot_file = some_value`
WARN: Access to undefined parameter `acceptor_pwm` -- Initialise it to a default value eg. `params.acceptor_pwm = some_value`
Error executing process > 'UncompressFASTA (rufa_assembly.s.n.fa.gz)'

Caused by:
  Process `UncompressFASTA (rufa_assembly.s.n.fa.gz)` terminated with an error exit status (127)

Command executed:

  if [ ! -s  rufa_assembly.s.n.fa ]; then
      echo "unzipping genome rufa_assembly.s.n.fa.gz"
      gunzip -c rufa_assembly.s.n.fa.gz > rufa_assembly.s.n.fa;
  fi

Command exit status:
  127

Command output:
  (empty)

Command error:
  /bin/bash: .command.sh: No such file or directory

Work dir:
  /media/ben/Data2TB/annotation/geneidx/geneidx/work/d6/3cd9b0802199f26441a74d0499ada2

Sorry for bothering you, and thank you again Ben

FerriolCalvet commented 1 year ago

Hi Ben, for the run you did with Docker, I see the error occurs when getting the parameter file. Just to make sure everything is as expected, is this line still in the params.config file?

parameter_path = "$projectDir/data/Parameter_files.taxid/"

If so, I will review everything again and see whether it works. It could be that the container cannot access the parameters directory and therefore cannot provide any file for the following steps. I am not sure it makes a big difference, but you could also try -profile docker instead of -with-docker and see if the behavior is different.

For the run with Singularity, I could understand an error in the internal commands of the process, but this one seems to be related to how Nextflow connects with containers (the Nextflow creators discussed this 6-7 years ago: link). It seems that you are also using an external volume for storing the data, which could be the issue according to that discussion. I have not been able to follow up and see whether they found another solution, but maybe you could try to solve the problem in a similar way. Alternatively, if you run it without the external device and do not get this error, at least we will know that the external volume was the problem.

Thank you! Ferriol

BenAawf commented 1 year ago

Yes it is there

params {
    genome          = "$projectDir/data/SampleGenomeSmall.fa.gz"
    output          = "$projectDir/output"
    taxid           = "35525"

    parameter_path = "$projectDir/data/Parameter_files.taxid/"

    maps_param_values = [
        "no_score" : -0.10
    ]

    proteins_lower_lim = 90000
    proteins_upper_lim = 130000

    general_gene_params = "$projectDir/data/general_gene_model.param"

    match_score_min = 300
    match_ORF_min   = 100

    intron_margin = 40

    min_intron_size = 20
    max_intron_size = 10000

    source_uniprot = 1
}

FerriolCalvet commented 1 year ago

Hi Ben,

I have used the test genome that is defined by default in the genome variable of the config file, and only added the --taxid 9079 flag for the run, and it worked without any issues:

nextflow run main.nf -with-docker --taxid 9079

I suspect that the problem could be similar to the one with the Singularity run, since the container might need to mount the indicated directory in order to access the files in it...

If you can, try to run the command above on your computer without indicating any output, and see if it works. If it does, run it again, this time indicating the same output location as before, and see whether it works:

nextflow run main.nf -with-docker --taxid 9079 --output /media/ben/Data2TB/annotation/geneidx/geneidx/output

I am not completely sure this will be informative, since the connection between the container and the parameter directory should not depend on the output, but if you can test it, it might be useful.

If you have any other updates regarding the Singularity run, let me know. We could also ask in the Nextflow community Slack whether other users have found the same issue with other programs and how they solved it.

Thank you! Ferriol

BenAawf commented 1 year ago

Hi Ferriol, thanks for being so helpful. I tried all your suggestions, but none of them fixed the issue:

nextflow run main.nf -with-docker --taxid 9079

and

nextflow run main.nf -with-docker --taxid 9079 --output /media/...................

For every run these errors always pop up, even if I change the container to -with-singularity or use sudo....-with-docker:

Command error: python3: can't open file '.command.sh': [Errno 2] No such file or directory

and

Command error: bash: can't open file '.command.sh':.............No such file or directory

This ".command.sh: No such file or directory" looks like a typical error with Nextflow.

Anyway, I will keep trying to solve this problem. Hopefully, I can run your tool.

Thanks again ben

BenAawf commented 1 year ago

Hi Ferriol It would be great and less complicated if you could implement it with conda package manager. Best ben

BenAawf commented 1 year ago

Hi Ferriol,

When I run the line

nextflow run main.nf -with-docker --taxid 9079

I get the following error:

No such variable: prot_filename

When I try and run either docker or singularity this is also the error I get.

Do you have any ideas on what might be the problem here? Many thanks

FerriolCalvet commented 1 year ago

Hi Ben,

Today I devoted some time to testing geneidx and in Nextflow 22.10 I have the same errors as you. I am already looking into the changes between versions that cause these errors and will fix them as soon as possible.

Meanwhile, using version 22.04.4 should allow you to run geneidx without any problem. Let me know if you still have problems.

export NXF_VER=22.04.4
nextflow run main.nf -with-docker --taxid 9079
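
A small sketch of how the pin could be applied only when needed. It assumes, based on nothing more than the reports in this thread, that 22.04.4 works and that newer releases are affected. `needs_pin` is a hypothetical helper, and `sort -V` (version sort) is assumed to be available, as it is in GNU coreutils.

```shell
#!/usr/bin/env bash
# Decide whether to pin Nextflow via NXF_VER (illustrative helper, not part of geneidx).

KNOWN_GOOD="22.04.4"   # version reported to work in this thread

needs_pin() {
    # Succeeds when version $1 is strictly newer than KNOWN_GOOD.
    local ver="$1" newest
    newest="$(printf '%s\n' "$ver" "$KNOWN_GOOD" | sort -V | tail -n 1)"
    [ "$ver" != "$KNOWN_GOOD" ] && [ "$newest" = "$ver" ]
}

installed="22.10.0"    # example value, e.g. copied from the run header
if needs_pin "$installed"; then
    export NXF_VER="$KNOWN_GOOD"
    echo "Pinning Nextflow: NXF_VER=$NXF_VER"
fi
```

With the variable exported, the next `nextflow run` in the same shell uses the pinned release. Once the pipeline supports newer Nextflow versions, `unset NXF_VER` removes the pin.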

I can notify you when we update it to work with newer versions of Nextflow, thank you very much for creating the issue, and sorry for the inconvenience.

Thanks,

Ferriol

BenAawf commented 1 year ago

Hi Ferriol, using Singularity I could run it without any issue by specifying --bind to mount all the folders that contain the input data and the output folders required by the Nextflow workflow.

As you mentioned before, it was indeed related to the Singularity container problem. I should have paid attention to this earlier. Thank you again, Ben
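
The thread does not show exactly how `--bind` was passed. One way that should have the same effect, sketched here as an assumption, is to hand the option to Singularity through Nextflow's `singularity.runOptions` setting in an extra config file loaded with `-c`. The file name and the helper `write_bind_config` are made up for illustration; the data path is the external volume from the logs above.

```shell
#!/usr/bin/env bash
# Write an extra Nextflow config that bind-mounts a host directory into Singularity containers.
# write_bind_config and the config file name are illustrative, not part of geneidx.

write_bind_config() {
    local host_dir="$1" out_file="$2"
    cat > "$out_file" <<EOF
singularity {
    autoMounts = true
    runOptions = '--bind ${host_dir}'
}
EOF
}

config_file="${TMPDIR:-/tmp}/geneidx_bind.config"
write_bind_config /media/ben/Data2TB "$config_file"
cat "$config_file"
```

The file would then be added to the run with something like `nextflow run main.nf -with-singularity -c "$config_file" --taxid 9079`, so that the work directory and input files on the external volume are visible inside the container.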

FerriolCalvet commented 1 year ago

Hi Ben,

Sorry for not replying earlier. I don't fully understand whether it ended up working or not?

During the last couple of days, I updated the docker container and fixed a bug in the script as a process was still using an even older container. So if you could solve the problems with the singularity images this is great, otherwise, you could try removing the Geneidx images that you have downloaded and Nextflow will automatically fetch the latest one.

Let me know if you are still having issues or I can close this issue.

Thanks again for your interest and help by running Geneidx and reporting your problems!

Ferriol

BenAawf commented 1 year ago

Hi Ferriol, I appreciate the effort and time you have dedicated to improving this excellent work of yours. I was going to ask you to close this issue, but unfortunately I tried another run after your update (actually three runs on different PCs), and these are my notes.

log_out https://drive.google.com/file/d/1BT3k5eVejY3lCfcUSuPPL5gS4Sss-soq/view?usp=share_link

Output:

cd /geneidx/output/species/9079
cut -f3 rufa.-.UniRef90.8782.12+.gff3|sort|uniq -c
      1 ###
 315248 CDS
 315248 exon
  74708 gene
      1 ##gff-version 3
  74708 mRNA
      1 ##sequence-region contig_100 1 14949129
      1 ##sequence-region contig_101 1 17262525
       1970504
      1 ##sequence-region contig_8 1 105765
      ....
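
The `cut -f3` listing above also counts the `##` header lines, because `cut` passes lines without tabs through whole. A sketch of the same tally restricted to feature lines, assuming a standard tab-separated GFF3 file; `count_gff3_features` is a hypothetical helper name and the demo file is made-up data.

```shell
#!/usr/bin/env bash
# Tally GFF3 feature types (column 3), ignoring comment and directive lines.

count_gff3_features() {
    # Skip lines starting with '#', keep column 3 of tab-separated records, then count.
    grep -v '^#' "$1" | awk -F '\t' 'NF >= 3 { print $3 }' | sort | uniq -c
}

# Tiny made-up GFF3 file for demonstration only.
demo="$(mktemp)"
printf '##gff-version 3\n' > "$demo"
printf 'contig_1\tgeneid\tgene\t1\t90\t.\t+\t.\tID=g1\n' >> "$demo"
printf 'contig_1\tgeneid\tmRNA\t1\t90\t.\t+\t.\tID=m1;Parent=g1\n' >> "$demo"
printf 'contig_1\tgeneid\tCDS\t1\t90\t.\t+\t0\tParent=m1\n' >> "$demo"
printf 'contig_2\tgeneid\tgene\t5\t50\t.\t-\t.\tID=g2\n' >> "$demo"
printf '###\n' >> "$demo"

count_gff3_features "$demo"
```

On the real output this would be called as `count_gff3_features rufa.-.UniRef90.8782.12+.gff3`, giving only the gene, mRNA, exon and CDS counts.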

Log:

N E X T F L O W  ~  version 22.10.5
Launching `main.nf` [determined_cuvier] DSL2 - revision: cb76033ee8

GENEID+BLASTx - NextflowPipeline
=============================================
output          : /home/ben/geneidx/output
genome          : /home/ben/geneidx/data/SampleGenomeSmall.fa.gz
taxon           : 9079

WARN: A process with name 'getFASTA2' is defined more than once in module script: /home/ben/geneidx/subworkflows/CDS_estimates.nf -- Make sure to not define the same function as process
   ### ERROR ###   Max cpus '[:]' is not valid! Using default value: 1
[15/4f69a8] process > UncompressFASTA (SampleGenomeSmall.fa.gz)                                               [100%] 1 of 1 ✔
[82/439c71] process > fix_chr_names (SampleGenomeSmall.fa)                                                    [100%] 1 of 1 ✔
[d0/1116f6] process > compress_n_indexFASTA (SampleGenomeSmall.clean.fa)                                      [100%] 1 of 1 ✔
[9f/8c1ef7] process > prot_down_workflow:getProtFasta (9079)                                                  [100%] 1 of 1 ✔
[4d/d80c73] process > prot_down_workflow:downloadProtFasta (UniRef90.8782.12+)                                [100%] 1 of 1 ✔
[bc/285136] process > build_protein_DB:UncompressFASTA (UniRef90.8782.12+.fa.gz)                              [100%] 1 of 1 ✔
[d5/2cdf9b] process > build_protein_DB:runDIAMOND_makedb (building UniRef90.8782.12+ database)                [100%] 1 of 1 ✔
[37/8ec66b] process > alignGenome_Proteins:runDIAMOND_getHSPs_GFF (SampleGenomeSmall.clean against UniRef9... [100%] 1 of 1 ✔
[9d/68af77] process > matchAssessment:Index_fai (SampleGenomeSmall.clean.fa)                                  [100%] 1 of 1 ✔
[f9/b68810] process > matchAssessment:cds_workflow:mergeMatches (SampleGenomeSmall.clean.UniRef90.8782.12+... [100%] 1 of 1 ✔
[ac/aa4c74] process > matchAssessment:cds_workflow:filter_by_score (SampleGenomeSmall.clean.UniRef90.8782.... [100%] 1 of 1 ✔
[e1/bbccce] process > matchAssessment:cds_workflow:getFASTA (SampleGenomeSmall.clean.UniRef90.8782.12+.hsp... [100%] 1 of 1 ✔
[8d/adce2f] process > matchAssessment:cds_workflow:ORF_finder (SampleGenomeSmall.clean.UniRef90.8782.12+.h... [100%] 1 of 1 ✔
[e6/3652ae] process > matchAssessment:cds_workflow:updateGFFcoords (SampleGenomeSmall.clean.UniRef90.8782.... [100%] 1 of 1 ✔
[ab/ac5d93] process > matchAssessment:cds_workflow:getFASTA2 (SampleGenomeSmall.clean.UniRef90.8782.12+.hs... [100%] 1 of 1 ✔
[63/3bc92f] process > matchAssessment:getCDS_matrices (SampleGenomeSmall.clean.UniRef90.8782.12+.hsp.ORFs)    [100%] 1 of 1 ✔
[e0/993466] process > matchAssessment:intron_workflow:summarizeMatches (SampleGenomeSmall.clean.UniRef90.8... [100%] 1 of 1 ✔
[dc/83bd7b] process > matchAssessment:intron_workflow:pyComputeIntrons (SampleGenomeSmall.clean.UniRef90.8... [100%] 1 of 1 ✔
[ea/85de9a] process > matchAssessment:intron_workflow:removeProtOverlappingIntrons (SampleGenomeSmall.clea... [100%] 1 of 1 ✔
[e7/3146f5] process > matchAssessment:intron_workflow:getFASTA (SampleGenomeSmall.clean.UniRef90.8782.12+.... [100%] 1 of 1 ✔
[55/f1e01d] process > matchAssessment:getIntron_matrices (SampleGenomeSmall.clean.UniRef90.8782.12+.hsp.mo... [100%] 1 of 1 ✔
[cd/fef64d] process > matchAssessment:CombineIni (SampleGenomeSmall.clean.UniRef90.8782.12+.hsp.ORFs.5)       [100%] 1 of 1 ✔
[f2/dec639] process > matchAssessment:CombineTrans (SampleGenomeSmall.clean.UniRef90.8782.12+.hsp.ORFs.5)     [100%] 1 of 1 ✔
[60/aa2e31] process > param_selection_workflow:getParamName (9079)                                            [100%] 1 of 1 ✔
[b3/c182f0] process > param_selection_workflow:paramSplit (Homo_sapiens.9606)                                 [100%] 1 of 1 ✔
[01/6790c3] process > param_value_selection_workflow:getParamName (9079)                                      [100%] 1 of 1 ✔
[cc/a10f98] process > param_value_selection_workflow:paramSplitValues (Homo_sapiens.9606)                     [100%] 1 of 1 ✔
[2e/8643c4] process > creatingParamFile_frommap                                                               [100%] 1 of 1 ✔
[e2/21013c] process > geneid_WORKFLOW:Index_i (SampleGenomeSmall.clean.fa)                                    [100%] 1 of 1 ✔
[c1/2f3828] process > geneid_WORKFLOW:runGeneid_fetching (run Geneid 21_random)                               [100%] 27 of 27 ✔
[90/d3db31] process > prep_concat (create file for concatenation SampleGenomeSmall.-.UniRef90.8782.12+.gff3)  [100%] 1 of 1 ✔
[4b/f68ae1] process > concatenate_Outputs_once (adding to SampleGenomeSmall.-.UniRef90.8782.12+.gff3)         [100%] 1 of 1 ✔
[49/d54991] process > gff3addInfo:manageGff3sectionSplit (SampleGenomeSmall.-.UniRef90.8782.12+.gff3)         [100%] 1 of 1 ✔
[11/663229] process > gff3addInfo:gff3intersectHints (SampleGenomeSmall.-.UniRef90.8782.12+.content)          [100%] 1 of 1 ✔
[46/5cd5db] process > gff3addInfo:processLabels (SampleGenomeSmall.-.UniRef90.8782.12+.labelled.tsv)          [100%] 1 of 1 ✔
[dc/d14f77] process > gff3addInfo:manageGff3sectionMerge (SampleGenomeSmall.-.UniRef90.8782.12+)              [100%] 1 of 1 ✔
[a2/a0b645] process > gff34portal (SampleGenomeSmall.-.UniRef90.8782.12+.gff3)                                [100%] 1 of 1 ✔

Done!

Completed at: 21-Jan-2023 00:19:20
Duration    : 1m 1s
CPU hours   : (a few seconds)
Succeeded   : 63

Very sorry for bothering you again. Any feedback is very appreciated.

Kind regards Ben

FerriolCalvet commented 1 year ago

Hi Ben,

In the second command, where you indicate that geneidx is not working, the genome is being passed with the --assembly flag. Try changing this to --genome and it should work like the first command. If you cannot get it to work with a genome other than the example one, let me know!
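
As background, and as far as I know, Nextflow does not reject unknown `--name value` pipeline parameters by default, so a mistyped flag such as `--assembly` leaves `genome` at its default without any error. The header printed at the start of each run (see the logs above) shows which genome was actually used. A sketch that extracts it from saved console output; `header_genome` is a hypothetical helper, and the sample text mimics the header format shown in this thread.

```shell
#!/usr/bin/env bash
# Extract the genome path from the geneidx run header (illustrative helper).

header_genome() {
    # Reads console output on stdin and prints the value of the first "genome : ..." line.
    awk -F ' *: *' '$1 == "genome" { print $2; exit }'
}

# Sample header in the format printed in the logs of this thread.
sample='output          : /home/ben/geneidx/output
genome          : /home/ben/geneidx/data/SampleGenomeSmall.fa.gz
taxon           : 9079'

used="$(printf '%s\n' "$sample" | header_genome)"
echo "Genome used by this run: $used"
```

If the printed path is the bundled SampleGenomeSmall.fa.gz although another file was intended, the flag name is the first thing to check.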

Thank you!

Ferriol

BenAawf commented 1 year ago

Hello Ferriol, my apologies for not replying earlier. I followed your solution and used --genome instead of --assembly. And yes, the issue is solved, thanks to you.

So since I can run geneidx without any problem using the following command: nextflow run main.nf -profile singularity --genome /media/ben/Data2TB/arufa-project/annotation/scaffolds_vf.EDTA_RM_masked.fa.gz --taxid 9079 --bind . I will close this issue as it is solved.

I would appreciate a detailed description of each output file generated by geneidx, and of how we can supply a different protein evidence database instead of the automatically selected UniRef90. It would also be a great addition if you could say whether it is appropriate to combine geneidx with the output of other automated annotation tools, such as BRAKER2.

Thank you so much for the time you take to reply. ben

FerriolCalvet commented 1 year ago

Hello Ben,

Thank you very much for your replies also and for asking questions that helped us improve the pipeline! I will take into account your comments and I will try to document it soon. In the mean time... The output files are:

To provide protein evidence rather than the automated selection of proteins you should be able to provide a fasta file using the --prot-file flag in the same command you use for the execution.

And regarding the combination with other gene predictors, I believe it is appropriate to combine it with BRAKER2 and other software, since this will let you classify your genes with higher or lower confidence.

As I said, I will try to provide a better documentation in the README soon.

Thanks again and best wishes!

Ferriol