Dear @seveein,
I've never seen this error message before. I suspect it is not directly related to genEra, but to the singularity installation or the way that you are running singularity using -B $WORK/. Do these errors appear before or after the message genEra v1.X.X (C) Max Planck Society for the Advancement of Science? That way I can know if the error happened within the genEra code or not.
Best,
Josué
Hi Josué, The error appears right at the beginning, before the genEra [...] prompt. So it might be a singularity-related issue. GenEra seems to run, although we observed the following:
STARTING STEP 3: ASSIGNING AGES TO YOUR QUERY GENES WITH Erassignment
--------------------------------------------------
Splitting results per query gene using 16 threads
Fatal error: cannot open file 'Usage:': No such file or directory
sed: can't read Usage:: No such file or directory
sed: can't read [-a]: No such file or directory
sed: can't read args: No such file or directory
sed: can't read Usage:: No such file or directory
sed: can't read [-a]: No such file or directory
[...]
Running Erassignment using 16 threads
/usr/bin/false
/usr/bin/false
/usr/bin/false
/usr/bin/false
/usr/bin/false
/usr/bin/false
/usr/bin/false
/usr/bin/true
The gene_ages.tsv file is empty afterwards. Do you have an idea how we could resolve this issue? Best, s-
Dear @seveein,
The container seems to be working fine, and your error messages point towards some issues related to singularity. I suspect the main culprit is using singularity run instead of singularity exec.
Could you try something akin to this command?
# Establish the working directory for your genEra run
WORKDIR=/your/working/directory
cd $WORKDIR
# Add any other important path(s) for singularity to find (e.g., the path to the NR database or the directory where you wish to write the output files)
export SINGULARITY_BIND="/any/other/relevant/path:/any/other/relevant/path"
# Run genEra using 'exec' and specifying the absolute path of your files
singularity exec /path/to/singularity/genera_latest.sif genEra \
-q sequences.fasta -t 4084 -b /path/to/database/nr \
-d /path/to/database/taxdump -n 16 \
-o /any/other/relevant/path/output
Please let me know if this works for you so I can update the wiki for singularity users.
Best, Josué
Dear Josué, thank you very much for the suggestions. Unfortunately, I still observe the same issues after implementing the adjustments.
Splitting results per query gene using 32 threads
Fatal error: cannot open file 'Usage:': No such file or directory
--------------------------------------------------
Running Erassignment using 32 threads
/usr/bin/false
/usr/bin/false
All user-provided paths are available to singularity. There should also be enough computational resources available.
Best, s.
edit: current singularity call:
export SINGULARITY_BIND="/path:/mnt"
singularity exec $HOME/programs/GenEra/genEra.sif genEra \
-q /mnt/data/pep_clean.fasta \
-t 4084 -b /mnt/TAI/db/nr \
-d /mnt/TAI/db/taxdump/ \
-c /mnt/TAI/out_spen//ncbi_lineages_2024-09-02.csv \
-p /mnt/tmp_4084_7909/4084_Diamond_results.bout \
-x /mnt/tmp_4084_7909/ \
-o /mnt/TAI/out_spen/ \
-n 32
Dear @seveein,
Could you please send me the complete STDOUT log from the genEra run? I'd like to see which step is not working correctly in the pipeline. I see you're using the arguments -c and -p, meaning that at least step 1 and step 2 of the pipeline are running correctly. Could you also please verify that ncbi_lineages_2024-09-02.csv and 4084_Diamond_results.bout are not empty? Please send me the last 10 lines of these two files (i.e., tail ncbi_lineages_2024-09-02.csv and tail 4084_Diamond_results.bout) for me to check if step 1 and step 2 ran correctly.
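The check requested above could be scripted roughly as follows. This is only a sketch: check_file is a hypothetical helper name, not part of GenEra, and it assumes a bash-compatible shell.

```shell
# Report whether a file exists and is non-empty, then show its last 10 lines.
# check_file is a hypothetical helper for illustration, not a GenEra command.
check_file() {
    local f="$1"
    if [ -s "$f" ]; then
        # -s is true only for files that exist and have a size greater than zero
        echo "OK: $f ($(wc -l < "$f") lines)"
        tail -n 10 "$f"
    else
        echo "EMPTY OR MISSING: $f" >&2
        return 1
    fi
}
```

For example, running check_file on ncbi_lineages_2024-09-02.csv and on 4084_Diamond_results.bout from the directories that contain them would print the last 10 lines of each file, or a warning if a file is empty or missing.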
Best, Josué
Dear @josuebarrera,
Here are the complete STDOUT and the tail of ncbi_lineages_2024-09-02.csv and 4084_Diamond_results.bout.
ncbi_lineages_2024-09-02.csv:
3315561,Eukaryota,Chordata,Lepidosauria,Squamata,Viperidae,Crotalus,Crotalus helleri,,Opisthokonta,Eumetazoa,Amniota,Sauropsida,Sauria,Bifurcata,Unidentata,Episquamata,Toxicofera,,,,Bilateria,,,,Deuterostomia,Vertebrata,Gnathostomata,Teleostomi,Euteleostomi,Dipnotetrapodomorpha,Tetrapoda,,,,,,,Serpentes,,Metazoa,,cellular organisms,,,,,,,,,,,,,,,,,Crotalinae,,,,Craniata,,Crotalus helleri caliginis,,Sarcopterygii,Colubroidea,,,,
3315602,Eukaryota,Ascomycota,Saccharomycetes,Saccharomycetales,Metschnikowiaceae,Sungouiella,Sungouiella xylosa,,Opisthokonta,saccharomyceta,,,,,,,,,,,CUG-Ser1 clade,,,,,,,,,,,,,,,,,,,Fungi,,cellular organisms,,,,,,,,,,,,,,,,,,,Dikarya,,Saccharomycotina,,,,,,,,,
3315603,Eukaryota,Ascomycota,Saccharomycetes,Saccharomycetales,Metschnikowiaceae,Clavispora,Clavispora paralusitaniae,,Opisthokonta,saccharomyceta,,,,,,,,,,,CUG-Ser1 clade,,,,,,,,,,,,,,,,,,,Fungi,,cellular organisms,,,,,,,,,,,,,,,,,,,Dikarya,,Saccharomycotina,,,,,,,,,
3315604,Eukaryota,Ascomycota,Saccharomycetes,Saccharomycetales,Metschnikowiaceae,Soucietia,,,Opisthokonta,saccharomyceta,,,,,,,,,,,CUG-Ser1 clade,,,,,,,,,,,,,,,,,,,Fungi,,cellular organisms,,,,,,,,,,,,,,,,,,,Dikarya,,Saccharomycotina,,,,,,,,,
3315605,Eukaryota,Ascomycota,Saccharomycetes,Saccharomycetales,Metschnikowiaceae,Sungouiella,,,Opisthokonta,saccharomyceta,,,,,,,,,,,CUG-Ser1 clade,,,,,,,,,,,,,,,,,,,Fungi,,cellular organisms,,,,,,,,,,,,,,,,,,,Dikarya,,Saccharomycotina,,,,,,,,,
3315606,Eukaryota,Ascomycota,Saccharomycetes,Saccharomycetales,Metschnikowiaceae,Osmozyma,,,Opisthokonta,saccharomyceta,,,,,,,,,,,CUG-Ser1 clade,,,,,,,,,,,,,,,,,,,Fungi,,cellular organisms,,,,,,,,,,,,,,,,,,,Dikarya,,Saccharomycotina,,,,,,,,,
3315610,Eukaryota,Ascomycota,Saccharomycetes,Saccharomycetales,Metschnikowiaceae,Tanozyma,,,Opisthokonta,saccharomyceta,,,,,,,,,,,CUG-Ser1 clade,,,,,,,,,,,,,,,,,,,Fungi,,cellular organisms,,,,,,,,,,,,,,,,,,,Dikarya,,Saccharomycotina,,,,,,,,,
3315611,Eukaryota,Ascomycota,Saccharomycetes,Saccharomycetales,Metschnikowiaceae,Gabaldonia,,,Opisthokonta,saccharomyceta,,,,,,,,,,,CUG-Ser1 clade,,,,,,,,,,,,,,,,,,,Fungi,,cellular organisms,,,,,,,,,,,,,,,,,,,Dikarya,,Saccharomycotina,,,,,,,,,
3315612,Eukaryota,Ascomycota,Saccharomycetes,Saccharomycetales,Metschnikowiaceae,Wilhelminamyces,,,Opisthokonta,saccharomyceta,,,,,,,,,,,CUG-Ser1 clade,,,,,,,,,,,,,,,,,,,Fungi,,cellular organisms,,,,,,,,,,,,,,,,,,,Dikarya,,Saccharomycotina,,,,,,,,,
3316682,Eukaryota,Ascomycota,Dipodascomycetes,Dipodascales,,,,,Opisthokonta,saccharomyceta,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Fungi,,cellular organisms,Dipodascales incertae sedis,,,,,,,,,,,,,,,,,,Dikarya,,Saccharomycotina,,,,,,,,,
4084_Diamond_results.bout
[RNAseq_work]$ tail tmp_4084_7909/4084_Diamond_results.bout
GeneExt~Sopen12g035030.1.p1 PNY24194.1 8.78e-06 65.1 45235
GeneExt~Sopen12g035030.1.p1 KQP56511.1 8.84e-06 63.9 1736321
GeneExt~Sopen12g035030.1.p1 CAE6417795.1 8.88e-06 65.1 456999
GeneExt~Sopen12g035030.1.p1 KAK2612949.1 8.92e-06 65.1 1105319
GeneExt~Sopen12g035030.1.p1 ABW09584.1 9.46e-06 64.7 298653
GeneExt~Sopen12g035030.1.p1 WP_291692835.1 9.80e-06 64.7 376
GeneExt~Sopen12g035030.1.p1 WP_269613200.1 9.90e-06 64.7 1219
GeneExt~Sopen12g035030.1.p2 PHT48895.1 7.48e-11 72.0 33114
GeneExt~Sopen12g035030.1.p3 PHT48910.1 2.76e-10 68.9 33114
GeneExt~Sopen12g035030.1.p3 PHT60957.1 1.07e-07 61.6 4072
Stdout:
Illegal option --
Illegal option --
Illegal option --
Illegal option --
Illegal option --
Illegal option --
genEra v1.4.0 (C) Max Planck Society for the Advancement of Science
Starting time of run:
Tue Sep 3 18:06:07 CEST 2024
Your temporary files will be stored in /mnt/tmp_4084_7909/tmp_4084_20086
DIAMOND OUTPUT ALREADY GENERATED. SKIPPING STEP 1
We're just going to quickly cluster the query genes against themselves for later on (step 3)
THE SPECIES-TAILORED TAXONOMIC DATABASE WAS PROVIDED BY THE USER. SKIPPING STEP 2
STARTING STEP 3: ASSIGNING AGES TO YOUR QUERY GENES WITH Erassignment
--------------------------------------------------
Splitting results per query gene using 16 threads
Fatal error: cannot open file 'Usage:': No such file or directory
sed: can't read Usage:: No such file or directory
sed: can't read [-a]: No such file or directory
sed: can't read args: No such file or directory
sed: can't read Usage:: No such file or directory
sed: can't read [-a]: No such file or directory
sed: can't read args: No such file or directory
sed: can't read Usage:: No such file or directory
sed: can't read [-a]: No such file or directory
sed: can't read args: No such file or directory
sed: can't read Usage:: No such file or directory
sed: can't read [-a]: No such file or directory
sed: can't read args: No such file or directory
sed: can't read Usage:: No such file or directory
sed: can't read [-a]: No such file or directory
sed: can't read args: No such file or directory
sed: can't read Usage:: No such file or directory
sed: can't read [-a]: No such file or directory
sed: can't read args: No such file or directory
sed: can't read Usage:: No such file or directory
sed: can't read [-a]: No such file or directory
sed: can't read args: No such file or directory
sed: can't read Usage:: No such file or directory
sed: can't read [-a]: No such file or directory
sed: can't read args: No such file or directory
sed: can't read Usage:: No such file or directory
sed: can't read [-a]: No such file or directory
sed: can't read args: No such file or directory
sed: can't read Usage:: No such file or directory
sed: can't read [-a]: No such file or directory
sed: can't read args: No such file or directory
sed: can't read Usage:: No such file or directory
sed: can't read [-a]: No such file or directory
sed: can't read args: No such file or directory
sed: can't read Usage:: No such file or directory
sed: can't read [-a]: No such file or directory
sed: can't read args: No such file or directory
sed: can't read Usage:: No such file or directory
sed: can't read [-a]: No such file or directory
sed: can't read args: No such file or directory
sed: can't read Usage:: No such file or directory
sed: can't read [-a]: No such file or directory
sed: can't read args: No such file or directory
sed: can't read Usage:: No such file or directory
sed: can't read [-a]: No such file or directory
sed: can't read args: No such file or directory
sed: can't read Usage:: No such file or directory
sed: can't read [-a]: No such file or directory
sed: can't read args: No such file or directory
--------------------------------------------------
Running Erassignment using 16 threads
/usr/bin/false
/usr/bin/false
/usr/bin/false
/usr/bin/false
/usr/bin/false
/usr/bin/false
/usr/bin/false
/usr/bin/false
/usr/bin/false
/usr/bin/false
/usr/bin/false
/usr/bin/false
/usr/bin/false
/usr/bin/true
/usr/bin/true
/usr/bin/true
/usr/bin/true
/usr/bin/true
/usr/bin/true
/usr/bin/true
/usr/bin/true
/usr/bin/true
/usr/bin/true
/usr/bin/true
/usr/bin/true
/usr/bin/true
/usr/bin/false
/usr/bin/true
/usr/bin/false
/usr/bin/true
/usr/bin/false
/usr/bin/true
--------------------------------------------------
Running mcl to define gene families
.................................................. 1M
.................................................. 2M
.................................................. 3M
.................................................. 4M
.................................................. 5M
.................................................. 6M
.................................................. 7M
.................................................. 8M
.................................................. 9M
.................................................. 10M
.................................................. 11M
.................................................. 12M
.................................................. 13M
.................................................. 14M
.................................................. 15M
.................................................. 16M
.................................................. 17M
.................................................. 18M
.................................................. 19M
.................................................. 20M
.................................................. 21M
.................................................. 22M
.................................................. 23M
.................................................. 24M
.................................................. 25M
.................................................. 26M
.................................................. 27M
.................................................. 28M
.................................................. 29M
.................................................. 30M
.................................................. 31M
.................................................. 32M
.................................................. 33M
.................................................. 34M
.................................................. 35M
.................................................. 36M
.................................................. 37M
.................................................. 38M
.................................................. 39M
.................................................. 40M
.................................................. 41M
.................................................. 42M
.................................................. 43M
.................................................. 44M
.................................................. 45M
.................................................. 46M
.................................................. 47M
.................................................. 48M
.................................................. 49M
.................................................. 50M
.................................................. 51M
.................................................. 52M
........
[mclIO] writing </mnt/tmp_4084_7909/tmp_4084_20086/tmp_4084.mci>
.......................................
[mclIO] wrote native interchange 71751x71751 matrix with 52803377 entries to stream </mnt/tmp_4084_7909/tmp_4084_20086/tmp_4084.mci>
[mclIO] wrote 71751 tab entries to stream </mnt/tmp_4084_7909/tmp_4084_20086/tmp_4084.tab>
[mcxload] tab has 71751 entries
[mclIO] reading </mnt/tmp_4084_7909/tmp_4084_20086/tmp_4084.mcl>
.......................................
[mclIO] read native interchange 71751x16687 matrix with 71751 entries
--------------------------------------------------
Establishing the age and number of gene-family founder events
--------------------------------------------------
Step 3 finished!
The age assignment for your individual genes can be found in /mnt/TAI/out_spen//4084_gene_ages.tsv
The possible ages for the genes with a taxonomic representativeness below 30 percent can be found in /mnt/TAI/out_spen//4084_ambiguous_phylostrata.tsv
The estimation of gene family founder events can be found in /mnt/TAI/out_spen//4084_founder_events.tsv
The number of individual genes that could be assigned to each phylostratum are summarized in /mnt/TAI/out_spen//4084_gene_age_summary.tsv
The number of of gene family founder events per phylostratum are summarized in /mnt/TAI/out_spen//4084_founder_summary.tsv
genEra finished at:
Tue Sep 3 18:17:04 CEST 2024
Enjoy your results!!!
gene_age_summary.tsv
#number_of_genes phylostratum phylorank
0 Eukaryota 72
0 Streptophyta 71
0 Magnoliopsida 70
0 Solanales 69
0 Solanaceae 68
0 Solanum 67
0 Solanum pimpinellifolium 66
0 Embryophyta 65
0 Tracheophyta 64
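A summary like the one above, where every count is 0, can be detected automatically by summing the first column. A small sketch, assuming the first whitespace-separated field of each non-header line is the gene count, as the header line suggests; total_assigned_genes is a hypothetical helper name:

```shell
# Sum the gene counts (first column) of a gene_age_summary.tsv file,
# skipping the '#' header line and any blank lines.
# total_assigned_genes is a hypothetical helper, not part of GenEra.
total_assigned_genes() {
    awk '!/^#/ && NF { sum += $1 } END { print sum + 0 }' "$1"
}
```

A result of 0 for the summary file would indicate that no gene received an age, matching the empty gene_ages.tsv reported earlier in the thread.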
Let me know whether you need anything else!
Best, s.
Dear @seveein,
It seems that step 1 ran without any issues, so you can keep using the file 4084_Diamond_results.bout to avoid running that part of the pipeline again. I see two possible sources of error in the pipeline:
The first thing I noticed is that the file ncbi_lineages_2024-09-02.csv should be specified with -r instead of -c, since it is an intermediate file in step 2. Could it be that you have a file named 4084_ncbi_lineages.csv within the output files of your initial GenEra run? Because that is the file that you should specify to GenEra with the argument -c. It could be an error in step 2, but I can't imagine why it would fail. Try running GenEra using -r ncbi_lineages_2024-09-02.csv and let me know if the file 4084_ncbi_lineages.csv was generated.
The other possible source of error I see could be in the script FASTSTEP3R. This is an R script that makes step 3 run much faster than in the initial versions of GenEra, but it also consumes a considerable amount of memory and may be the cause of your errors. To verify this, could you please run GenEra again by adding the following argument:
-F false
This will disable fast mode for step 3, which should be able to run without any issues. I expect GenEra to take a considerable amount of time on this step though, given that you're working with a plant genome.
I'm still puzzled about the error Illegal option -- at the beginning of your log, but it is hopefully nothing to be worried about.
Please let me know if these two things solve your issues.
Cheers, Josué
Hi Josué,
Thank you for your help.
The file 4084_ncbi_lineages.csv had also been generated before. However, I changed the option from '-c' to '-r' for the test run, which successfully completed Step 2.
Step 3 finished within minutes and returned the same error prompt again:
STARTING STEP 3: ASSIGNING AGES TO YOUR QUERY GENES WITH Erassignment
sed: can't read Usage: /usr/bin/which [-a] args: No such file or directory
sed: can't read Usage: /usr/bin/which [-a] args: No such file or directory
sed: can't read Usage: /usr/bin/which [-a] args: No such file or directory
The issue seems to be related to Step 3. Adjusting to -F false did not improve the situation. It happens pretty early in the execution.
cheers, s.
Quick update:
I've been experimenting with the Singularity settings because it seemed that Singularity was mishandling environment variables.
Using the --cleanenv option has resolved the initial issue. The repeated
Illegal option --
Illegal option --
Illegal option --
messages are no longer appearing in the STDOUT.
Additionally, Step 3 appears to be running more stably now and isn't skipping the analysis. However, it’s still in progress, so please proceed with caution.
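To see which host variables singularity passes into the container by default, one option is to dump the environment once without and once with --cleanenv (for instance by running env through singularity exec and redirecting the output to a file) and then compare the variable names. A minimal sketch, assuming a bash-compatible shell; env_only_in_first and the dump file names are hypothetical, not part of singularity or GenEra:

```shell
# Print the variable names that appear in the first env dump but not in the second.
# env_only_in_first is a hypothetical helper for illustration.
env_only_in_first() {
    local a b
    a=$(mktemp) && b=$(mktemp) || return 1
    # Keep only the NAME part of NAME=value lines; lines that start with
    # whitespace (continuations of multi-line values) are skipped.
    sed -n 's/^\([^=[:space:]]*\)=.*/\1/p' "$1" | sort -u > "$a"
    sed -n 's/^\([^=[:space:]]*\)=.*/\1/p' "$2" | sort -u > "$b"
    # comm -23 keeps the lines that are unique to the first sorted list
    comm -23 "$a" "$b"
    rm -f "$a" "$b"
}
```

Calling env_only_in_first default.env clean.env (two hypothetical dump files) would then list the names that only reach the container when --cleanenv is not used. Values that span several lines may produce spurious names with this simple parser.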
STDOUT:
genEra v1.4.0 (C) Max Planck Society for the Advancement of Science
Starting time of run:
Wed Sep 4 08:56:32 CEST 2024
Your temporary files will be stored in /mnt/TAI/TEMP/tmp_4084_1693
DIAMOND OUTPUT ALREADY GENERATED. SKIPPING STEP 1
We're just going to quickly cluster the query genes against themselves for later on (step 3)
THE SPECIES-TAILORED TAXONOMIC DATABASE WAS PROVIDED BY THE USER. SKIPPING STEP 2
STARTING STEP 3: ASSIGNING AGES TO YOUR QUERY GENES WITH Erassignment
--------------------------------------------------
Running Erassignment using 64 threads
troubleshooting command:
singularity run --cleanenv $HOME/programs/GenEra/genEra.sif genEra \
-q pennellii_longest_orfs_pep_clean.fasta \
-t 4084 -b /mnt/db/nr \
-d /mnt/TAI/db/taxdump/ \
-c /mnt/out_spen//4084_ncbi_lineages.csv \
-p /mnt/tmp_4084_7909/4084_Diamond_results.bout \
-x /mnt/TAI/TEMP/ \
-o /mnt/TAI/out_spen/ \
-F false \
-n 64
cheers, s.
Dear @seveein,
It seems that --cleanenv solved the issue. You can check if step 3 is running correctly by checking inside of /mnt/TAI/out_spen/ to see if a file named 4084_gene_ages.tsv is being written. If GenEra takes too much time, you can try running it again by deleting -F false to enable fast mode (the final results should be the same).
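One way to check whether an output file is still being written is to compare its size at two points in time. The following is a sketch under the assumption of a bash-compatible shell; is_growing is a hypothetical helper name, and the file may legitimately stay unchanged for longer than the chosen interval:

```shell
# Succeed (exit 0) if the file's size increases within the given number of seconds.
# Exit 2 if the file does not exist, 1 if the size did not increase.
# is_growing is a hypothetical helper, not part of GenEra.
is_growing() {
    local f="$1" wait="${2:-5}" before after
    [ -f "$f" ] || return 2
    before=$(wc -c < "$f")
    sleep "$wait"
    after=$(wc -c < "$f")
    [ "$after" -gt "$before" ]
}
```

For instance, is_growing /mnt/TAI/out_spen/4084_gene_ages.tsv 60 would wait one minute and succeed only if the file grew in that time.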
Cheers, Josué
Dear @josuebarrera,
--cleanenv solved all the issues. Step 3 was just completed successfully. (Maybe this could be added to the wiki as well.)
I appreciate your help!
Best, S.
Hi everyone, thank you very much for this nice contribution to the community! I tested GenEra on our cluster.
Here, I observed the following on stdout:
After this, it seems like GenEra continues with the analysis:
To Reproduce
I use the Docker-Container via singularity:
Is this a known bug, and can I trust that GenEra will continue without further issues? Thank you very much. cheers s