StepanSaenko opened this issue 1 year ago.
Hi Stepan, have you tried this? Also, thanks for letting me know. I'll look into updating PANTHER from version 17 to 18.
Thank you for your reply. I reduced the number of species and dates, but the error still appears. I now have 3 species:
Bombyx_mori,Drosophila_melanogaster -286.000
Drosophila_melanogaster,Tribolium_castaneum -333.000
Tribolium_castaneum,Bombyx_mori -333.000
Also, one small problem: even if the PANTHER database archive was already downloaded, the step [14/545db0] process > DOWNLOAD_PANTHER_DATABASE [ 0%] 0 of 1
still runs. Ctrl+C aborts it, but it tries to download the archive again every time.
With my data, all the steps seem suspiciously fast, which makes me suspect the problem is my data. I'm sorry for bothering you.
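For the re-download issue, the general idea is to skip the download when the archive is already on disk. Below is a minimal Python sketch of that check. It is not how the DOWNLOAD_PANTHER_DATABASE process in compare_genomes is written; `fetch_if_missing` and the `fetch` callback are hypothetical names.

```python
from pathlib import Path


def fetch_if_missing(destination, fetch):
    """Call fetch(path) only if the file is absent or empty.

    Returns True when a download was attempted and False when an existing,
    non-empty file was reused.
    """
    path = Path(destination)
    if path.is_file() and path.stat().st_size > 0:
        return False
    # Make sure the target directory exists before downloading into it.
    path.parent.mkdir(parents=True, exist_ok=True)
    fetch(path)
    return True
```

For example, `fetch_if_missing(path, lambda p: urllib.request.urlretrieve(url, p))` would only contact the server when the archive is missing or empty.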
Hi Stepan, no worries. Have you tried setting panther_hmm_database_location in params.config to the absolute path of the database? Please don't hesitate to ask if you get stuck again.
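A quick way to check that setting is to scan params.config for the value and test whether it is an absolute path that exists. This is a minimal Python sketch assuming simple `key = 'value'` lines like the params.config posted in this thread; the helper names are hypothetical and the regex is not a real Nextflow parser.

```python
import os
import re

# Rough pattern for simple assignments such as  key = 'value'  or  key = "value".
# This is a regex scan for illustration, not a Nextflow/Groovy parser.
ASSIGNMENT = re.compile(r"""(\w+)\s*=\s*['"]([^'"]*)['"]""")


def read_string_params(config_text):
    """Return the quoted string parameters found in a params.config body."""
    return dict(ASSIGNMENT.findall(config_text))


def check_database_location(config_text, key="panther_hmm_database_location"):
    """Report whether the configured database location is absolute and present on disk."""
    location = read_string_params(config_text).get(key)
    if location is None:
        return f"{key} not found"
    if not os.path.isabs(location):
        return f"{key} is not an absolute path: {location}"
    if not os.path.exists(location):
        return f"{key} does not exist on this machine: {location}"
    return f"{key} OK: {location}"
```

Usage: read params.config into a string and pass it to `check_database_location`.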
Could you please try the workflow using my data?
urls.txt
Tenebrio_molitor.fna,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/907/166/875/GCA_907166875.3_Tenebrio_molitor_v3/GCA_907166875.3_Tenebrio_molitor_v3_genomic.fna.gz
Tenebrio_molitor.gff,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/907/166/875/GCA_907166875.3_Tenebrio_molitor_v3/GCA_907166875.3_Tenebrio_molitor_v3_genomic.gff.gz
Tenebrio_molitor.faa,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/907/166/875/GCA_907166875.3_Tenebrio_molitor_v3/GCA_907166875.3_Tenebrio_molitor_v3_protein.faa.gz
Tenebrio_molitor.cds,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/907/166/875/GCA_907166875.3_Tenebrio_molitor_v3/GCA_907166875.3_Tenebrio_molitor_v3_cds_from_genomic.fna.gz
Tribolium_castaneum.fna,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/002/335/GCF_000002335.3_Tcas5.2/GCF_000002335.3_Tcas5.2_genomic.fna.gz
Tribolium_castaneum.gff,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/002/335/GCF_000002335.3_Tcas5.2/GCF_000002335.3_Tcas5.2_genomic.gff.gz
Tribolium_castaneum.faa,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/002/335/GCF_000002335.3_Tcas5.2/GCF_000002335.3_Tcas5.2_protein.faa.gz
Tribolium_castaneum.cds,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/002/335/GCF_000002335.3_Tcas5.2/GCF_000002335.3_Tcas5.2_cds_from_genomic.fna.gz
Tribolium_madens.fna,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/015/345/945/GCF_015345945.1_Tmad_KSU_1.1/GCF_015345945.1_Tmad_KSU_1.1_genomic.fna.gz
Tribolium_madens.gff,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/015/345/945/GCF_015345945.1_Tmad_KSU_1.1/GCF_015345945.1_Tmad_KSU_1.1_genomic.gff.gz
Tribolium_madens.faa,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/015/345/945/GCF_015345945.1_Tmad_KSU_1.1/GCF_015345945.1_Tmad_KSU_1.1_protein.faa.gz
Tribolium_madens.cds,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/015/345/945/GCF_015345945.1_Tmad_KSU_1.1/GCF_015345945.1_Tmad_KSU_1.1_cds_from_genomic.fna.gz
Zophobas_morio.fna,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/027/724/725/GCA_027724725.1_ASM2772472v1/GCA_027724725.1_ASM2772472v1_genomic.fna.gz
Zophobas_morio.gff,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/027/724/725/GCA_027724725.1_ASM2772472v1/GCA_027724725.1_ASM2772472v1_genomic.gff.gz
Zophobas_morio.faa,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/027/724/725/GCA_027724725.1_ASM2772472v1/GCA_027724725.1_ASM2772472v1_protein.faa.gz
Zophobas_morio.cds,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/027/724/725/GCA_027724725.1_ASM2772472v1/GCA_027724725.1_ASM2772472v1_cds_from_genomic.fna.gz
Dendroctonus_ponderosae.fna,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/020/466/585/GCF_020466585.1_Dpon_F_20191213v2/GCF_020466585.1_Dpon_F_20191213v2_genomic.fna.gz
Dendroctonus_ponderosae.gff,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/020/466/585/GCF_020466585.1_Dpon_F_20191213v2/GCF_020466585.1_Dpon_F_20191213v2_genomic.gff.gz
Dendroctonus_ponderosae.faa,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/020/466/585/GCF_020466585.1_Dpon_F_20191213v2/GCF_020466585.1_Dpon_F_20191213v2_protein.faa.gz
Dendroctonus_ponderosae.cds,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/020/466/585/GCF_020466585.1_Dpon_F_20191213v2/GCF_020466585.1_Dpon_F_20191213v2_cds_from_genomic.fna.gz
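Before a long run, it can help to confirm that every species in urls.txt has all four file types. A minimal, hypothetical pre-flight check in Python (not part of compare_genomes), assuming `Species.ext,url` entries; it splits on any whitespace so it works whether entries sit one per line or on a single line as pasted here.

```python
from collections import defaultdict

# File types this thread's urls.txt provides for every species.
EXPECTED_EXTENSIONS = {"fna", "gff", "faa", "cds"}


def parse_urls(text):
    """Map species -> {extension: url} from 'Species.ext,url' entries."""
    table = defaultdict(dict)
    for entry in text.split():
        name, url = entry.split(",", 1)
        species, extension = name.rsplit(".", 1)
        table[species][extension] = url
    return dict(table)


def missing_files(table):
    """Return {species: [missing extensions]} for incomplete species."""
    return {
        species: sorted(EXPECTED_EXTENSIONS - set(files))
        for species, files in table.items()
        if EXPECTED_EXTENSIONS - set(files)
    }
```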
dates.txt
Tribolium_castaneum,Tenebrio_molitor -172.000
Dendroctonus_ponderosae,Tribolium_castaneum -215.000
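Because the dates file is passed to IQ-TREE, the taxon names in it presumably need to match the species names used in urls.txt exactly (underscores included). A minimal, hypothetical Python check for that, reading `taxa time` pairs as written above; it assumes species names contain no spaces.

```python
def parse_dates(text):
    """Parse 'Species_a,Species_b -172.000' pairs into [((a, b), -172.0), ...]."""
    tokens = text.split()
    if len(tokens) % 2:
        raise ValueError("expected 'taxa time' pairs")
    return [
        (tuple(taxa.split(",")), float(time))
        for taxa, time in zip(tokens[0::2], tokens[1::2])
    ]


def unknown_taxa(records, known_species):
    """Return taxa named in the dates file that are absent from the species list."""
    named = {taxon for taxa, _ in records for taxon in taxa}
    return sorted(named - set(known_species))
```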
venn_species_max_5.txt
Tribolium castaneum
Tenebrio molitor
Dendroctonus ponderosae
Tribolium madens
Zophobas morio
I changed the dataset and still got the error.
Hi Stepan, is it the exact same error message? Can you also post your params.config? I'll run it as soon as I'm free.
Of course:
params {
    dir = '/home/saenkos/comparing3'
    species_of_interest = 'Tribolium_castaneum'
    species_of_interest_panther_HMM_for_gene_names_url = 'http://data.pantherdb.org/ftp/sequence_classifications/current_release/PANTHER_Sequence_Classification_files/PTHR18.0_tribolium'
    panther_hmm_database_location = '/home/saenkos/comparing3/PantherHMM_18.0'
    urls = "${projectDir}/urls.txt"
    dates = "${projectDir}/dates.txt"
    comparisons_4DTv = "${projectDir}/comparisons_4DTv.txt"
    venn_species_max_5 = "${projectDir}/venn_species_max_5.txt"
    genes = "${projectDir}/genes.txt"
    cafe5_n_gamma_cats = 1 // If 1 then use the base model; else use the gamma model with <cafe5_n_gamma_cats> gamma categories to test
    cafe5_pvalue = 0.01
    go_term_enrich_genome_id = 7070 // go_term_enrich_annotation_id = "GO:0008150"
    go_term_enrich_test = "FISHER"
    go_term_enrich_correction = "FDR"
    go_term_enrich_ngenes_per_test = 100
    go_term_enrich_ntests = 5
}
includeConfig 'process.config'
The same error: cannot open file 'ORTHOGROUPS_SINGLE_GENE.NT.timetree.nex': No such file or directory
Can you please try the version 17.0 PANTHER databases first, i.e. use the following in params.config:
species_of_interest_panther_HMM_for_gene_names_url = 'http://data.pantherdb.org/ftp/sequence_classifications/17.0/PANTHER_Sequence_Classification_files/PTHR17.0_arabidopsis'
panther_hmm_database_location = 'http://data.pantherdb.org/ftp/panther_library/17.0/PANTHER17.0_hmmscoring.tgz'
I changed them and got the same result. But with the test data it works using v18. Could the distances between the species be too large?
Or maybe it happened because I left genes.txt empty?
Hi Stepan,
Thanks for trying both versions. Have you looked at the detailed error messages in compare_genomes/work/<most_recent_folders>/<hash_signature>/? The files are hidden, i.e. .command.sh, .command.log and .command.err.
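To find those hidden files quickly, here is a small Python sketch that lists the newest non-empty .command.err files. It assumes the default Nextflow layout work/<two-character prefix>/<hash>/ (matching the [14/545db0] style task IDs above); `recent_error_logs` is a hypothetical helper.

```python
from pathlib import Path


def recent_error_logs(work_dir, n=5):
    """Return up to n non-empty .command.err files under work_dir, newest first."""
    logs = [
        p for p in Path(work_dir).glob("*/*/.command.err")
        if p.stat().st_size > 0  # skip tasks that wrote nothing to stderr
    ]
    logs.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return logs[:n]
```

Usage: `for log in recent_error_logs("compare_genomes/work"): print(log, log.read_text()[-2000:])` prints the tail of each recent error log.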
An empty genes.txt should not affect the time tree. The divergence times might be an issue if they clash with what the sequence differences show.
I tried to look through the last .err and .log files; there are two odd errors:
1) cut: /home/saenkos/compare_genomes/modules/genes.txt: No such file or directory
I don't know why the path contains the /modules/ directory.
2) conda 23.7.3 requires requests<3,>=2.27.0, but you have requests 2.22.0 which is incompatible.
This is also strange, because the test files work.
All the other errors seem to repeat in a cycle.
Did you try to run the workflow on my files?
Oh, I just noticed that the paths in your params.config may not be pointing to the correct locations. Are your input URL, dates, etc. lists inside compare_genomes/config/? If so, then the locations in params.config should look like urls = "${projectDir}/../config/urls.txt" and not just urls = "${projectDir}/urls.txt".
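The path arithmetic can be checked directly. A minimal Python sketch, assuming projectDir resolves to compare_genomes/modules, as the cut error path (.../compare_genomes/modules/genes.txt) suggests; `resolve` is a hypothetical helper, and `normpath` only collapses `..` textually without following symlinks.

```python
import posixpath


def resolve(project_dir, relative):
    """Join and normalise the path that '${projectDir}/<relative>' points to."""
    return posixpath.normpath(posixpath.join(project_dir, relative))
```

With project_dir set to /home/saenkos/compare_genomes/modules, "urls.txt" stays inside modules/, while "../config/urls.txt" lands in compare_genomes/config/.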
Also, all my VMs are currently busy with other analyses, with jobs running for the next few weeks.
Well, I reinstalled the whole workflow and it seems to be working. I'm afraid I wasted your time. Anyway, thank you for your responses and your time. I will keep you posted if you are interested.
Yes, please let me know if the issue persists; otherwise, I'll close this issue.
No worries, I set the wrong directories all the time, and if everything else fails, good ol' turning it off and on again can help.
I am really sorry, but after 14 hours the process stopped and the error is back.
I checked the .command.err files, which were generated only for the last run in the freshly created copy of the workflow:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at align.ProfileAligner.adjustScoreMatrix(ProfileAligner.java:711)
at align.ProfileAligner.alignProfiles(ProfileAligner.java:183)
at align.CodingMSA.buildAlignement(CodingMSA.java:615)
at align.CodingMSA.buildProfile(CodingMSA.java:510)
at align.CodingMSA.buildAlignmentReliable(CodingMSA.java:650)
at align.CodingMSA.run(CodingMSA.java:659)
at utils.MacseMain.main(MacseMain.java:426)
fasta file:OG0006401.aligned.unsorted.cds.tmp not found
java.io.FileNotFoundException: OG0006401.aligned.unsorted.cds.tmp (No such file or directory)
at java.base/java.io.FileInputStream.open0(Native Method)
at java.base/java.io.FileInputStream.open(FileInputStream.java:216)
at java.base/java.io.FileInputStream.<init>(FileInputStream.java:157)
at java.base/java.io.FileInputStream.<init>(FileInputStream.java:111)
at java.base/java.io.FileReader.<init>(FileReader.java:60)
at bioObject.CodingDnaSeq.readFasta(CodingDnaSeq.java:562)
at utils.MacseMain.main(MacseMain.java:590)
rm: cannot remove 'OG0006401*.tmp': No such file or directory
rm: cannot remove 'OG0006401.AA.prot': No such file or directory
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at align.ProfileAligner.<init>(ProfileAligner.java:120)
at align.CodingMSA.<init>(CodingMSA.java:64)
at align.CodingMSA.buildAlignmentReliable(CodingMSA.java:633)
at align.CodingMSA.run(CodingMSA.java:659)
at utils.MacseMain.main(MacseMain.java:426)
fasta file:OG0008703.aligned.unsorted.cds.tmp not found
java.io.FileNotFoundException: OG0008703.aligned.unsorted.cds.tmp (No such file or directory)
at java.base/java.io.FileInputStream.open0(Native Method)
at java.base/java.io.FileInputStream.open(FileInputStream.java:216)
at java.base/java.io.FileInputStream.<init>(FileInputStream.java:157)
at java.base/java.io.FileInputStream.<init>(FileInputStream.java:111)
at java.base/java.io.FileReader.<init>(FileReader.java:60)
at bioObject.CodingDnaSeq.readFasta(CodingDnaSeq.java:562)
at utils.MacseMain.main(MacseMain.java:590)
rm: cannot remove 'OG0008703*.tmp': No such file or directory
rm: cannot remove 'OG0008703.AA.prot': No such file or directory
Then ValueError: invalid mode: 'rU', but I have already met this error: it is caused by Python 3.11, which no longer accepts the "U" mode.
The solution, when using Python 3.11, is to remove the "U" from the open() mode in input.py.
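The change described for input.py amounts to opening the file with "r" instead of "rU". A minimal Python sketch of the portable version (the function names are illustrative): in Python 3, text mode already applies universal newlines, and "U" was removed from open() in Python 3.11.

```python
import sys


def read_text_lines(path):
    """Read a text file and return its lines without line endings.

    Text mode "r" already translates \\r\\n and \\r to \\n in Python 3,
    which is what the legacy "rU" mode was used for.
    """
    with open(path, "r") as handle:
        return handle.read().splitlines()


def legacy_u_mode_available():
    """True on interpreters older than 3.11, which still accept mode 'rU'."""
    return sys.version_info < (3, 11)
```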
The next one:
ERROR: LoadError: SystemError: opening file "CDS/OG0000762.cds": No such file or directory
Stacktrace:
[1] systemerror(p::String, errno::Int32; extrainfo::Nothing)
@ Base ./error.jl:176
[2] #systemerror#80
@ ./error.jl:175 [inlined]
[3] systemerror
@ ./error.jl:175 [inlined]
[4] open(fname::String; lock::Bool, read::Bool, write::Nothing, create::Nothing, truncate::Nothing, append::Nothing)
@ Base ./iostream.jl:293
[5] open(fname::String, mode::String; lock::Bool)
@ Base ./iostream.jl:356
[6] open(fname::String, mode::String)
@ Base ./iostream.jl:355
[7] top-level scope
@ ~/compare_my/compare_genomes/scripts/extract_sequence_using_name_query.jl:57
in expression starting at /home/saenkos/compare_my/compare_genomes/scripts/extract_sequence_using_name_query.jl:57
ls: cannot access 'OG0000762-*.fasta': No such file or directory
fasta file:OG0000762.fasta not found
java.io.FileNotFoundException: OG0000762.fasta (No such file or directory)
Then this one, though I am not sure it is really an error:
signal (15): Terminated
in expression starting at /home/saenkos/compare_my/compare_genomes/config/install_julia_packages.jl:1
ijl_uncompress_ir at /usr/local/src/conda/julia-1.8.3/src/ircode.c:862
InliningTodo at ./compiler/ssair/inlining.jl:870 [inlined]
resolve_todo at ./compiler/ssair/inlining.jl:804
analyze_method! at ./compiler/ssair/inlining.jl:861
handle_match! at ./compiler/ssair/inlining.jl:1293
analyze_single_call! at ./compiler/ssair/inlining.jl:1210
assemble_inline_todo! at ./compiler/ssair/inlining.jl:1425
ssa_inlining_pass! at ./compiler/ssair/inlining.jl:82
jfptr_ssa_inlining_passNOT._16094.clone_1 at /home/saenkos/anaconda3/envs/myenv/envs/compare_genomes/lib/julia/sys.so (unknown line)
Also, I updated the link to the v17 classification in modules/setup.nf, because v17 is no longer the current release:
http://data.pantherdb.org/ftp/sequence_classifications/17.0/PANTHER_Sequence_Classification_files/
So I'm going to fix some of these and run the workflow one more time with higher CPU and memory limits. It already seems to be doing better than before.
Sounds like a good plan. Yes, it seems you're just running out of memory, and the subsequent error messages appear only because the previous step did not generate the CDS alignments the next step was expecting.
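When a .command.err holds a cascade like the one above, it helps to look for the first root-cause line before the knock-on "file not found" errors. A minimal Python sketch; the marker lists are hypothetical heuristics drawn from the logs in this thread, not anything the workflow defines.

```python
# Hypothetical marker lists: the first usually points at the real failure,
# the second at knock-on errors from files an earlier step never wrote.
ROOT_CAUSE_MARKERS = ("OutOfMemoryError", "MemoryError")
KNOCK_ON_MARKERS = ("FileNotFoundException", "No such file or directory")


def classify_log(text):
    """Return ("root", line), ("knock-on", line) or None for a .command.err body."""
    lines = [line.strip() for line in text.splitlines()]
    for markers, label in ((ROOT_CAUSE_MARKERS, "root"), (KNOCK_ON_MARKERS, "knock-on")):
        for line in lines:
            if any(marker in line for marker in markers):
                return label, line
    return None
```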
So I increased the memory to 160 GB and the CPUs to 64.
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at align.ProfileAligner.adjustScoreMatrix(ProfileAligner.java:711)
at align.ProfileAligner.alignProfiles(ProfileAligner.java:183)
at align.CodingMSA.buildAlignement(CodingMSA.java:615)
at align.CodingMSA.buildProfile(CodingMSA.java:510)
at align.CodingMSA.buildAlignmentReliable(CodingMSA.java:650)
at align.CodingMSA.run(CodingMSA.java:659)
at utils.MacseMain.main(MacseMain.java:426)
How is that possible? I only have 5 genomes, each ~160 Mb long.
Did you increase the memory in process.config accordingly? If you did, the memory may be getting stretched thin across the CPUs. Try reducing the number of CPUs in process.config to give each core more memory.
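A back-of-the-envelope sketch of that per-core arithmetic, under the simplifying assumption (not taken from the workflow) that total memory is split evenly across single-CPU tasks running in parallel. Actual limits depend on process.config and on the Java heap settings each task uses.

```python
def memory_per_cpu_gb(total_memory_gb, cpus):
    """Memory per concurrently running single-CPU task if memory is split evenly."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return total_memory_gb / cpus
```

With the numbers from this thread, 160 GB over 64 CPUs gives 2.5 GB per core, while 24 CPUs gives roughly 6.7 GB.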
I reduced the number of CPUs (to 24), and now diamond blastp has been running for 50 hours. Have you ever seen anything like this?
Yes, some of the plant genomes I've dealt with took more than a week to finish the whole workflow. Are you at assess_specific_genes.nf?
I'm at:
executor > local (2)
[09/f8b172] process > FIND_ORTHOGROUPS [100%] 1 of 1 ✔
[b5/27007a] process > ASSIGN_GENE_FAMILIES_TO_ORT... [ 0%] 0 of 1
[- ] process > ASSESS_ORTHOGROUPS_DISTRIBU... -
The file orthogroups.faa was created 8 hours ago. Unfortunately, I have to restart the workflow because the Slurm management system on our HPC allows jobs to run for at most 72 hours.
You can run each module separately, and you can even go into each module and extract portions of the shell scripts to run them individually. That should give you even finer control over the whole workflow and allow you to work around the 72-hour maximum run time on your HPC.
Well, I am still trying to move forward. Earlier, I got the error:
ERROR: LoadError: SystemError: opening file "CDS/putative.cds": No such file or directory
Now, have you seen an error like this from iqtree2?
IQ-TREE multicore version 2.2.0.3 COVID-edition for Linux 64-bit built Aug 2 2022
Developed by Bui Quang Minh, James Barbetti, Nguyen Lam Tung,
Olga Chernomor, Heiko Schmidt, Dominik Schrempf, Michael Woodhams, Ly Trong Nhan.
Host: node362 (AVX512, FMA3, 187 GB RAM)
Command: iqtree2 -s ORTHOGROUPS_SINGLE_GENE.NT.aln -p alignment_parition.NT.nex -T 20 --date /home/saenkos/compare_my/compare_genomes/modules/../config/dates.txt --date-tip 0 --prefix ORTHOGROUPS_SINGLE_GENE.NT --redo
Seed: 886971 (Using SPRNG - Scalable Parallel Random Number Generator)
Time: Wed Nov 1 08:23:09 2023
Kernel: AVX+FMA - 20 threads (32 CPU cores detected)
Reading partition model file alignment_parition.NT.nex ...
Reading alignment file ORTHOGROUPS_SINGLE_GENE.NT.aln ... Fasta format detected
Reading fasta file: done in 1.75486 secs using 91.45% CPU
ERROR: Sequence Tribolium_castaneum contains too many characters (16349019)
ERROR: Sequence Tribolium_madens contains too many characters (108930093)
ERROR:
For some reason, the sequences in the alignment do not seem to have the same length. Can you look at the alignment file? Maybe they were concatenated twice during the retries. That may be something we can fix to account for multiple failed reruns.
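A quick way to inspect the alignment is to compare per-sequence lengths. A minimal Python sketch with hypothetical helpers; it takes the first header token as the sequence name and appends repeated names to the same entry, so duplicates show up as over-long sequences like the ones IQ-TREE reported.

```python
def read_fasta(text):
    """Parse FASTA text into {name: sequence}, using the first header token as the name."""
    parts = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            parts.setdefault(name, [])  # repeated names extend the same entry
        elif name is not None:
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


def alignment_lengths(sequences):
    """Return (all_equal, {name: length}) for a set of aligned sequences."""
    lengths = {n: len(s) for n, s in sequences.items()}
    return len(set(lengths.values())) <= 1, lengths
```

Usage: `alignment_lengths(read_fasta(open("ORTHOGROUPS_SINGLE_GENE.NT.aln").read()))`.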
By the way, I have run coleopteran genomes in the past, including Tribolium castaneum, and can confirm that at least the NCBI genome assembly and predicted proteins should not give us any problems.
Could you please share the initial *.txt files?
My email is saenkos@uni-greifswald.de
I'll send the config files to you as soon as I get access to the VM I ran it on. But for now, I'm running the workflow on one of my VMs using the config files you've previously sent.
I think I found the issue with plotting and have committed a fix, although I'm still running the entire workflow from the beginning to validate that it works. The problem is the sequence names in the CDS file of Tribolium castaneum: some names are duplicated, so we get two sequences where we expect one, and IQ-TREE cannot build a tree because the alignments across species do not match. Also note that I have made changes to params.config, where I removed the hardcoded PANTHER HMM classifications text file, i.e. I've added panther_hmm_classifications_location = 'http://data.pantherdb.org/ftp/hmm_classifications/17.0/PANTHER17.0_HMM_classifications'
to make things simpler when changing PANTHER versions. I'll let you know if the fix succeeds, and then I'll send you the config files I used.
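Duplicated names like the ones described can be spotted in a CDS FASTA before running the workflow. A minimal Python sketch; `duplicated_headers` is hypothetical and assumes the sequence name is the first whitespace-delimited token after ">", which may differ from how the workflow itself derives names.

```python
from collections import Counter


def duplicated_headers(fasta_text):
    """Return FASTA sequence names (first token after '>') that occur more than once."""
    names = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.append(tokens[0])
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)
```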
I validated the fix on my end (a 32-core machine with 120 GB RAM; the run took ~5 hours and 14 minutes). Please see the Coleopterans branch for the config files; you may also just clone that branch. Please let me know if it works for you too.
Here's the summary figure I got:
This is wonderful, thank you very much. And you did not have any Java out-of-memory error?
I had no memory issues. Maybe the shared computing/submission system you have has some quirks in how it manages memory for Java. I hope it'll be smooth this time around. Good luck with your analyses. Very interesting patterns of contraction/expansion of gene families!
I have a similar error. I found that my results differ from the TEST results in “Orthogroups.tsv”. My result: OG0002194 Dryobates_pubescens|XP_054020272.1 transmembrane protein 245 isoform X1 [Dryobates pubescens], Dryobates_pubescens|XP_054020273.1 transmembrane protein 245 isoform X2 [Dryobates pubescens], Dryobates_pubescens|XP_054020274.1 transmembrane protein 245 isoform X3 [Dryobates pubescens] Indicator_indicator|XP_054237876.1 transmembrane protein 245 [Indicator indicator] Melanerpes_aurifrons|TMEM245_rna-XM_015282350.2.3676, Melanerpes_aurifrons|TMEM245_rna-XM_015282351.2.3676, Melanerpes_aurifrons|TMEM245_rna-XM_015282352.2.3676 Upupaepops|NWU95202.1 TM245 protein partial [Upupa epops]
TEST: OG0017327 Arabidopsis_arenosa|CAE6190620.1 Arabidopsis_lyrata|XP_020875211.1 Arabidopsis_suecica|KAG7547138.1, Arabidopsis_suecica|KAG7620870.1 Arabidopsis_thaliana|NP_567536.1
The headers in my results contain the protein names, but the source file format is the same.
Mine:
XP_009894245.2 pyroglutamylated RF-amide peptide receptor [Dryobates pubescens]
MRSLNITPEQFAQLLRDNNVTREQFIALYGLQPLVYIPELPGRTKVAFVLICVLIFVLALFGNCLVLYVVTRSKAMRTVT
NIFICSLALSDLLIAFFCVPFTMLQNISSNWLGGAFACKMVPFVQSTAIVTEILTMTCIAVERHQGIVHPLKMKWQYTNK
RAFTMLGIVWLLALIVGSPMWHVQRLEVKYDFLYEKVYVCCLEEWASPIYQKIYTTFILVILFLLPLMLMLFLYTKIGYE
LWIKKRVGDASVLQTIHGSEMSKISRKKKRAIVMMVTVVFLFAVCWAPFHVIHMMIEYSNFEKEYDDVTVKMIFAIVQII
GFFNSICNPIVYAFMNENFKKNFLSAICFCIVKENSSPARHLGNLGITLRRQKAASQRDPVDSDEGRREAFSDGNIEVKF
CDQPSSKRHLKRHLALFSSELTVHSALGNGQ
TEST:
KAG7527760.1 hypothetical protein ISN44_Un269g000010, partial [Arabidopsis suecica]
IEDFVKEYHEAKDTPKDQNLKRPRQSNEEEPRSSKGKINVIIGGSKLCRDTINAIKKHRRNVLFKANLGEEMDFQGTSIS
FDEEETCHLERPHDDALVITLDVANFEVSRILVDTGSSVDLIFLGTLERMGISRADIVGPPTPLVAFTSESAMSLGTIKL
PVLAKNVSKIVDFVVFDKPAAYNIILGTPWIYQMKAVPSTYHQCIKFPTPSGVGTIRGSQEASRT
I would like to know whether this problem originates from OrthoFinder or from compare_genomes, and how to solve it.
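It is not clear from this thread whether the difference comes from OrthoFinder or from compare_genomes. One way to test whether the header descriptions alone explain it is to rerun with headers trimmed to the accession. A minimal, hypothetical Python sketch (trim_headers is not part of either tool):

```python
def trim_headers(fasta_text):
    """Keep only the first whitespace-delimited token of each FASTA header line."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            out.append(">" + (tokens[0] if tokens else ""))
        else:
            out.append(line)  # sequence lines pass through unchanged
    trailing = "\n" if fasta_text.endswith("\n") else ""
    return "\n".join(out) + trailing
```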
@jeffersonfparil
Hello! I tried to run compare_genomes on several species and got an error:
Error in file(file, "r") : cannot open the connection
Calls: read.nexus -> scan -> file
In addition: Warning message:
In file(file, "r") :
  cannot open file 'ORTHOGROUPS_SINGLE_GENE.NT.timetree.nex': No such file or directory
Execution halted
Also, most of the output files (e.g. expanded_orthogroup) are empty. Is there something wrong with my files?
P.S. PANTHER 17 is no longer available; would it be better to change from v17 to v18 everywhere?
Thank you.