liberjul / CONSTAXv2

MIT License

genus_wordConditionalProbList.txt not found #8

Closed noelzach closed 2 years ago

noelzach commented 2 years ago

Hey Julian,

I've been working with the Alabama Supercomputer Authority IT folks and keep running into an error while running CONSTAX on the Alabama HPC.

There are two things.

  1. First seems to be related to Python, but CONSTAX runs after this error and gets through training the databases.

Fatal Python error: Py_Initialize: can't initialize sys standard streams
Traceback (most recent call last):
  File "/opt/asn/apps/anaconda_3-2021.11/lib/python3.9/abc.py", line 85, in <module>
ModuleNotFoundError: No module named '_abc'

  2. Then there seems to be an error with RDP: it cannot find genus_wordConditionalProbList.txt and fails to classify anything.

Exception in thread "main" java.lang.RuntimeException: java.io.FileNotFoundException: /scratch/aubzxn001/Pecan/2021/Fungi/workflow/scripts/training_files/genus_wordConditionalProbList.txt (No such file or directory)
    at edu.msu.cme.rdp.multicompare.MultiClassifier.<init>(MultiClassifier.java:66)
    at edu.msu.cme.rdp.multicompare.MultiClassifier.<init>(MultiClassifier.java:75)
    at edu.msu.cme.rdp.multicompare.Main.main(Main.java:247)
    at edu.msu.cme.rdp.classifier.cli.ClassifierMain.main(ClassifierMain.java:67)
Caused by: java.io.FileNotFoundException: /scratch/aubzxn001/Pecan/2021/Fungi/workflow/scripts/training_files/genus_wordConditionalProbList.txt (No such file or directory)

Not sure if one error is related to the other.

I have attached the full log file and my script.

Hoping you can help resolve this so I can get this running for my students.

Thanks, hope all is well!

Zach

logTaxonomy.txt taxonomy_script.txt

liberjul commented 2 years ago

Hi Zach,

Thanks for reaching out and trying to get CONSTAX2 operational! I haven't seen this error before, so we'll have to test some things.

1) Which Python version is being executed? python -V should show it. I think this error may be caused by a version mismatch between the modules and the python command:

Fatal Python error: Py_Initialize: can't initialize sys standard streams
Traceback (most recent call last):
  File "/opt/asn/apps/anaconda_3-2021.11/lib/python3.9/abc.py", line 85, in <module>
ModuleNotFoundError: No module named '_abc'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/asn/apps/anaconda_3-2021.11/lib/python3.9/io.py", line 52, in <module>
  File "/opt/asn/apps/anaconda_3-2021.11/lib/python3.9/abc.py", line 89, in <module>
  File "/opt/asn/apps/anaconda_3-2021.11/lib/python3.9/_py_abc.py", line 35
    def __new__(mcls, name, bases, namespace, /, **kwargs):
                                              ^
SyntaxError: invalid syntax

2) Is there actually a file called /scratch/aubzxn001/Pecan/2021/Fungi/workflow/scripts/training_files/genus_wordConditionalProbList.txt? RDP had an initial training error, but then appeared to work after correcting for duplicate taxa. If that file does exist, I think the first error may be causing the second one.

3) There should also be a file called log_constax2_<timestamp>.txt, so please attach that when responding.
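One way to look for the mismatch suspected in point 1 is to compare the running interpreter's version against any pythonX.Y directories named in PYTHONPATH. The sketch below is only an illustration: the function name mismatched_path_entries is hypothetical, and it assumes the mismatch shows up as a pythonX.Y component in a path entry, as it does in the traceback above.

```python
import os
import re
import sys


def mismatched_path_entries(paths, running=None):
    """Return entries naming a pythonX.Y directory for a version other than `running`.

    `running` is a (major, minor) tuple; it defaults to the current interpreter.
    """
    running = tuple(running or sys.version_info[:2])
    pattern = re.compile(r"python(\d+)\.(\d+)")
    mismatched = []
    for entry in paths:
        match = pattern.search(entry)
        if match and (int(match.group(1)), int(match.group(2))) != running:
            mismatched.append(entry)
    return mismatched


if __name__ == "__main__":
    # Report the interpreter in use and any PYTHONPATH entries built for another version.
    print("interpreter:", sys.executable, sys.version.split()[0])
    entries = [p for p in os.environ.get("PYTHONPATH", "").split(os.pathsep) if p]
    for entry in mismatched_path_entries(entries):
        print("PYTHONPATH entry for another Python version:", entry)
```

Because the version is a parameter, the same function can be used to ask what an older interpreter would see if it inherited the current PYTHONPATH.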

Thanks,

Julian

noelzach commented 2 years ago

There actually is no file called genus_wordConditionalProbList.txt located in the training_files directory. Maybe it is being written to a different directory? Because it appears to have trained everything correctly...

I also cannot locate the log_constax2 file in any of my directories.

I am running python 3.9.7

liberjul commented 2 years ago

Hi Zach, One way to test this is to run RDP independently of the CONSTAX installation. If you installed rdptools on your PATH (as it appears you did, judging by your logfile), this command should generate the RDP training files.

classifier train -o /scratch/aubzxn001/Pecan/2021/Fungi/workflow/scripts/training_files/. \
-s /scratch/aubzxn001/Pecan/2021/Fungi/workflow/scripts/training_files/sh_general_release_dynamic_04.02.2020__RDP_trained.fasta \
-t /scratch/aubzxn001/Pecan/2021/Fungi/workflow/scripts/training_files/sh_general_release_dynamic_04.02.2020__RDP_taxonomy_trained.txt -Xmx32g

Let me know how that goes,

Julian

liberjul commented 2 years ago

Hi Zach, I believe that the RDP issue is being caused by the Python one. I was able to reproduce the error, and I fixed it by removing the line env["PYTHONPATH"] = ':'.join(sys.path[1:]) in constax_wrapper.py. I am currently pushing an update to fix this. If you want to use it before then, do the following:

1) Remove the env["PYTHONPATH"] = ':'.join(sys.path[1:]) line from constax_wrapper.py located at /opt/asn/apps/anaconda_3-2021.11/opt/constax-2.0.17-0/constax_wrapper.py

2) Point the constax executable at the edited wrapper:

ln -s -f /opt/asn/apps/anaconda_3-2021.11/opt/constax-2.0.17-0/constax_wrapper.py $(which constax)

3) Rerun!
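To illustrate why removing that line helps: when a parent script exports its own sys.path as PYTHONPATH, any child Python process inherits it, and a child running a different Python version may then try to load the parent's standard library. The sketch below shows the general idea of launching a child with PYTHONPATH removed. It is not the code in constax_wrapper.py; the helper name child_env is hypothetical.

```python
import os
import subprocess
import sys


def child_env(base=None):
    """Return a copy of an environment mapping without PYTHONPATH.

    Without PYTHONPATH, a child interpreter falls back to its own
    standard library instead of the parent's module directories.
    """
    env = dict(os.environ if base is None else base)
    env.pop("PYTHONPATH", None)
    return env


if __name__ == "__main__":
    # Hypothetical usage: start a child interpreter with the cleaned environment.
    result = subprocess.run(
        [sys.executable, "-c", "import abc; print('child started cleanly')"],
        env=child_env(),
        capture_output=True,
        text=True,
    )
    print(result.stdout or result.stderr)
```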

liberjul commented 2 years ago

Hi Zach,

I was able to get this fixed in v2.0.18, so an update to this version should fix the issue.

noelzach commented 5 months ago

Hi Julian,

It's been a while since I last tried running CONSTAX on our Alabama Supercomputer Authority HPC system. Long story short, it is working now.

First, I had to point to the CONSTAX directory within my script. I'm not sure why, but it probably has to do with the way it was installed. Not a big deal.

Second, the Python issue was solved by the update. Thank you.

Third, the problem of not finding genus_wordConditionalProbList.txt is also solved. The error appeared because Java ran out of memory (heap size) while training with some of the newer, larger UNITE databases. When training through CONSTAX, the error message was cryptic: it said it could not find the files, but the real issue was that the RDP tools training step was failing for lack of memory. When I trained independently, outside CONSTAX, using RDP tools with 150 GB of memory for the v9 UNITE Eukaryote database, training finished and generated the appropriate files. Then I could use CONSTAX without errors.

I know this is an old, closed issue, but I finally had time to figure it out and wanted to leave an update for anyone with similar problems.
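A failed training step that only surfaces later as a missing-file error can be caught earlier by checking the training directory before classification. The sketch below is a minimal illustration, not part of CONSTAX. The list of expected file names is an assumption based on what RDP Classifier training normally writes (only genus_wordConditionalProbList.txt is confirmed in this thread), and the function names are hypothetical.

```python
from pathlib import Path

# Assumed RDP training outputs; only genus_wordConditionalProbList.txt
# is confirmed by the error message in this thread.
EXPECTED_TRAINING_FILES = (
    "bergeyTrainingTree.xml",
    "genus_wordConditionalProbList.txt",
    "logWordPrior.txt",
    "wordConditionalProbIndexArr.txt",
)


def missing_training_files(training_dir, expected=EXPECTED_TRAINING_FILES):
    """Return the expected file names that are absent or empty in training_dir."""
    directory = Path(training_dir)
    missing = []
    for name in expected:
        path = directory / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


def check_training(training_dir):
    """Raise a descriptive error if training did not produce its output files."""
    missing = missing_training_files(training_dir)
    if missing:
        raise RuntimeError(
            "RDP training output incomplete in {}: missing {}. "
            "Training may have failed, for example from insufficient Java "
            "heap; consider retraining with a larger -Xmx value.".format(
                training_dir, ", ".join(missing)
            )
        )
```

Calling check_training on the training_files directory right after training would point at the training step itself rather than at the classifier that later fails to open the file.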

liberjul commented 5 months ago

Hi Zach,

Thank you for doing all the work to figure this out. I hope it continues to work well for you and I'm happy to troubleshoot if it doesn't!

Thanks,

Julian