ksahlin / NGSpeciesID

Reference-free clustering and consensus forming of long-read amplicon sequencing
GNU General Public License v3.0

Problem with medaka polishing #21

Open · jhgffhgjf opened this issue 2 years ago

jhgffhgjf commented 2 years ago

Every time I run NGSpeciesID, it never finishes the medaka polishing step:

2 consensus formed.
Saving spoa references to files: 945021|NC_016052.1/consensus_reference_X.fasta
running medaka on spoa reference 3 using 18 reads for polishing.
creating 945021|NC_016052.1/medaka_cl_id_3
Traceback (most recent call last):
  File "/home/marc/anaconda3/envs/ngspeciesid/bin/NGSpeciesID", line 294, in <module>
    main(args)
  File "/home/marc/anaconda3/envs/ngspeciesid/bin/NGSpeciesID", line 145, in main
    centers_polished = consensus.polish_sequences(centers_filtered, args)
  File "/home/marc/anaconda3/envs/ngspeciesid/lib/python3.6/site-packages/modules/consensus.py", line 224, in polish_sequences
    run_medaka(all_reads_file, spoa_center_file, polishing_outfolder, "1", args.medaka_model)
  File "/home/marc/anaconda3/envs/ngspeciesid/lib/python3.6/site-packages/modules/consensus.py", line 109, in run_medaka
    subprocess.check_call(['medaka_consensus', '-i', reads_to_center, "-d", center_file, "-o", outfolder, "-t", cores], stdout=output_file, stderr=medaka_stderr)
  File "/home/marc/anaconda3/envs/ngspeciesid/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['medaka_consensus', '-i', '945021|NC_016052.1/reads_to_consensus_3.fastq', '-d', '945021|NC_016052.1/consensus_reference_3.fasta', '-o', '945021|NC_016052.1/medaka_cl_id_3', '-t', '1']' returned non-zero exit status 1.

I noticed in your other thread about medaka problems that the last line differs. Mine reads:

raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['medaka_consensus', '-i', '945021|NC_016052.1/reads_to_consensus_3.fastq', '-d', '945021|NC_016052.1/consensus_reference_3.fasta', '-o', '945021|NC_016052.1/medaka_cl_id_3', '-t', '1']' returned non-zero exit status 1.

whilst in a past thread of yours it reads:

subprocess.CalledProcessError: Command '['medaka_consensus', '-i', './ngspouts_liverbiduck/reads_to_consensus_1.fasta', '-d', './ngspouts_liverbiduck/consensus_reference_1.fasta', '-o', './ngspouts_liverbiduck/medaka_cl_id_1', '-t', '1', '-m', 'r941_min_high_g330']' returned non-zero exit status 1.

The '-m', 'r941_min_high_g330' part is missing from mine, but I have no idea whether this is normal.

Thanks in advance! -Marc
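For illustration, a minimal sketch of how a wrapper could assemble the `medaka_consensus` argument list so that `-m MODEL` is appended only when a model name is supplied. The helper `build_medaka_cmd` is hypothetical, not NGSpeciesID's actual code; it only mirrors the arguments visible in the tracebacks above. With no model given, the list ends at `-t 1`, like the command in the first traceback.

```python
from typing import List, Optional


def build_medaka_cmd(reads: str, draft: str, outfolder: str,
                     cores: str, model: Optional[str] = None) -> List[str]:
    """Hypothetical helper: assemble a medaka_consensus argument list.

    The '-m MODEL' pair is appended only when a model name is given,
    so a command built without a model ends at '-t <cores>'.
    """
    cmd = ["medaka_consensus", "-i", reads, "-d", draft,
           "-o", outfolder, "-t", cores]
    if model:  # skips both None and the empty string
        cmd += ["-m", model]
    return cmd
```

For example, `build_medaka_cmd("reads.fastq", "ref.fasta", "out", "1", "r941_min_high_g330")` ends with `'-m', 'r941_min_high_g330'`, matching the command from the older thread.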

ksahlin commented 2 years ago

Hi @jhgffhgjf ,

Could you please run

medaka_consensus -i '945021|NC_016052.1/reads_to_consensus_3.fastq' -d '945021|NC_016052.1/consensus_reference_3.fasta' -o '945021|NC_016052.1/medaka_cl_id_3' -t 1

directly from the terminal, to find out whether the error comes from the installation or from the input?
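The directory name `945021|NC_016052.1` contains `|`, which the shell interprets as a pipe unless the argument is quoted. A minimal sketch using the standard-library `shlex.quote` to turn the argument list from the traceback into a command line that can be pasted safely (the helper name `to_shell_line` is only for illustration):

```python
import shlex
from typing import List


def to_shell_line(args: List[str]) -> str:
    """Quote each argument so characters such as '|' or spaces survive the shell."""
    return " ".join(shlex.quote(a) for a in args)


cmd = ["medaka_consensus",
       "-i", "945021|NC_016052.1/reads_to_consensus_3.fastq",
       "-d", "945021|NC_016052.1/consensus_reference_3.fasta",
       "-o", "945021|NC_016052.1/medaka_cl_id_3",
       "-t", "1"]
print(to_shell_line(cmd))
```

Each path containing `|` comes out wrapped in single quotes, while plain arguments such as `-t` and `1` are left unchanged. On Python 3.8 and newer, `shlex.join` does the same thing.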

As you may know, you can also specify --racon instead of --medaka to use a different polishing algorithm.
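A minimal sketch of switching the polisher, assuming NGSpeciesID is on `PATH`. It only swaps `--medaka` for `--racon` in the invocation used elsewhere in this thread; the helper `ngspeciesid_cmd` is hypothetical.

```python
from typing import List


def ngspeciesid_cmd(fastq: str, outfolder: str, polisher: str = "racon") -> List[str]:
    """Build an NGSpeciesID call; polisher must be 'medaka' or 'racon'."""
    if polisher not in ("medaka", "racon"):
        raise ValueError(f"unknown polisher: {polisher!r}")
    return ["NGSpeciesID", "--ont", "--fastq", fastq,
            "--outfolder", outfolder, "--consensus", f"--{polisher}"]
```

The resulting list can be passed to `subprocess.check_call`, which raises `CalledProcessError` on a non-zero exit status, as in the tracebacks in this thread.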

mtva0001 commented 2 years ago

Hi,

I have the same issue when specifying medaka (1.5.0):

NGSpeciesID --ont --fastq sample_h1.fastq --outfolder ./sample_h1 --consensus --medaka

Error:

Saving spoa references to files: ./sample_h1/consensus_reference_X.fasta
running medaka on spoa reference 17 using 256 reads for polishing.
creating ./sample_h1/medaka_cl_id_17
Traceback (most recent call last):
  File "/opt/anaconda3/envs/NGSpeciesID/bin/NGSpeciesID", line 291, in <module>
    main(args)
  File "/opt/anaconda3/envs/NGSpeciesID/bin/NGSpeciesID", line 142, in main
    centers_polished = consensus.polish_sequences(centers_filtered, args)
  File "/opt/anaconda3/envs/NGSpeciesID/lib/python3.6/site-packages/modules/consensus.py", line 224, in polish_sequences
    run_medaka(all_reads_file, spoa_center_file, polishing_outfolder, "1", args.medaka_model)
  File "/opt/anaconda3/envs/NGSpeciesID/lib/python3.6/site-packages/modules/consensus.py", line 109, in run_medaka
    subprocess.check_call(['medaka_consensus', '-i', reads_to_center, "-d", center_file, "-o", outfolder, "-t", cores], stdout=output_file, stderr=medaka_stderr)
  File "/opt/anaconda3/envs/NGSpeciesID/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['medaka_consensus', '-i', './sample_h1/reads_to_consensus_17.fastq', '-d', './sample_h1/consensus_reference_17.fasta', '-o', './sample_h1/medaka_cl_id_17', '-t', '1']' returned non-zero exit status 1.

I tried to run the above-mentioned command directly:

medaka_consensus -i ./sample_h1/reads_to_consensus_17.fastq -d ./sample_h1/consensus_reference_17.fasta -o ./sample_h1/medaka_cl_id_17 -t 1

and I got this:

readlink: illegal option -- f
usage: readlink [-n] [file ...]

I'd prefer to use medaka for my new experiment, but I don't know how to make it work. Do you have any other ideas? I have attached the list of installed packages.

Thanks for your response in advance! Best, Máté

condalist.txt
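The `readlink: illegal option -- f` message is what the BSD `readlink` shipped with macOS prints when it is given `-f` (the `/opt/anaconda3` paths suggest macOS, but that is an assumption). `readlink -f` resolves a path to its canonical absolute form with symlinks followed; in Python the closest standard-library equivalent is `os.path.realpath`. A small sketch, where the helper name `canonical_path` is only illustrative:

```python
import os
import tempfile


def canonical_path(path: str) -> str:
    """Resolve symlinks and relative parts, similar to GNU `readlink -f`."""
    return os.path.realpath(path)


# Demonstrate on a temporary symlink pointing at a file.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "reads.fastq")
    open(target, "w").close()
    link = os.path.join(tmp, "link.fastq")
    os.symlink(target, link)
    print(canonical_path(link) == os.path.realpath(target))
```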

ksahlin commented 2 years ago

Hi @mtva0001 ,

This error is discussed in this comment https://github.com/ksahlin/NGSpeciesID/issues/1#issuecomment-605871762 (with an older medaka version), and the solution is presented in the same post (or in the following post by another user).

Best, Kristoffer

mtva0001 commented 2 years ago

I re-installed NGSpeciesID following https://github.com/ksahlin/NGSpeciesID/issues/1#issuecomment-644698902, so the medaka version is now 1.0.3, and I got this error message:

Saving spoa references to files: ./sample_h1/consensus_reference_X.fasta
running medaka on spoa reference 17 using 256 reads for polishing.
creating ./sample_h1/medaka_cl_id_17
Traceback (most recent call last):
  File "/opt/anaconda3/envs/ngspeciesid/bin/NGSpeciesID", line 294, in <module>
    main(args)
  File "/opt/anaconda3/envs/ngspeciesid/bin/NGSpeciesID", line 145, in main
    centers_polished = consensus.polish_sequences(centers_filtered, args)
  File "/opt/anaconda3/envs/ngspeciesid/lib/python3.6/site-packages/modules/consensus.py", line 224, in polish_sequences
    run_medaka(all_reads_file, spoa_center_file, polishing_outfolder, "1", args.medaka_model)
  File "/opt/anaconda3/envs/ngspeciesid/lib/python3.6/site-packages/modules/consensus.py", line 109, in run_medaka
    subprocess.check_call(['medaka_consensus', '-i', reads_to_center, "-d", center_file, "-o", outfolder, "-t", cores], stdout=output_file, stderr=medaka_stderr)
  File "/opt/anaconda3/envs/ngspeciesid/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['medaka_consensus', '-i', './sample_h1/reads_to_consensus_17.fastq', '-d', './sample_h1/consensus_reference_17.fasta', '-o', './sample_h1/medaka_cl_id_17', '-t', '1']' returned non-zero exit status 2.

And after running medaka (medaka_consensus -i ./sample_h1/reads_to_consensus_17.fastq -d ./sample_h1/consensus_reference_17.fasta -o ./sample_h1/medaka_cl_id_17 -t 1):

/opt/anaconda3/envs/ngspeciesid/bin/medaka_consensus: line 16: 91633 Illegal instruction: 4 medaka tools list_models
Checking program versions
This is medaka 1.0.3
Program    Version   Required   Pass
bcftools   1.10.2    1.9        True
bgzip      1.10.2    1.9        True
minimap2   2.24      2.11       True
samtools   1.10      1.9        True
tabix      1.10.2    1.9        True
Warning: Output ./sample_h1/medaka_cl_id_17 already exists, may use old results.
usage: medaka tools is_rle_model [-h] [--model MODEL] [--disable_cudnn]
medaka tools is_rle_model: error: argument --model: expected 1 argument
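In the version table, `1.10.2` passes a `1.9` minimum, so the check has to compare version numbers numerically, component by component; a plain string comparison would put `"1.10.2"` before `"1.9"`. A minimal sketch of that kind of comparison (not medaka's own code; `version_ok` is a hypothetical name, and only purely numeric dotted versions are handled):

```python
from typing import Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    """Split a purely numeric dotted version like '1.10.2' into (1, 10, 2)."""
    return tuple(int(part) for part in v.split("."))


def version_ok(found: str, required: str) -> bool:
    """True when found >= required, comparing numerically, component by component."""
    return parse_version(found) >= parse_version(required)


# Rows from the table above: (program, found, required).
rows = [("bcftools", "1.10.2", "1.9"), ("minimap2", "2.24", "2.11"),
        ("samtools", "1.10", "1.9")]
for name, found, required in rows:
    print(name, version_ok(found, required))
```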

mtva0001 commented 2 years ago

Ah okay, I have now specified the model with -m r941_min_high_g330, and it seems to work. Could the model also be specified directly in the NGSpeciesID run?
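The earlier `is_rle_model: error: argument --model: expected 1 argument` is consistent with `--model` being passed without a value, plausibly because `medaka tools list_models` crashed before a default model name could be determined. This is an assumption; the `medaka_consensus` script itself is not shown in this thread. The same kind of failure can be reproduced with Python's `argparse`, using a stand-in parser rather than medaka's:

```python
import argparse
import shlex


def make_parser() -> argparse.ArgumentParser:
    # Stand-in parser (not medaka's) with a --model option that takes one value.
    parser = argparse.ArgumentParser(prog="is_rle_model")
    parser.add_argument("--model")
    return parser


def try_parse(model_name: str) -> bool:
    """Mimic an unquoted shell expansion: an empty model name simply vanishes."""
    argv = shlex.split(f"--model {model_name}")
    try:
        make_parser().parse_args(argv)
        return True
    except SystemExit:  # argparse prints the error and exits
        return False


print(try_parse(""))                    # '--model' is left without a value
print(try_parse("r941_min_high_g330"))  # parses normally
```

`argparse` exits with status 2 on such usage errors, which may relate to the `exit status 2` reported above, though that link is not confirmed here. Supplying `-m` explicitly avoids this particular error but not the `Illegal instruction` crash.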

ksahlin commented 2 years ago

Great, thanks for reporting back. Yes, that sounds reasonable; will do.

mtva0001 commented 2 years ago

Oh no, I was wrong; it didn't work:

/opt/anaconda3/envs/ngspeciesid/bin/medaka_consensus: line 16: 93427 Illegal instruction: 4 medaka tools list_models

There are some issues with the model, but I couldn't figure out how to get a list of models, because the command suggested on the medaka site (medaka tools list_models) does not work either. Also, medaka is now at 1.6.x, so it would be great to use the newest version in NGSpeciesID, if possible.
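`Illegal instruction: 4` means the process executed a CPU instruction the processor does not support and was killed by SIGILL (signal 4). With TensorFlow-based tools such as medaka, one commonly reported cause is a prebuilt binary that assumes AVX support on a CPU or emulation layer that lacks it. This is an assumption, not something confirmed in this thread. A minimal sketch that checks Linux `/proc/cpuinfo`-style text for the `avx` flag (macOS reports CPU features through `sysctl` instead, which this sketch does not cover):

```python
from typing import Iterable


def cpu_has_flag(cpuinfo_lines: Iterable[str], flag: str = "avx") -> bool:
    """Return True if any 'flags' line in /proc/cpuinfo-style text lists `flag`."""
    for line in cpuinfo_lines:
        key, _, value = line.partition(":")
        # Exact token match, so 'avx2' does not count as 'avx'.
        if key.strip() == "flags" and flag in value.split():
            return True
    return False


def check_this_machine(path: str = "/proc/cpuinfo") -> bool:
    """Read the local cpuinfo file (Linux only)."""
    with open(path) as fh:
        return cpu_has_flag(fh)
```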

jhgffhgjf commented 2 years ago

I resolved the issue by making sure the path to the working directory did not contain any spaces or symbols. Rookie Ubuntu user mistake ;)
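One way to catch this before a long run is to check the working directory for characters outside a conservative set. The whitelist below (letters, digits, `_`, `.`, `/`, `-`) is an assumption chosen for illustration, not a list published by NGSpeciesID or medaka. It flags both spaces and the `|` in directory names like `945021|NC_016052.1` from the original report.

```python
import os
import re

# Conservative whitelist (an assumption): letters, digits, underscore, dot, slash, hyphen.
_SAFE_PATH = re.compile(r"[A-Za-z0-9_./-]+")


def path_is_safe(path: str) -> bool:
    """True if every character of `path` is in the conservative whitelist."""
    return _SAFE_PATH.fullmatch(path) is not None


if not path_is_safe(os.getcwd()):
    print("Warning: working directory contains spaces or special characters:",
          os.getcwd())
```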

(Sent by email in reply to mtva0001's comment of Wed, Aug 10, 2022, 11:10 AM.)

ksahlin commented 2 years ago

I googled "line 16: 93427 Illegal instruction: 4 medaka tools list_models" and found these two issues: https://github.com/nanoporetech/medaka/issues/261 and https://github.com/nanoporetech/medaka/issues/119

The first one even involves medaka run through NGSpeciesID. Perhaps they are relevant?

mtva0001 commented 2 years ago

Thanks! Yes, I saw all these comments and tried installing medaka in many different ways, but none of them fixed the problem. I also tried running the test fastq file on our server (HPC2N), where NGSpeciesID (v.0.1.2.2) is installed, but medaka produces the same error there too. :(

mpnelsen commented 3 months ago

Hi All,

I'm having a similar problem running the sample dataset from the tutorial, and I was hoping you might be able to help (please!).

I installed following https://github.com/ksahlin/NGSpeciesID/issues/1#issuecomment-644698902:

conda create -n ngspeciesid -c conda-forge -c bioconda python=3.6 pip "parasail-python>=1.1.10" "edlib>=1.1.2" python-edlib "medaka>=1.0.2" spoa racon minimap2 mmseqs2
conda activate ngspeciesid
pip install NGSpeciesID

and then installed older versions of biopython and pysam to make them compatible:

pip install biopython==1.77
pip install pysam==0.16.0.1
pip install NGSpeciesID

I next followed the NGSpeciesID tutorial and downloaded the sample dataset:

mkdir test_ngspeciesID
cd test_ngspeciesID
curl -LO https://raw.githubusercontent.com/ksahlin/NGSpeciesID/master/test/sample_h1.fastq

and then ran the command:

NGSpeciesID --ont --fastq sample_h1.fastq --outfolder ./sample_h1 --consensus --medaka

. . .
Time elapesd joining clusters: 0.0004467964172363281
Nr clusters larger than 1: 5
Nr clusters (all): 23
Batches after pairwise consecutive merge: 1
Using total nucleotide batch sizes: [7730]
Using nr reads batch sizes: [23]

ITERATION 4
Using 1 batches.
Saved: 4 iterations.
Iteration NrClusters MinDbSize CurrReadId ClusterSizes
Total number of reads iterated through:23
Passed mapping criteria:1
Passed alignment criteria in this process:2
Total calls to alignment module in this process:2
Time elapesd clustering last iteration single core: 0.010623455047607422
Time elapsed clustering: 2.215029239654541
Nr clusters larger than 1: 2
Nr clusters (all): 20

STARTING TO CREATE CLUSTER CONSENSUS

Temporary workdirektory for consensus and polishing: /tmp/tmpztdu1rai
Forming draft consensus with abundance_cutoff >= 27 (10.0% of 274 reads)
creating center of 141 sequences.
creating center of 115 sequences.
18 singletons were discarded
0 clusters were discarded due to not passing the abundance_cutoff: a total of 0 reads were discarded. Highest abundance among them: 0 reads.
2 centers formed
Rec comp orientation identity %: 0.9544117647058824
Forward orientation identity %: 0.45539906103286387
Detected two consensus sequences with alignment identidy above threshold (from either reverse complement or split clusters). Keeping center with the most read support and merging reads.
has already been merged, skipping
1 consensus formed.
Saving spoa references to files: ./sample_h1/consensus_reference_X.fasta
running medaka on spoa reference 17 using 256 reads for polishing.
creating ./sample_h1/medaka_cl_id_17
Traceback (most recent call last):
  File "/home/xxxxxx/mambaforge/envs/NGSpeciesID/bin/NGSpeciesID", line 294, in <module>
    main(args)
  File "/home/xxxxxx/mambaforge/envs/NGSpeciesID/bin/NGSpeciesID", line 145, in main
    centers_polished = consensus.polish_sequences(centers_filtered, args)
  File "/home/xxxxxx/mambaforge/envs/NGSpeciesID/lib/python3.6/site-packages/modules/consensus.py", line 234, in polish_sequences
    run_medaka(all_reads_file, spoa_center_file, polishing_outfolder, "1", args.medaka_model)
  File "/home/xxxxxx/mambaforge/envs/NGSpeciesID/lib/python3.6/site-packages/modules/consensus.py", line 109, in run_medaka
    subprocess.check_call(['medaka_consensus', '-i', reads_to_center, "-d", center_file, "-o", outfolder, "-t", cores], stdout=output_file, stderr=medaka_stderr)
  File "/home/xxxxxx/mambaforge/envs/NGSpeciesID/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['medaka_consensus', '-i', './sample_h1/reads_to_consensus_17.fastq', '-d', './sample_h1/consensus_reference_17.fasta', '-o', './sample_h1/medaka_cl_id_17', '-t', '1']' returned non-zero exit status 2.
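The read counts in this log are internally consistent, which helps when reading it: 141 + 115 = 256 reads in the two clusters that were merged, matching "using 256 reads for polishing", and 256 + 18 singletons = 274 reads in total. Likewise, 2 clusters larger than 1 plus 18 singletons gives the 20 clusters reported after the last iteration. The cutoff of 27 equals 10% of 274 rounded down; that NGSpeciesID rounds down is an assumption here, inferred only from these numbers. A small check:

```python
import math

total_reads = 274
abundance_ratio = 0.10          # "10.0% of 274 reads" in the log
cluster_sizes = [141, 115]      # "creating center of ... sequences"
singletons = 18                 # "18 singletons were discarded"

# Assumption: the cutoff is the percentage rounded down to a whole read count.
cutoff = math.floor(abundance_ratio * total_reads)
merged_reads = sum(cluster_sizes)

print(cutoff)                                      # 27
print(merged_reads)                                # 256, reads used for polishing
print(merged_reads + singletons == total_reads)    # True
print(len(cluster_sizes) + singletons)             # 20, "Nr clusters (all)"
```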

I next checked medaka, following the advice earlier in this thread:

(NGSpeciesID) mnelsen@phoebe:~/test_ngspeciesID$ medaka_consensus -i sample_h1/reads_to_consensus_17.fastq -d sample_h1/consensus_reference_17.fasta -o sample_h1/medaka_cl_id_17 -t 1
Traceback (most recent call last):
  File "/home/xxxxxx/mambaforge/envs/NGSpeciesID/bin/medaka", line 11, in <module>
    sys.exit(main())
  File "/home/xxxxxx/mambaforge/envs/NGSpeciesID/lib/python3.6/site-packages/medaka/medaka.py", line 670, in main
    import tensorflow as tf
  File "/home/xxxxxx/mambaforge/envs/NGSpeciesID/lib/python3.6/site-packages/tensorflow/__init__.py", line 41, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "/home/xxxxxx/mambaforge/envs/NGSpeciesID/lib/python3.6/site-packages/tensorflow/python/__init__.py", line 64, in <module>
    from tensorflow.python.framework.framework_lib import *  # pylint: disable=redefined-builtin
  File "/home/xxxxxx/mambaforge/envs/NGSpeciesID/lib/python3.6/site-packages/tensorflow/python/framework/framework_lib.py", line 25, in <module>
    from tensorflow.python.framework.ops import Graph
  File "/home/xxxxxx/mambaforge/envs/NGSpeciesID/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 47, in <module>
    from tensorflow.python.eager import context
  File "/home/xxxxxx/mambaforge/envs/NGSpeciesID/lib/python3.6/site-packages/tensorflow/python/eager/context.py", line 28, in <module>
    from absl import logging
  File "/home/xxxxxx/mambaforge/envs/NGSpeciesID/lib/python3.6/site-packages/absl/logging/__init__.py", line 97, in <module>
    from absl import flags
  File "/home/xxxxxx/mambaforge/envs/NGSpeciesID/lib/python3.6/site-packages/absl/flags/__init__.py", line 35, in <module>
    from absl.flags import _argument_parser
  File "/home/xxxxxx/mambaforge/envs/NGSpeciesID/lib/python3.6/site-packages/absl/flags/_argument_parser.py", line 82, in <module>
    class ArgumentParser(Generic[_T], metaclass=_ArgumentParserCache):
TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases
Checking program versions
This is medaka 1.2.3
Program    Version   Required   Pass
bcftools   1.17      1.9        True
bgzip      1.17      1.9        True
minimap2   2.28      2.11       True
samtools   1.18      1.9        True
tabix      1.17      1.9        True
Warning: Output sample_h1/medaka_cl_id_17 already exists, may use old results.
usage: medaka tools is_rle_model [-h] [--model MODEL]
medaka tools is_rle_model: error: argument --model: expected 1 argument

I'm wondering if you could please help. Is something wrong with TensorFlow? Thanks from a beginner!!

ksahlin commented 3 months ago

Hi @mpnelsen ,

Googling the error line TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases I found the error and possible solutions here: https://github.com/tensorflow/tensorflow/issues/487 . Could you try them?

Thanks!
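The `TypeError` itself is a generic Python error: a class cannot inherit from bases whose metaclasses are unrelated. In the traceback, absl's `ArgumentParser` combines `Generic[_T]` with the metaclass `_ArgumentParserCache`. On Python 3.6, `typing.Generic` has its own metaclass, so the two plausibly collide; this reading of the traceback is an interpretation, not something confirmed in the linked issue. The sketch below reproduces the same message with two hypothetical metaclasses, and shows that a metaclass deriving from both resolves it. In practice, the fix in this thread was pinning absl-py rather than patching code.

```python
class MetaA(type):
    """First hypothetical metaclass."""


class MetaB(type):
    """Second, unrelated hypothetical metaclass."""


class BaseA(metaclass=MetaA):
    pass


class BaseB(metaclass=MetaB):
    pass


def metaclass_conflict_message() -> str:
    """Try to inherit from both bases; Python cannot pick a single metaclass."""
    try:
        class Derived(BaseA, BaseB):
            pass
    except TypeError as err:
        return str(err)
    return "no conflict"


class MetaAB(MetaA, MetaB):
    """A metaclass deriving from both resolves the conflict."""


class Fixed(BaseA, BaseB, metaclass=MetaAB):
    pass


print(metaclass_conflict_message())
```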

mpnelsen commented 3 months ago

Thanks! And sorry, I should have seen that. OK, I followed that, and then also ran the following (from https://github.com/tensorflow/tensorflow/issues/64926#issuecomment-2159831886):

pip install absl-py==1.1.0

and it works! Thanks so much!!
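The version pins mentioned in this thread (biopython 1.77, pysam 0.16.0.1, absl-py 1.1.0) can be checked from inside the environment. Below is a minimal sketch using `importlib.metadata`, which needs Python 3.8 or newer. In the Python 3.6 environment used here, `pip show <package>` reports the same information. The function name `check_pins` is hypothetical.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple

# Pins taken from this thread.
PINS = {"biopython": "1.77", "pysam": "0.16.0.1", "absl-py": "1.1.0"}


def check_pins(pins: Dict[str, str]) -> Dict[str, Tuple[Optional[str], str, bool]]:
    """Map each package to (installed version or None, pinned version, matches)."""
    report = {}
    for package, wanted in pins.items():
        try:
            installed: Optional[str] = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None
        report[package] = (installed, wanted, installed == wanted)
    return report


for pkg, (installed, wanted, ok) in check_pins(PINS).items():
    print(f"{pkg}: installed={installed} pinned={wanted} {'OK' if ok else 'MISMATCH'}")
```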