nanoporetech / medaka

Sequence correction provided by ONT Research
https://nanoporetech.com

medaka 1.11.3 quits early #482

Closed: erinyoung closed this issue 3 months ago

erinyoung commented 6 months ago

Describe the bug

I installed medaka 1.11.3 with pip into a new Gitpod environment, along with samtools and htslib. When I ran medaka_consensus, it stopped before finishing. The lines printed to the screen suggested that something was wrong with medaka_version_report, and running medaka_version_report on its own produced blank output.

Logging

This is what I ran first:

# running medaka consensus
medaka_consensus -i SRR23473168_1.fastq.gz -d GCA_021601745.3_ASM2160174v3_genomic.fna -o testing

This is the output from medaka_consensus

medaka_consensus -i SRR23473168_1.fastq.gz -d GCA_021601745.3_ASM2160174v3_genomic.fna -o testing:
0.769 TF_CPP_MIN_LOG_LEVEL is set to '3'
1.736 Attempting to automatically select model version.
2.222 WARNING: Failed to detect a model version, will use default: 'r1041_e82_400bps_sup_v4.3.0'
2.222 Checking program versions
2.222 This is medaka 1.11.3

It seemed strange that no error was reported, but line https://github.com/nanoporetech/medaka/blob/9dffdee0de4c76a9bf382b601261db43bd030c26/scripts/medaka_consensus#L115 appears to call medaka_version_report and exit if it fails, so I suspected that was where the issue is.
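
The "run a check, otherwise exit" behaviour described above is a common shell idiom. The sketch below is a hypothetical illustration of that idiom, not the actual medaka_consensus source (I have not reproduced the linked line here); `run_version_check` is a made-up stand-in for medaka_version_report. It shows why a script can stop without any error message: if the check fails without writing to stderr, the only visible symptom is the early exit.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a version check that fails silently:
# it prints nothing and returns a non-zero exit status.
run_version_check() {
    return 1
}

# Sketch of the "check || exit" idiom. It runs in a subshell so that
# the early exit does not terminate the calling shell.
pipeline() {
    (
        echo "Checking program versions"
        run_version_check || exit 1
        echo "Running consensus"   # never reached when the check fails
    )
}

# Capture output and status without tripping `set -e` in the caller.
status=0
output=$(pipeline) || status=$?
printf '%s\n' "$output"
echo "exit status: $status"
```

The last line printed by the pipeline is the one before the failed check, which matches the log above stopping right after "This is medaka 1.11.3". A variant such as `cmd || { echo "version check failed" >&2; exit 1; }` would at least say why the script stopped.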

I then ran medaka_version_report directly, and nothing was printed to the screen.
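
When a command prints nothing, its exit status and whether it wrote anything to stderr are the next things worth checking. A minimal bash sketch; `diagnose` is a hypothetical helper, and the example runs it on `false` rather than on medaka_version_report so that it works without medaka installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, then report its exit status and how
# many bytes it wrote to stdout and stderr, so that "printed nothing" can
# be told apart from "failed without a message".
diagnose() {
    local out err status=0
    out=$(mktemp)
    err=$(mktemp)
    "$@" >"$out" 2>"$err" || status=$?
    echo "command: $*"
    echo "exit status: $status"
    echo "stdout bytes: $(wc -c <"$out" | tr -d ' ')"
    echo "stderr bytes: $(wc -c <"$err" | tr -d ' ')"
    # Show any stderr text that was captured.
    if [ -s "$err" ]; then
        echo "stderr:"
        cat "$err"
    fi
    rm -f "$out" "$err"
}

# Example on a command that fails without printing anything; substitute
# medaka_version_report when reproducing the issue.
diagnose false
```

A non-zero exit status with zero bytes on both streams would confirm a silent failure, as opposed to a command that succeeds but has nothing to report.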

Everything else appears to be installed correctly. Here are the help outputs of the tools I expected to find, with medaka_version_report last, since it fails without printing anything to the screen.

$ medaka -h && medaka_consensus -h && medaka_counts -h && medaka_data_path -h && medaka_haploid_variant -h && medaka_version_report -h:
1.172 usage: medaka [-h] [--version]                                                                                                            
1.172               {compress_bam,features,train,consensus,smolecule,tandem,consensus_from_features,fastrle,stitch,variant,snp,tools}           
1.172               ...
1.172 
1.172 options:
1.172   -h, --help            show this help message and exit
1.172   --version             show program's version number and exit
1.172 
1.172 subcommands:
1.172   valid commands
1.172 
1.172   {compress_bam,features,train,consensus,smolecule,tandem,consensus_from_features,fastrle,stitch,variant,snp,tools}
1.172                         additional help
1.172     compress_bam        Compress an alignment into RLE form.
1.172     features            Create features for inference.
1.172     train               Train a model from features.
1.172     consensus           Run inference from a trained model and alignments.
1.172     smolecule           Create consensus sequences from single-molecule reads.
1.172     tandem              Targeted tandem repeat variant calling.
1.172     consensus_from_features
1.172                         Run inference from a trained model on existing
1.172                         features.
1.172     fastrle             Create run-length encoded fastq (lengths in quality
1.172                         track).
1.172     stitch              Stitch together output from medaka consensus into
1.172                         final output.
1.172     variant             Decode probabilities to VCF.
1.172     snp                 Decode probabilities to SNPs.
1.172     tools               tools subcommand.
1.215 TF_CPP_MIN_LOG_LEVEL is set to '3'
2.181 
2.181 medaka 1.11.3
2.181 ------------
2.181 
2.181 Assembly polishing via neural networks. Medaka is optimized
2.181 to work with the Flye assembler.
2.181 
2.181 medaka_consensus [-h] -i <fastx> -d <fasta>
2.181 
2.181     -h  show this help text.
2.181     -i  fastx input basecalls (required).
2.181     -d  fasta input assembly (required).
2.181     -o  output folder (default: medaka).
2.181     -g  don't fill gaps in consensus with draft sequence.
2.181     -r  use gap-filling character instead of draft sequence (default: None)
2.181     -m  medaka model, (default: r1041_e82_400bps_sup_v4.3.0).
2.181         Choices: r103_fast_g507 r103_hac_g507 r103_min_high_g345 r103_min_high_g360 r103_prom_high_g360 r103_sup_g507 r1041_e82_260bps_fast_g632 r1041_e82_260bps_hac_g632 r1041_e82_260bps_hac_v4.0.0 r1041_e82_260bps_hac_v4.1.0 r1041_e82_260bps_sup_g632 r1041_e82_260bps_sup_v4.0.0 r1041_e82_260bps_sup_v4.1.0 r1041_e82_400bps_fast_g615 r1041_e82_400bps_fast_g632 r1041_e82_400bps_hac_g615 r1041_e82_400bps_hac_g632 r1041_e82_400bps_hac_v4.0.0 r1041_e82_400bps_hac_v4.1.0 r1041_e82_400bps_hac_v4.2.0 r1041_e82_400bps_hac_v4.3.0 r1041_e82_400bps_sup_g615 r1041_e82_400bps_sup_v4.0.0 r1041_e82_400bps_sup_v4.1.0 r1041_e82_400bps_sup_v4.2.0 r1041_e82_400bps_sup_v4.3.0 r104_e81_fast_g5015 r104_e81_hac_g5015 r104_e81_sup_g5015 r104_e81_sup_g610 r10_min_high_g303 r10_min_high_g340 r941_e81_fast_g514 r941_e81_hac_g514 r941_e81_sup_g514 r941_min_fast_g303 r941_min_fast_g507 r941_min_hac_g507 r941_min_high_g303 r941_min_high_g330 r941_min_high_g340_rle r941_min_high_g344 r941_min_high_g351 r941_min_high_g360 r941_min_sup_g507 r941_prom_fast_g303 r941_prom_fast_g507 r941_prom_hac_g507 r941_prom_high_g303 r941_prom_high_g330 r941_prom_high_g344 r941_prom_high_g360 r941_prom_high_g4011 r941_prom_sup_g507 r941_sup_plant_g610
2.181         Alternatively a .tar.gz/.hdf file from 'medaka train'.
2.181         If not provided, and automatic choice will be attempted based on
2.181         the contents of the input file.
2.181     -f  Force overwrite of outputs (default will reuse existing outputs).
2.181     -x  Force recreation of alignment index.
2.181     -t  number of threads with which to create features (default: 1).
2.181     -b  batchsize, controls memory use (default: 100).
2.181     -q  Output consensus with per-base quality scores (fastq).
2.531 usage: medaka [-h] [--print] [--dtypes DTYPES [DTYPES ...]]
2.531               [--norm NORM [NORM ...]]
2.531               bam region
2.531 
2.531 positional arguments:
2.531   bam                   alignment file.
2.531   region                alignment region to sample.
2.531 
2.531 options:
2.531   -h, --help            show this help message and exit
2.531   --print               print counts. (default: False)
2.531   --dtypes DTYPES [DTYPES ...]
2.531                         perform a multi-datatype tests. (default: None)
2.531   --norm NORM [NORM ...]
2.531                         additional normalisation tests. (total, fwd_rev)
2.531                         (default: None)
2.871 /usr/local/lib/python3.10/dist-packages/medaka/data
2.899 TF_CPP_MIN_LOG_LEVEL is set to '3'
3.842 
3.842 medaka 1.11.3
3.842 ------------
3.842 
3.842 Haploid variant calling via neural networks.
3.842 
3.842 medaka_haploid_variant [-h] -i <fastx> -r <fasta>
3.842 
3.842     -h  show this help text.
3.842     -i  fastx input basecalls (required).
3.842     -r  fasta reference sequence (required).
3.842     -o  output folder (default: medaka).
3.842     -m  medaka model, (default: r1041_e82_400bps_sup_variant_v4.3.0).
3.842         Choices: r103_fast_variant_g507 r103_hac_variant_g507 r103_prom_variant_g3210 r103_sup_variant_g507 r1041_e82_260bps_fast_variant_g632 r1041_e82_260bps_hac_variant_g632 r1041_e82_260bps_hac_variant_v4.1.0 r1041_e82_260bps_sup_variant_g632 r1041_e82_260bps_sup_variant_v4.1.0 r1041_e82_400bps_fast_variant_g615 r1041_e82_400bps_fast_variant_g632 r1041_e82_400bps_hac_variant_g615 r1041_e82_400bps_hac_variant_g632 r1041_e82_400bps_hac_variant_v4.1.0 r1041_e82_400bps_hac_variant_v4.2.0 r1041_e82_400bps_hac_variant_v4.3.0 r1041_e82_400bps_sup_variant_g615 r1041_e82_400bps_sup_variant_v4.1.0 r1041_e82_400bps_sup_variant_v4.2.0 r1041_e82_400bps_sup_variant_v4.3.0 r104_e81_fast_variant_g5015 r104_e81_hac_variant_g5015 r104_e81_sup_variant_g610 r941_e81_fast_variant_g514 r941_e81_hac_variant_g514 r941_e81_sup_variant_g514 r941_min_fast_variant_g507 r941_min_hac_variant_g507 r941_min_sup_variant_g507 r941_prom_fast_variant_g507 r941_prom_hac_variant_g507 r941_prom_sup_variant_g507 r941_prom_variant_g303 r941_prom_variant_g322 r941_prom_variant_g360 r941_sup_plant_variant_g61.
3.842         If not provided, and automatic choice will be attempted based on
3.842         the contents of the input file.
3.842     -s  Perform read realignment when annotating variants.
3.842     -f  Force overwrite of outputs (default will reuse existing outputs).
3.842     -x  Force recreation of alignment index.
3.842     -t  number of threads with which to create features (default: 1).
3.842     -b  batchsize, controls memory use (default: 100).


cjw85 commented 6 months ago

@erinyoung did you manage to get medaka working? I'm afraid I have not seen an error like this before and don't have any suggestions as to what the problem might be. You shouldn't get blank output from medaka_version_report; at a minimum you should see something like:

$ medaka_version_report
Cannot import pyabpoa, some features may not be available.
Program    Version    Required   Pass
bcftools   Not found  1.11       False
bgzip      Not found  1.11       False
minimap2   Not found  2.11       False
samtools   Not found  1.11       False
tabix      Not found  1.11       False
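
The report above compares each tool's installed version against a minimum required version and prints a Pass column. The following is a rough, independent bash sketch of that comparison, not medaka's implementation (which I have not checked); `version_ge` and `report_tool` are hypothetical helpers, and the version ordering relies on GNU `sort -V`.

```shell
#!/usr/bin/env bash
# Return 0 if version $1 >= version $2, using GNU sort's version
# ordering (so 1.9 sorts before 1.11).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print one row in the style of the table above: program, version found,
# required version, pass. A missing program reports "Not found".
report_tool() {
    local prog=$1 required=$2 found="Not found" pass=False
    if command -v "$prog" >/dev/null 2>&1; then
        # Assumption: the version is the last field of the first line of
        # `--version` output; this does not hold for every tool.
        found=$("$prog" --version 2>&1 | head -n 1 | awk '{print $NF}')
        version_ge "$found" "$required" && pass=True
    fi
    printf '%-10s %-10s %-10s %s\n' "$prog" "$found" "$required" "$pass"
}

printf '%-10s %-10s %-10s %s\n' Program Version Required Pass
# Required versions taken from the table above.
while read -r tool required; do
    report_tool "$tool" "$required"
done <<'EOF'
bcftools 1.11
bgzip 1.11
minimap2 2.11
samtools 1.11
tabix 1.11
EOF
```

Even with every tool missing, this style of report still prints a header and one row per program, which is why completely blank output points to the command failing before it reaches the report.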

erinyoung commented 6 months ago

@cjw85, I did get medaka working by installing 1.11.2. I would appreciate any pointers on how to get 1.11.3 working.

cjw85 commented 6 months ago

I can't see any differences between 1.11.2 and 1.11.3 that would cause the above behaviour.

cjw85 commented 3 months ago

I will close this issue; as I said, there are no functional differences between 1.11.2 and 1.11.3 that would cause the change in behavior.