nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

v0.5.1 downloads 400bps model for kit 14 260 bps run #554

Closed olawa closed 5 months ago

olawa commented 8 months ago

Hi @tijyojwad,

here is the problem I had with auto-selection of models for 260bps.

Running dorado duplex sup for an LSK114 260bps run gives this:

[2024-01-01 21:55:55.696] [info] - downloading dna_r10.4.1_e8.2_400bps_sup@v4.1.0 with httplib

Same with dorado basecaller sup

Running duplex with the correct model tells me there is no duplex stereo model for 4kHz 260 bps. Is this correct?
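One way to catch this kind of mismatch early is to compare the speed embedded in the model name that dorado logs with the speed the run was configured for. The following is a minimal Python sketch and not part of dorado. The helper names `speed_from_model_name` and `model_matches_run` are hypothetical, and the sketch assumes model names follow the `_<speed>bps_` pattern seen in the log line above.

```python
import re
from typing import Optional

# Matches the translocation speed embedded in a model name,
# e.g. "dna_r10.4.1_e8.2_400bps_sup@v4.1.0" -> "400".
_SPEED_RE = re.compile(r"_(\d+)bps_")


def speed_from_model_name(text: str) -> Optional[int]:
    """Return the speed (bases per second) found in a model name or log line."""
    match = _SPEED_RE.search(text)
    return int(match.group(1)) if match else None


def model_matches_run(log_line: str, run_speed_bps: int) -> bool:
    """True if the model named in the log line matches the run's speed."""
    return speed_from_model_name(log_line) == run_speed_bps


# Log line copied from the report above.
log_line = (
    "[2024-01-01 21:55:55.696] [info] - downloading "
    "dna_r10.4.1_e8.2_400bps_sup@v4.1.0 with httplib"
)
print(model_matches_run(log_line, 260))  # False: a 400bps model for a 260bps run
```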

HalfPhoton commented 8 months ago

Hi @olawa,

Could you please run the following pod5 command on one of your .pod5 files and report what you see?

pip install pod5
pod5 inspect debug <data.pod5> | grep -E "flow_cell_product_code|sequencing_kit"

# Should report the following:
  flow_cell_product_code: <flowcell_product_code>
  sequencing_kit: <sequencing_kit>

Kind regards, Rich
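Note that the grep pattern above also matches any longer line that merely contains those key names (for example a context_tags dictionary). As a hedged alternative, the sketch below keeps only the two top-level fields from the text printed by pod5 inspect debug. It is plain Python and does not call the pod5 library. The function name `extract_fields` is hypothetical, and the sample text is abbreviated from output reported in this thread.

```python
# Top-level fields of interest in the text printed by `pod5 inspect debug`.
FIELDS = ("flow_cell_product_code", "sequencing_kit")


def extract_fields(debug_output: str) -> dict:
    """Return the first value seen for each field in FIELDS."""
    found = {}
    for line in debug_output.splitlines():
        # Split at the first colon only, so dictionary-valued lines such as
        # "context_tags: {...}" are identified by their leading key.
        key, sep, value = line.strip().partition(":")
        if sep and key in FIELDS:
            found.setdefault(key, value.strip())
    return found


# Abbreviated sample of the output format (assumption: the layout is stable).
sample = """\
        context_tags: {'sample_frequency': '4000', 'sequencing_kit': 'sqk-lsk114'}
        flow_cell_product_code: FLO-PRO114M
        sequencing_kit: sqk-lsk114
"""
print(extract_fields(sample))
```

In practice the input text could be captured by redirecting the command's output to a file and reading that file.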

HalfPhoton commented 8 months ago

We expect to see SQK_LSK114_260 for the sequencing_kit for your 260bps condition.

I suspect that your data will show SQK_LSK114 which is what is causing the incorrect model to be selected.

Unfortunately, in this case automatic model selection is not supported for your data.

Please download the correct model with dorado download --model dna_r10.4.1_e8.2_260bps_sup@v4.1.0 and basecall using this model path.

Kind regards, Rich

Edit: Are you sure that your data is 260bps?

ritatam commented 8 months ago

Hello @HalfPhoton,

I have the same problem basecalling my 260bps data with v0.5.1. pod5 inspect debug confirms it's 260bps. Like you said, the naming doesn't match dorado's requirement so I have to resort to using the model path.

context_tags: {..... 'sample_frequency': '4000', 'selected_speed_bases_per_second': '260', 'sequencing_kit': 'sqk-lsk114'}
        flow_cell_product_code: FLO-PRO114M
        sequencing_kit: sqk-lsk114

I want to run dorado duplex in methylation detection mode: dorado duplex dna_r10.4.1_e8.2_260bps_sup@v4.1.0_5mCG_5hmCG@v2/ pod5/ > duplex.bam

which throws an error: [2024-01-17 11:04:23.766] [error] Could not find information on simplex model: dna_r10.4.1_e8.2_260bps_sup@v4.1.0_5mCG_5hmCG@v2

Any suggestion would be much appreciated! :)

mp15 commented 7 months ago

Hi @HalfPhoton

This problem seems to be a bit more widespread. I just freshly converted a kit 14 260 bps run from its original FAST5 to POD5 with pod5 0.3.6, and the results are below. flow_cell_product_code is FLO-PRO114M. It also looks like there is a selected_speed_bases_per_second field in context_tags, which might be a more reliable way of getting the speed?

$ pod5 inspect debug PAO27011_pass_7b4991d0_ec3250cb.pod5 | grep -E "flow_cell_product_code|sequencing_kit"

        context_tags: {'barcoding_enabled': '0', 'basecall_config_filename': 'dna_r10.4.1_e8.2_260bps_hac_prom.cfg', 'experiment_duration_set': '4320', 'experiment_type': 'genomic_dna', 'local_basecalling': '1', 'package': 'bream4', 'package_version': '7.4.8', 'sample_frequency': '4000', 'selected_speed_bases_per_second': '260', 'sequencing_kit': 'sqk-lsk114'}
        flow_cell_product_code: FLO-PRO114M
        sequencing_kit: sqk-lsk114
        tracking_id: {'asic_id': 'FFFF80342813138F', 'asic_id_eeprom': 'FFFF80342813138F', 'asic_temp': '47.192307', 'asic_version': 'Unknown', 'auto_update': '0', 'auto_update_source': 'https://cdn.oxfordnanoportal.com/software/MinKNOW/', 'bream_is_standard': '0', 'configuration_version': '5.4.7', 'device_id': '3G', 'device_type': 'promethion', 'distribution_status': 'stable', 'distribution_version': '22.12.5', 'exp_script_name': 'sequencing/sequencing_PRO114_DNA_e8_2_260T:FLO-PRO114M:SQK-LSK114:260', 'exp_script_purpose': 'sequencing_run', 'exp_start_time': '2023-03-22T16:12:16.106324+00:00', 'flow_cell_id': 'PAO27011', 'flow_cell_product_code': 'FLO-PRO114M', 'guppy_version': '6.4.6+ae70e8f', 'heatsink_temp': '28.164078', 'host_product_code': 'PRO-PRC024', 'host_product_serial_number': 'PC24B148', 'hostname': 'PC24B148', 'hublett_board_id': '0001588c7bce0911', 'hublett_firmware_version': '2.1.10', 'installation_type': 'nc', 'ip_address': '', 'local_firmware_file': '1', 'mac_address': '', 'operating_system': 'ubuntu 20.04', 'protocol_group_id': 'ToR_134', 'protocol_run_id': '7b4991d0-894d-45e0-a90b-b56662829308', 'protocol_start_time': '2023-03-22T16:06:32.498753+00:00', 'protocols_version': '7.4.8', 'run_id': 'ec3250cbda36eeb10a75679ec5932365a40302c0', 'sample_id': 'OESO_152_LSK114', 'satellite_board_id': '01377075e7de3a75', 'satellite_firmware_version': '2.1.9', 'sequencer_hardware_revision': '', 'sequencer_product_code': 'PRO-SEQ024', 'sequencer_serial_number': '', 'usb_config': 'fx3_0.0.0#fpga_0.0.0#unknown#unknown', 'version': '5.4.3'}
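To illustrate the suggestion of reading the speed from context_tags, here is a minimal Python sketch. It is not how dorado selects models. The function names `speed_from_context_tags` and `suggest_model` are hypothetical, the name template is simply copied from the R10.4.1 E8.2 model names mentioned in this thread, and a model is not guaranteed to exist for every speed, variant, and version combination.

```python
import ast
from typing import Optional


def speed_from_context_tags(debug_output: str) -> Optional[int]:
    """Read selected_speed_bases_per_second from a context_tags line."""
    for line in debug_output.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key == "context_tags":
            # The value is printed as a Python dict literal of strings.
            tags = ast.literal_eval(value.strip())
            speed = tags.get("selected_speed_bases_per_second")
            return int(speed) if speed is not None else None
    return None


def suggest_model(speed_bps: int, variant: str = "sup", version: str = "v4.1.0") -> str:
    """Build a model name from the template seen in this thread (assumption)."""
    return f"dna_r10.4.1_e8.2_{speed_bps}bps_{variant}@{version}"


# Abbreviated from the context_tags line above.
sample = (
    "        context_tags: {'sample_frequency': '4000', "
    "'selected_speed_bases_per_second': '260', 'sequencing_kit': 'sqk-lsk114'}"
)
speed = speed_from_context_tags(sample)
print(speed, suggest_model(speed))
```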

HalfPhoton commented 7 months ago

Hi @mp15 and @ritatam,

Apologies for the delay in my reply.

I've raised this internally and we will look into fixing this issue in an upcoming release.

Kind regards, Rich

HalfPhoton commented 7 months ago

@ritatam,

Your issue with dorado duplex dna_r10.4.1_e8.2_260bps_sup@v4.1.0_5mCG_5hmCG@v2/ pod5/ > duplex.bam is due to incorrect syntax: this command searches for a simplex model with this name, which doesn't exist.

I think you want:

# sup@v4.1.0 - simplex model selection
# 5mCG_5hmCG@v2 - modification model selection (separated by comma)
dorado duplex sup@v4.1.0,5mCG_5hmCG@v2 pod5/ > duplex.bam

More info about the syntax can be found in the readme.md#automatic-model-selection-complex
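To make the comma-separated form concrete, the sketch below splits a selection string into its simplex part and its modification parts. It is an illustration only and not dorado's own parser. The function name `parse_model_complex` is hypothetical, and the sketch assumes that the first item is the simplex selection and that each item may carry an optional @version suffix.

```python
def parse_model_complex(spec: str) -> dict:
    """Split 'simplex[@version][,mod[@version]...]' into its parts."""
    parts = [p.strip() for p in spec.split(",") if p.strip()]
    if not parts:
        raise ValueError("empty model selection")

    def split_version(item: str):
        # Split at the first "@"; a missing version is reported as None.
        name, _, version = item.partition("@")
        return name, (version or None)

    simplex, *mods = parts
    return {
        "simplex": split_version(simplex),
        "mods": [split_version(m) for m in mods],
    }


print(parse_model_complex("sup@v4.1.0,5mCG_5hmCG@v2"))
# {'simplex': ('sup', 'v4.1.0'), 'mods': [('5mCG_5hmCG', 'v2')]}
```

Without a comma, the whole string is treated as a single simplex selection, which is consistent with the "Could not find information on simplex model" error reported earlier.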

However if you're facing issues with automatic model selection for 260 bps models this approach may be better:

dorado download --model dna_r10.4.1_e8.2_260bps_sup@v4.1.0
# the 5mCG_5hmCG model should be downloaded automatically.
dorado basecaller dna_r10.4.1_e8.2_260bps_sup@v4.1.0 --modified-bases 5mCG_5hmCG pod5/ > duplex.bam

Kind regards, Rich
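The two-step workaround above can also be scripted. The sketch below only builds and prints the command lines, using just the subcommands and flags quoted in this thread (download --model, basecaller, --modified-bases). It does not run dorado. The helper name `explicit_model_commands` and the output file name calls.bam are hypothetical.

```python
import shlex


def explicit_model_commands(model: str, pod5_dir: str, mods: str = ""):
    """Return (download, basecall) argument lists for an explicit model."""
    download = ["dorado", "download", "--model", model]
    basecall = ["dorado", "basecaller", model]
    if mods:
        basecall += ["--modified-bases", mods]
    basecall.append(pod5_dir)
    return download, basecall


download, basecall = explicit_model_commands(
    "dna_r10.4.1_e8.2_260bps_sup@v4.1.0", "pod5/", mods="5mCG_5hmCG"
)
print(shlex.join(download))
print(shlex.join(basecall) + " > calls.bam")  # calls.bam is a placeholder name
```

To execute the commands rather than print them, the lists could be passed to subprocess.run with check=True, with the basecaller's stdout directed to an open file.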

olawa commented 7 months ago

Hi Rich,

No problem on my part; I noticed it because the quality was reduced. I just wanted to raise the issue in case it was widespread. I suppose you could solve it by adding a speed flag, or by having dorado calculate the speed from a few reads.

tijyojwad commented 5 months ago

Closing as this has been fixed in a previous release.