Closed: olawa closed this issue 5 months ago
Hi @olawa,
Could you please run the following pod5 command on one of your .pod5 files and report what you see?
pip install pod5
pod5 inspect debug <data.pod5> | grep -E "flow_cell_product_code|sequencing_kit"
# Should report the following:
flow_cell_product_code: <flowcell_product_code>
sequencing_kit: <sequencing_kit>
Kind regards, Rich
We expect to see SQK_LSK114_260 for the sequencing_kit for your 260bps condition. I suspect that your data will show SQK_LSK114, which is what is causing the incorrect model to be selected.
Unfortunately, automatic model selection is not supported for your data in this case.
Please download the correct model with:
dorado download --model dna_r10.4.1_e8.2_260bps_sup@v4.1.0
and basecall using this model path.
Kind regards, Rich
Edit: Are you sure that your data is 260bps?
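The manual workaround described above (download the model, then basecall with it explicitly) can be sketched as a small Python helper that only builds the two command lines. This is a rough illustration, not part of dorado: the helper names, the pod5/ input directory, and the calls.bam output name are hypothetical, and the model-name pattern is inferred from the two model names quoted in this thread, so not every speed/version combination necessarily exists.

```python
import shlex


def model_name(speed_bps, variant="sup", version="v4.1.0"):
    """Build a model name following the pattern seen in this thread.

    Assumption: the pattern is inferred from the names
    dna_r10.4.1_e8.2_260bps_sup@v4.1.0 and dna_r10.4.1_e8.2_400bps_sup@v4.1.0.
    """
    return f"dna_r10.4.1_e8.2_{speed_bps}bps_{variant}@{version}"


def manual_basecall_commands(speed_bps, pod5_dir="pod5/"):
    """Return (download, basecall) argument lists; nothing is executed."""
    model = model_name(speed_bps)
    download = ["dorado", "download", "--model", model]
    # The downloaded model directory is passed as the model path.
    basecall = ["dorado", "basecaller", model, pod5_dir]
    return download, basecall


download, basecall = manual_basecall_commands(260)
print(shlex.join(download))
# Redirect stdout to a BAM file, as in the commands quoted in this thread.
print(shlex.join(basecall), ">", "calls.bam")
```

Printing the commands instead of running them keeps the sketch safe to try without dorado installed; check the printed lines before running them by hand.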
Hello @HalfPhoton,
I have the same problem basecalling my 260bps data with v0.5.1. pod5 inspect debug confirms it's 260bps. As you said, the naming doesn't match dorado's requirement, so I have to resort to using the model path.
context_tags: {..... 'sample_frequency': '4000', 'selected_speed_bases_per_second': '260', 'sequencing_kit': 'sqk-lsk114'}
flow_cell_product_code: FLO-PRO114M
sequencing_kit: sqk-lsk114
I want to run dorado duplex in methylation detection mode:
dorado duplex dna_r10.4.1_e8.2_260bps_sup@v4.1.0_5mCG_5hmCG@v2/ pod5/ > duplex.bam
which throws an error:
[2024-01-17 11:04:23.766] [error] Could not find information on simplex model: dna_r10.4.1_e8.2_260bps_sup@v4.1.0_5mCG_5hmCG@v2
Any suggestion would be much appreciated! :)
Hi @HalfPhoton
This problem seems to be a bit more widespread. I just freshly converted a kit 14 260 bps run from its original FAST5 to POD5 with pod5 0.3.6, and the results are below. flow_cell_product_code is FLO-PRO114M. Also, it looks like there is a selected_speed_bases_per_second field in context_tags, which might be a more reliable way of getting the speed?
$ pod5 inspect debug PAO27011_pass_7b4991d0_ec3250cb.pod5 | grep -E "flow_cell_product_code|sequencing_kit"
context_tags: {'barcoding_enabled': '0', 'basecall_config_filename': 'dna_r10.4.1_e8.2_260bps_hac_prom.cfg', 'experiment_duration_set': '4320', 'experiment_type': 'genomic_dna', 'local_basecalling': '1', 'package': 'bream4', 'package_version': '7.4.8', 'sample_frequency': '4000', 'selected_speed_bases_per_second': '260', 'sequencing_kit': 'sqk-lsk114'}
flow_cell_product_code: FLO-PRO114M
sequencing_kit: sqk-lsk114
tracking_id: {'asic_id': 'FFFF80342813138F', 'asic_id_eeprom': 'FFFF80342813138F', 'asic_temp': '47.192307', 'asic_version': 'Unknown', 'auto_update': '0', 'auto_update_source': 'https://cdn.oxfordnanoportal.com/software/MinKNOW/', 'bream_is_standard': '0', 'configuration_version': '5.4.7', 'device_id': '3G', 'device_type': 'promethion', 'distribution_status': 'stable', 'distribution_version': '22.12.5', 'exp_script_name': 'sequencing/sequencing_PRO114_DNA_e8_2_260T:FLO-PRO114M:SQK-LSK114:260', 'exp_script_purpose': 'sequencing_run', 'exp_start_time': '2023-03-22T16:12:16.106324+00:00', 'flow_cell_id': 'PAO27011', 'flow_cell_product_code': 'FLO-PRO114M', 'guppy_version': '6.4.6+ae70e8f', 'heatsink_temp': '28.164078', 'host_product_code': 'PRO-PRC024', 'host_product_serial_number': 'PC24B148', 'hostname': 'PC24B148', 'hublett_board_id': '0001588c7bce0911', 'hublett_firmware_version': '2.1.10', 'installation_type': 'nc', 'ip_address': '', 'local_firmware_file': '1', 'mac_address': '', 'operating_system': 'ubuntu 20.04', 'protocol_group_id': 'ToR_134', 'protocol_run_id': '7b4991d0-894d-45e0-a90b-b56662829308', 'protocol_start_time': '2023-03-22T16:06:32.498753+00:00', 'protocols_version': '7.4.8', 'run_id': 'ec3250cbda36eeb10a75679ec5932365a40302c0', 'sample_id': 'OESO_152_LSK114', 'satellite_board_id': '01377075e7de3a75', 'satellite_firmware_version': '2.1.9', 'sequencer_hardware_revision': '', 'sequencer_product_code': 'PRO-SEQ024', 'sequencer_serial_number': '', 'usb_config': 'fx3_0.0.0#fpga_0.0.0#unknown#unknown', 'version': '5.4.3'}
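As a minimal sketch of the suggestion above, the speed could be read from the context_tags line of the pod5 inspect debug text output. This assumes the line has the form shown in this thread (the key, a colon, then a Python-style dict literal); the function names are hypothetical and this is not how dorado itself selects a model.

```python
import ast


def parse_context_tags(debug_output):
    """Return the context_tags dict from `pod5 inspect debug` text, or {}."""
    for line in debug_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "context_tags":
            # Assumption: the value is printed as a Python dict literal.
            return ast.literal_eval(value.strip())
    return {}


def selected_speed(debug_output):
    """Return the selected speed in bases per second, or None if absent."""
    speed = parse_context_tags(debug_output).get("selected_speed_bases_per_second")
    return int(speed) if speed is not None else None


# Shortened sample based on the output pasted above.
sample = (
    "context_tags: {'sample_frequency': '4000', "
    "'selected_speed_bases_per_second': '260', 'sequencing_kit': 'sqk-lsk114'}\n"
    "flow_cell_product_code: FLO-PRO114M\n"
    "sequencing_kit: sqk-lsk114\n"
)
print(selected_speed(sample))
```

Here the sequencing_kit value carries no speed suffix, but the context_tags entry still records 260, which is the point being made in the comment.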
Hi @mp15 and @ritatam,
Apologies for the delay in my reply.
I've raised this internally and we will look into fixing this issue in an upcoming release.
Kind regards, Rich
@ritatam,
Your issue with: dorado duplex dna_r10.4.1_e8.2_260bps_sup@v4.1.0_5mCG_5hmCG@v2/ pod5/ > duplex.bam
is due to incorrect syntax: this command searches for a simplex model with this name, which doesn't exist.
I think you want:
# sup@v4.1.0 - simplex model selection
# 5mCG_5hmCG@v2 - modification model selection (separated by comma)
dorado duplex sup@v4.1.0,5mCG_5hmCG@v2 pod5/ > duplex.bam
More information about the syntax can be found in the README (readme.md#automatic-model-selection-complex).
However, if you're facing issues with automatic model selection for 260 bps models, this approach may be better:
dorado download --model dna_r10.4.1_e8.2_260bps_sup@v4.1.0
# the 5mCG_5hmCG model should be downloaded automatically.
dorado basecaller dna_r10.4.1_e8.2_260bps_sup@v4.1.0 --modified-bases 5mCG_5hmCG pod5/ > duplex.bam
Kind regards, Rich
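To illustrate the comma-separated syntax described above, here is a rough Python sketch of how such a string splits into a simplex part and modification parts. It is an illustration only, not dorado's actual parser, and the function name is hypothetical.

```python
def split_model_complex(complex_str):
    """Split 'simplex[@version],mod[@version],...' into its parts.

    Returns ((simplex_name, simplex_version), [(mod_name, mod_version), ...]).
    Illustrative only; dorado's real parsing rules may differ.
    """
    parts = [p.strip() for p in complex_str.split(",") if p.strip()]
    if not parts:
        raise ValueError("empty model string")

    def name_version(part):
        # Split on the first '@'; the version is None when no '@' is present.
        name, _, version = part.partition("@")
        return name, (version or None)

    return name_version(parts[0]), [name_version(p) for p in parts[1:]]


# Comma-separated form: one simplex selection plus one modification selection.
print(split_model_complex("sup@v4.1.0,5mCG_5hmCG@v2"))

# Underscore-joined form from the failing command: with no comma, the whole
# string is treated as a single simplex name and no modification is found.
print(split_model_complex("dna_r10.4.1_e8.2_260bps_sup@v4.1.0_5mCG_5hmCG@v2/"))
```

The second call shows why the original command failed: without the comma, nothing marks where the simplex model ends and the modification model begins.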
Hi Rich,
No problem on my part; I noticed it because quality was reduced. I just wanted to raise the issue in case it was widespread. I suppose you could solve it by adding a speed flag, or by having dorado calculate the speed from a few reads.
Closing as this has been fixed in a previous release.
Hi @tijyojwad,
Here is the problem I had with auto-selection of models for 260bps.
Running dorado duplex sup for an LSK114 260bps run gives this:
[2024-01-01 21:55:55.696] [info] - downloading dna_r10.4.1_e8.2_400bps_sup@v4.1.0 with httplib
The same happens with dorado basecaller sup.
Running duplex with the correct model tells me there is no duplex stereo model for 4kHz 260 bps. Is this correct?