Closed sklages closed 2 months ago
Hi @sklages,
When run via the basecaller, sample sheets only apply aliases to the specific experiment they are set up for. I would guess that this data has a different experiment id to the one in the sample sheet. You can check this value by running:
pod5 inspect debug /dev/shm/mxqd/mnt/job/51421617/*.pod5 | grep experiment_name
Indeed, .. the rundata folder (aka experiment_id) has been renamed after the run has finished .. because of a typo.
Maybe - as a feature request - there should be a warning or even an error about mismatching data, here experiment_id from samplesheet and the experiment_id actually found in the pod5 data before starting the basecalling!?
thanks for the hint though :-)
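Until such a warning exists, the comparison can be done by hand before basecalling. The following is a minimal sketch, not part of dorado: `check_experiment_id` is a hypothetical helper that assumes a plain comma-separated sample sheet with an `experiment_id` header column and no quoted fields. The commented usage lines assume the `experiment_name: <value>` output format shown elsewhere in this thread.

```shell
# check_experiment_id SHEET EXPECTED   (hypothetical helper, not a dorado command)
# Returns 0 if every data row has EXPECTED in the experiment_id column,
# 1 if any row differs, 2 if the sheet has no experiment_id column.
check_experiment_id() {
    awk -F',' -v want="$2" '
        { sub(/\r$/, "") }                      # tolerate Windows line endings
        NR == 1 {
            for (i = 1; i <= NF; i++) if ($i == "experiment_id") col = i
            if (!col) { print "no experiment_id column" > "/dev/stderr"; rc = 2; exit }
            next
        }
        $0 ~ /^[ \t]*$/ { next }                # skip blank lines
        ($col "") != (want "") {                # force a string comparison
            printf "line %d: experiment_id \"%s\" does not match \"%s\"\n", NR, $col, want > "/dev/stderr"
            rc = 1
        }
        END { exit rc + 0 }
    ' "$1"
}

# Possible usage (file names are placeholders):
#   name=$(pod5 inspect debug reads.pod5 | grep experiment_name | head -n 1 | sed 's/.*experiment_name: *//')
#   check_experiment_id mySampleSheet.csv "$name" || echo "sample sheet does not match pod5 data"
```

A non-zero exit status means the aliases would most likely not be applied during basecalling.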
Hej,
I have a similar issue. I used to have the aliasID in my bam- and summary files, now I get barcode-IDs such as BC:Z:SQK-NBD114-96_barcode02. I'm using the SQK-NBD114-96 kit, dorado 0.7.2 with the commands mentioned below.
In my samplesheet: experiment_id = 123456
In my pod5-files (pod5 inspect debug pod5/FLOWCELL_pass_barcode02_id_id2_0.pod5 | grep experiment_name): experiment_name: 123456
When applying demultiplexing (no-classify, see below), the files are properly named by the alias.
Is this part of the "updates to barcode classification" mentioned in the changelog? Thanks and all the best! Philipp
# dorado
# basecaller
dorado basecaller -v dna_r10.4.1_e8.2_400bps_sup@v5.0.0 pod5/ --sample-sheet mySampleSheet.csv --kit-name SQK-NBD114-96 > 123456.bam
# demux
dorado demux -t 16 --output-dir demux_123456 -v --no-classify --sample-sheet mySampleSheet.csv --emit-fastq 123456.bam
Hi @phpeters,
No, the updates are regarding the way we choose which classification to select, not in the naming.
Can you also check flowcell_id and sequencer_position in the grep above? These need to match your flowcell_id and/or position_id columns (only one or the other needs to be present).
(For clarity, these columns need to match when using the sample sheet during basecalling. When used with the demux command much of this data is not available, so in this case we simply require that there is a unique mapping from barcode name to alias name. This probably explains why you get aliases during demux.)
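The "unique mapping from barcode name to alias name" condition can be checked on the sample sheet itself. This is a rough sketch under assumptions: `ambiguous_barcodes` is a hypothetical helper, the sheet is plain comma-separated text with `barcode` and `alias` header columns, and no field contains a quoted comma. It does not reproduce dorado's own validation.

```shell
# ambiguous_barcodes SHEET   (hypothetical helper, not a dorado command)
# Prints each barcode that appears with more than one alias.
# Returns 0 if the mapping is unique, 1 if not, 2 if a column is missing.
ambiguous_barcodes() {
    awk -F',' '
        { sub(/\r$/, "") }                      # tolerate Windows line endings
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                if ($i == "barcode") b = i
                if ($i == "alias")   a = i
            }
            if (!b || !a) { rc = 2; exit }
            next
        }
        $0 ~ /^[ \t]*$/ { next }                # skip blank lines
        {
            if (($b in seen) && (seen[$b] "") != ($a "")) { print $b; rc = 1 }
            else seen[$b] = $a
        }
        END { exit rc + 0 }
    ' "$1"
}

# Possible usage:
#   ambiguous_barcodes mySampleSheet.csv
```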
Hej @malton-ont ,
Thanks for the clarification! I checked and the sequencer_position in the pod5 is equal to the position_id in the sampleSheet. (flowcell_id is not present)
Best! Philipp
@phpeters,
Are you able to share a read that exhibits this, and the sample sheet to match?
@malton-ont I did a subset, can I upload it to a box somewhere? It is a client's data.
@phpeters,
Are you able to open a support ticket? Technical services can then give you a link to upload it. Make sure you ask them to direct it to me!
@malton-ont I shared the small subset in a support ticket and asked to forward it to you. Do you need/want the ticket ID?
@phpeters,
Yes please, then I can chase it up.
@malton-ont you have mail
@phpeters,
Ah-ha! Apologies, the correct grep was flow_cell_id - checking this shows that this value is present in the pod5 but it's blank in the sample sheet. You can either add the correct value to the sample sheet or remove the column entirely (sample sheets need at least one, but not necessarily both, of flowcell_id and position_id).
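A column that is present in the header but left blank is easy to miss by eye. The sketch below lists such columns so they can be filled in or removed. `blank_columns` is a hypothetical helper, and it assumes a plain comma-separated sheet without quoted fields.

```shell
# blank_columns SHEET   (hypothetical helper, not a dorado command)
# Prints the name of every header column that is empty in at least one data row.
blank_columns() {
    awk -F',' '
        { sub(/\r$/, "") }                      # tolerate Windows line endings
        NR == 1 { for (i = 1; i <= NF; i++) name[i] = $i; n = NF; next }
        $0 ~ /^[ \t]*$/ { next }                # skip blank lines
        { for (i = 1; i <= n; i++) if ($i ~ /^[ \t]*$/) blank[i] = 1 }
        END { for (i = 1; i <= n; i++) if (blank[i]) print name[i] }
    ' "$1"
}

# Possible usage:
#   blank_columns mySampleSheet.csv
```

Any column name it prints is a candidate for the situation described above.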
Note that the debug info that is printed regarding the Barcode distribution does not apply the aliasing, but this is applied for the BC:Z tag and read groups (and the filename when demuxing).
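Since the printed barcode distribution does not show aliases, the BC:Z tags in the output are the place to look. A minimal sketch for tallying them: `count_bc_tags` is a hypothetical helper that reads SAM text on stdin, and the commented usage assumes samtools is installed.

```shell
# count_bc_tags   (hypothetical helper)
# Reads SAM records on stdin and prints "<count> <value>" for each distinct BC:Z value.
count_bc_tags() {
    awk -F'\t' '
        /^@/ { next }                           # skip header lines
        {
            # optional tags start at field 12 in SAM
            for (i = 12; i <= NF; i++)
                if ($i ~ /^BC:Z:/) count[substr($i, 6)]++
        }
        END { for (v in count) print count[v], v }
    ' | sort -k2
}

# Possible usage:
#   samtools view 123456.bam | count_bc_tags
```

If aliasing worked, the values listed should be the alias names rather than names like SQK-NBD114-96_barcode02.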
ah-haaaa! Indeed, without the column flow_cell_id in the sample sheet it worked out just fine. Thanks a ton!
But this is the original sample sheet I get from our minION (only extended by the columns barcode,alias). The promethION puts the flowcell_ID properly into the column, the minION's minKNOW doesn't do this.
The MinKNOW version for the minION is 23.11.7, on the promethION it's 24.02.19 - maybe it's that?
Thanks again! Philipp
Aaaaaaah-haaaaaa! I checked previous runs in the same machine (with the same MinKNOW version) and for them, the column flow_cell_id was present in the sample sheet. But those were MIN114-flowcells, this time it was a FLG114.
And I just learned from the lab that the flowcell-ID is put in manually for flongle FCs whereas it is automagically detected for MIN FCs. My mystery is solved, sorry for bothering you.
Thanks again and have a great weekend!
Philipp
That's great @phpeters, glad the mystery is solved! And thanks for your help investigating.
I am obviously doing something wrong, but I don't see the problem :-)
Running a P2 flowcell with a pool "SAMPLE_ID" containing four libs/barcodes (barcode01-04).
Basecalling and demultiplexing are done with current dorado v0.7.2, mod_bases=5mCG_5hmCG.
Samplesheet I used:
This results in the following BAM header, no alias showing up, instead SQK-NBD114-96_barcode01 to SQK-NBD114-96_barcode04:
The final dorado demux call accordingly writes BAM files not using aliases:
.. instead of:
Using an identical sample_id with distinct barcodes/aliases was at least working in v0.7.0 ...
What do I miss here? Where is my mistake?