Closed lucy924 closed 4 months ago
Hi @lucy924,
Are you basecalling data from multiple experiments at once? Sample sheets only apply to the experiment id stated in that column, so if you have a mixed dataset only that one experiment will be aliased and any other samples will simply be barcoded as normal.
Nope it's just the one experiment
Update: Just looked at the latest output and I have 18 total alignments in the SQK-NBD114-24_barcode01.bam, while the alias-named bam is 4.23GB in size.
I also noticed that the alias didn't work for barcodes 02 and 03: there are no alias-named bams for them, and their SQK-labelled bams are 7.27GB and 5.39GB respectively. Samples 02 and 03 were set up slightly differently to the rest when I started; they were the first major experiment I did on our sequencer, so I was figuring things out. However, the file structure looks exactly the same as the others, and I can't see anything obvious that would cause them to behave differently during demuxing.
Update # 2: ah I think I have figured out what might have happened. I have done a few rearrangements of the data files in order to make them make more sense, and barcodes 02 and 03 likely had a different experiment id when I began this series of experiments. Moving the files into the main experiment directory wouldn't have changed the experiment id within the run parameters recorded. Thank you for letting me know this isn't expected behaviour and pointing me in the right direction to figure it out!
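For anyone hitting the same thing, the behaviour explained above (an alias is only applied when a read's experiment id matches the sample sheet row, otherwise the read is barcoded as normal) can be sketched roughly like this. This is only an illustrative model, not Dorado's actual implementation: the column names, the experiment ids `exp_main` / `exp_old`, and the helpers `load_aliases` and `output_name` are all assumptions made up for the example.

```python
import csv
import io

# Hypothetical sample sheet; column names and values are assumed for illustration.
SAMPLE_SHEET = """experiment_id,kit,barcode,alias
exp_main,SQK-NBD114-24,barcode01,Test1
exp_main,SQK-NBD114-24,barcode02,Test2
"""


def load_aliases(sheet_text):
    """Map (experiment_id, kit, barcode) -> alias from CSV text."""
    aliases = {}
    for row in csv.DictReader(io.StringIO(sheet_text)):
        key = (row["experiment_id"], row["kit"], row["barcode"])
        aliases[key] = row["alias"]
    return aliases


def output_name(aliases, experiment_id, kit, barcode):
    """Return the alias-named bam if the experiment id matches a row,
    otherwise fall back to the plain kit_barcode name."""
    alias = aliases.get((experiment_id, kit, barcode))
    return f"{alias}.bam" if alias else f"{kit}_{barcode}.bam"


aliases = load_aliases(SAMPLE_SHEET)

# Reads recorded under two different experiment ids but the same barcodes.
reads = [
    ("exp_main", "SQK-NBD114-24", "barcode01"),
    ("exp_old", "SQK-NBD114-24", "barcode01"),
    ("exp_old", "SQK-NBD114-24", "barcode02"),
]

names = sorted({output_name(aliases, *read) for read in reads})
print(names)
```

In this toy model, barcode01 ends up in both an alias-named file and a kit-named file because its reads carry two different experiment ids, while barcode02 only gets a kit-named file because none of its reads match the sheet's experiment id.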
Demuxing with a sample sheet produces two bam files per barcode
I use a sample sheet with aliases with the demux command, and it outputs both a bam named with the alias and a separate bam named kitname_barcode. E.g. I would get a bam file called Test1.bam (using the alias) as well as a file called SQK-NBD114-24_barcode01.bam in the same output directory. Why is this? What is the difference between these two files? Should I merge them?
Unfortunately I've had to clean up a lot of my files due to space issues, so I can't check the size difference between them now, but I'm currently running a demux and will update with those size differences if I need to.
Steps to reproduce the issue:
Sample sheet looks like:
Ran basecalling:
Dorado demux command:
Run environment:
Dorado version: dorado-0.6.2-linux-x64
Dorado command: (as above)
Operating system: Linux slurm cluster
Hardware (CPUs, Memory, GPUs):
Source data type: pod5
Source data location: on device, different path to working directory
Details about data: N/A, has happened with lots of different datasets