nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

Dorado does not resume properly and creates identical duplicate reads #672

Closed: billytcl closed this issue 4 months ago

billytcl commented 6 months ago

Issue Report

Please describe the issue:

This seems to be a strange edge case, but dorado (0.5.3) sometimes fails to resume properly, especially from mixed pod5 files.

I first ran into this issue when modkit complained about a large number of duplicate reads in its log. Looking further, I found reads that were duplicated in the BAM file -- not secondary/supplementary alignments of an original read -- where literally everything was the same.

This led me to believe that dorado sometimes cannot properly enumerate which reads remain to be basecalled. Looking further, this also occurs with a MinKNOW-binned barcoded pod5 file, but not to the same extent.

This happens on our local GPU-enabled server (3090) and also on our HPC (A100s).

Steps to reproduce the issue:

The following runs basecalling on a sample mixed pod5 directory and then resumes from the finished BAM:

billylau@suzuki:/mnt/ix1/Projects_lite/20240305_BL_dorado_demux_test$ /mnt/ix1/Resources/tools/dorado/v0.5.3/bin/dorado basecaller sup,5mC_5hmC pod5_mixed/ --recursive --min-qscore 7 --no-trim --kit-name EXP-NBD196 --reference /mnt/ix1/Resources/GenomeRef/Homo_sapiens/Ensembl/GRCh38_no_alt/Sequence/WholeGenomeFasta/hs38_naa.fna > mixed_pod5.bam
[2024-03-05 10:45:32.049] [info]  - downloading dna_r10.4.1_e8.2_400bps_sup@v4.3.0 with httplib
[2024-03-05 10:45:35.711] [info]  - downloading dna_r10.4.1_e8.2_400bps_sup@v4.3.0_5mC_5hmC@v1 with httplib
[2024-03-05 10:45:36.126] [info] > Creating basecall pipeline
[2024-03-05 10:46:25.618] [info]  - set batch size for cuda:0 to 640
[2024-03-05 10:46:25.646] [info]  - set batch size for cuda:1 to 640
[2024-03-05 10:47:19.719] [info] Barcode for EXP-NBD196
[2024-03-05 10:47:25.563] [info] > Simplex reads basecalled: 3971
[2024-03-05 10:47:25.563] [info] > Simplex reads filtered: 108
[2024-03-05 10:47:25.563] [info] > Basecalled @ Samples/s: 4.600469e+06
[2024-03-05 10:47:25.563] [info] > 4551 reads demuxed @ classifications/s: 7.808854e+02
[2024-03-05 10:47:26.583] [info] > Finished
billylau@suzuki:/mnt/ix1/Projects_lite/20240305_BL_dorado_demux_test$ /mnt/ix1/Resources/tools/dorado/v0.5.3/bin/dorado basecaller sup,5mC_5hmC pod5_mixed/ --recursive --min-qscore 7 --no-trim --kit-name EXP-NBD196 --reference /mnt/ix1/Resources/GenomeRef/Homo_sapiens/Ensembl/GRCh38_no_alt/Sequence/WholeGenomeFasta/hs38_naa.fna --resume-from mixed_pod5.bam > mixed_pod5_2.bam
[2024-03-05 10:48:13.235] [info]  - downloading dna_r10.4.1_e8.2_400bps_sup@v4.3.0 with httplib
[2024-03-05 10:48:15.794] [info]  - downloading dna_r10.4.1_e8.2_400bps_sup@v4.3.0_5mC_5hmC@v1 with httplib
[2024-03-05 10:48:16.414] [info] > Creating basecall pipeline
[2024-03-05 10:48:30.441] [info]  - set batch size for cuda:0 to 640
[2024-03-05 10:48:30.471] [info]  - set batch size for cuda:1 to 640
[2024-03-05 10:49:25.930] [info] Barcode for EXP-NBD196
[2024-03-05 10:49:25.945] [info] > Inspecting resume file...
[2024-03-05 10:49:26.031] [info] > 4551 reads found in resume file.
[2024-03-05 10:49:28.542] [info] > Simplex reads basecalled: 3971
[2024-03-05 10:49:28.543] [info] > Simplex reads filtered: 108
[2024-03-05 10:49:28.543] [info] > Basecalled @ Samples/s: 2.682198e+06
[2024-03-05 10:49:28.543] [info] > 1144 reads demuxed @ classifications/s: 4.557769e+02
[2024-03-05 10:49:29.666] [info] > Finished

This shows an increase in the number of reads in the second, resumed BAM file: primary reads go from 4551 to 5695, an increase of about 25%. Because the first run had finished, resuming it shouldn't increase the read count at all.

billylau@suzuki:/mnt/ix1/Projects_lite/20240305_BL_dorado_demux_test$ /mnt/ix1/Resources/tools/samtools/v1.15.1/samtools-1.15.1/samtools flagstat mixed_pod5.bam
6072 + 0 in total (QC-passed reads + QC-failed reads)
**4551** + 0 primary
893 + 0 secondary
628 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
4297 + 0 mapped (70.77% : N/A)
2776 + 0 primary mapped (61.00% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
billylau@suzuki:/mnt/ix1/Projects_lite/20240305_BL_dorado_demux_test$ /mnt/ix1/Resources/tools/samtools/v1.15.1/samtools-1.15.1/samtools flagstat mixed_pod5_2.bam
7460 + 0 in total (QC-passed reads + QC-failed reads)
**5695** + 0 primary
1056 + 0 secondary
709 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
5240 + 0 mapped (70.24% : N/A)
3475 + 0 primary mapped (61.02% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

And here I am running dorado on a pod5 binned to barcode01 from MinKNOW:

billylau@suzuki:/mnt/ix1/Projects_lite/20240305_BL_dorado_demux_test$ /mnt/ix1/Resources/tools/dorado/v0.5.3/bin/dorado basecaller sup,5mC_5hmC pod5/ --recursive --min-qscore 7 --no-trim --kit-name EXP-NBD196 --reference /mnt/ix1/Resources/GenomeRef/Homo_sapiens/Ensembl/GRCh38_no_alt/Sequence/WholeGenomeFasta/hs38_naa.fna > barcode01_pod5.bam
[2024-03-05 10:53:19.815] [info]  - downloading dna_r10.4.1_e8.2_400bps_sup@v4.3.0 with httplib
[2024-03-05 10:53:22.390] [info]  - downloading dna_r10.4.1_e8.2_400bps_sup@v4.3.0_5mC_5hmC@v1 with httplib
[2024-03-05 10:53:22.808] [info] > Creating basecall pipeline
[2024-03-05 10:53:36.563] [info]  - set batch size for cuda:0 to 640
[2024-03-05 10:53:36.593] [info]  - set batch size for cuda:1 to 640
[2024-03-05 10:54:31.243] [info] Barcode for EXP-NBD196
[2024-03-05 10:54:37.687] [info] > Simplex reads basecalled: 3993
[2024-03-05 10:54:37.688] [info] > Simplex reads filtered: 7
[2024-03-05 10:54:37.688] [info] > Basecalled @ Samples/s: 2.764183e+06
[2024-03-05 10:54:37.688] [info] > 4009 reads demuxed @ classifications/s: 6.234837e+02
[2024-03-05 10:54:38.878] [info] > Finished
billylau@suzuki:/mnt/ix1/Projects_lite/20240305_BL_dorado_demux_test$ /mnt/ix1/Resources/tools/dorado/v0.5.3/bin/dorado basecaller sup,5mC_5hmC pod5/ --recursive --min-qscore 7 --no-trim --kit-name EXP-NBD196 --reference /mnt/ix1/Resources/GenomeRef/Homo_sapiens/Ensembl/GRCh38_no_alt/Sequence/WholeGenomeFasta/hs38_naa.fna --resume-from barcode01_pod5.bam > barcode01_pod5_2.bam
[2024-03-05 10:58:21.688] [info]  - downloading dna_r10.4.1_e8.2_400bps_sup@v4.3.0 with httplib
[2024-03-05 10:58:24.240] [info]  - downloading dna_r10.4.1_e8.2_400bps_sup@v4.3.0_5mC_5hmC@v1 with httplib
[2024-03-05 10:58:24.614] [info] > Creating basecall pipeline
[2024-03-05 10:58:38.448] [info]  - set batch size for cuda:0 to 640
[2024-03-05 10:58:38.476] [info]  - set batch size for cuda:1 to 640
[2024-03-05 10:59:33.725] [info] Barcode for EXP-NBD196
[2024-03-05 10:59:33.740] [info] > Inspecting resume file...
[2024-03-05 10:59:33.799] [info] > 4009 reads found in resume file.
[2024-03-05 10:59:36.008] [info] > Simplex reads basecalled: 3993
[2024-03-05 10:59:36.008] [info] > Simplex reads filtered: 7
[2024-03-05 10:59:36.008] [info] > Basecalled @ Samples/s: 8.942825e+04
[2024-03-05 10:59:36.008] [info] > 32 reads demuxed @ classifications/s: 1.448619e+01
[2024-03-05 10:59:37.071] [info] > Finished

The effect here is not as prominent (<1% increase in reads):

billylau@suzuki:/mnt/ix1/Projects_lite/20240305_BL_dorado_demux_test$ /mnt/ix1/Resources/tools/samtools/v1.15.1/samtools-1.15.1/samtools flagstat barcode01_pod5.bam
5048 + 0 in total (QC-passed reads + QC-failed reads)
**4009** + 0 primary
871 + 0 secondary
168 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
4615 + 0 mapped (91.42% : N/A)
3576 + 0 primary mapped (89.20% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
billylau@suzuki:/mnt/ix1/Projects_lite/20240305_BL_dorado_demux_test$ /mnt/ix1/Resources/tools/samtools/v1.15.1/samtools-1.15.1/samtools flagstat barcode01_pod5_2.bam
5085 + 0 in total (QC-passed reads + QC-failed reads)
**4041** + 0 primary
873 + 0 secondary
171 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
4640 + 0 mapped (91.25% : N/A)
3596 + 0 primary mapped (88.99% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
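The percentages quoted above can be checked directly from the primary counts in the flagstat outputs. The short Python sketch below also compares each increase with the "reads demuxed" line printed by the corresponding resumed run; treating those two numbers as the same quantity is an assumption, not something dorado documents.

```python
# Primary record counts from `samtools flagstat` (before, after resume), plus
# the "reads demuxed" count printed by each resumed dorado run.
# All numbers are copied from the logs in this report.
RUNS = {
    "mixed": (4551, 5695, 1144),
    "barcode01": (4009, 4041, 32),
}


def primary_increase(before: int, after: int) -> tuple:
    """Return (extra primary records, percentage increase)."""
    extra = after - before
    return extra, 100 * extra / before


for name, (before, after, demuxed_on_resume) in RUNS.items():
    extra, pct = primary_increase(before, after)
    # Assumption: the extra records are exactly the reads processed on resume.
    print(f"{name}: +{extra} primary ({pct:.1f}%), "
          f"equals resume 'reads demuxed': {extra == demuxed_on_resume}")
```

For both runs the extra primary records (1144 and 32) equal the "reads demuxed" count of the resumed run, giving roughly 25.1% and 0.8% increases.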

Run environment:

tijyojwad commented 6 months ago

Hi @billytcl - this is a really nice find. Thank you so much for reporting and for digging into this!

I think I know why this is happening. When a read is split in dorado, we generate new read ids per child read which get written out to the bam. So when we resume from the file, we load in the new split read ids. But we don't account for the original read ids they came from. So the data loader thinks those still need to be processed. This is a bug on our end, so we'll put in a fix for this in the next release.
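A minimal Python sketch of the bookkeeping described above, assuming each record in the resume BAM can be reduced to its read id plus, for split reads, the parent id from the pi:Z tag. The function names and the (read_id, parent_id) tuples are hypothetical and do not reflect dorado's internal code; the sketch only contrasts remembering the written ids alone with also remembering their parents.

```python
from typing import Iterable, List, Optional, Set, Tuple

# Hypothetical record layout: (read_id, parent_id). parent_id is None for
# reads that were not split, or the pi:Z value for split child reads.
Record = Tuple[str, Optional[str]]


def processed_ids_buggy(records: Iterable[Record]) -> Set[str]:
    """Remember only the ids written to the BAM (the behaviour reported here)."""
    return {read_id for read_id, _ in records}


def processed_ids_fixed(records: Iterable[Record]) -> Set[str]:
    """Also remember the parent ids that split children came from."""
    done: Set[str] = set()
    for read_id, parent_id in records:
        done.add(read_id)
        if parent_id is not None:
            done.add(parent_id)
    return done


def reads_to_basecall(input_ids: Iterable[str], done: Set[str]) -> List[str]:
    """Input (pod5) read ids that the loader would still queue."""
    return [rid for rid in input_ids if rid not in done]


if __name__ == "__main__":
    pod5_ids = ["r1", "r2"]  # r2 was split into two children, c1 and c2
    bam = [("r1", None), ("c1", "r2"), ("c2", "r2")]
    print(reads_to_basecall(pod5_ids, processed_ids_buggy(bam)))  # ['r2']
    print(reads_to_basecall(pod5_ids, processed_ids_fixed(bam)))  # []
```

In the toy data, r2 is the only pod5 read that was split. With the buggy bookkeeping it is queued again, so its children would be written a second time.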

Thank you again!!

billytcl commented 6 months ago

I think this is definitely contributing, but if this were the only reason, shouldn't the number of reads double on resuming a completed run with a “mixed” pod5, since every read in it has split points? Maybe it's a little more complicated…

tijyojwad commented 6 months ago

It won't double, since not all reads are split by dorado. E.g. if the completed dataset had 100 reads, where 80 kept their original read ids and 20 have new split read ids (say, split from 5 original reads), then on resume the original 80 would not get re-basecalled, but the 5 parents of the split reads would, producing those 20 split reads again. So the final dataset would have 80 + 20 + 20 = 120 reads (a 20% increase in count).
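The arithmetic in the example above can be written as a small model, assuming every re-basecalled parent is split again into exactly the same children as in the first run (a simplification; the function name is hypothetical). It also shows when the count would double: only if every output record were a split child.

```python
def count_after_buggy_resume(unsplit: int, split_children: int) -> tuple:
    """Return (records after resume, fractional increase).

    unsplit: records that kept their original read id; skipped on resume.
    split_children: records with a pi:Z parent id. Their parents are
    basecalled again, so the same children are written a second time.
    """
    before = unsplit + split_children
    after = unsplit + 2 * split_children
    return after, (after - before) / before


print(count_after_buggy_resume(80, 20))  # (120, 0.2)
print(count_after_buggy_resume(0, 100))  # (200, 1.0), i.e. doubling
```

Under this model the fractional increase equals the share of split children in the original output, so a mixed pod5 set only doubles if dorado splits every read in it.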

I'm not following the "mixed" dataset. Is it a combination of pod5s where each pod5 contains reads from a specific barcode?

billytcl commented 6 months ago

During a run, MinKNOW outputs barcoded pod5s into specific folders (barcode01, etc.), plus unclassified and mixed. My impression is that mixed holds the reads that have a split point where the two resulting reads have different barcodes. At least, that's my impression of it; is that incorrect?

Billy

tijyojwad commented 6 months ago

Hi @billytcl - mixed contains reads that were split by MinKNOW where the split reads have different barcodes. I think what's happening is that dorado is finding further split points within those reads (likely ones missed by MinKNOW). I have to find a mixed dataset to validate this, but you could also check this theory in your dataset: the duplicated reads should only be the ones with the pi:Z tag, which denotes that they have a parent read id.

Your understanding is correct that the reads in mixed are unsplit reads. I wonder then if dorado is only splitting a subset of them and not all of them, which would also be an interesting find (more of a splitting issue).

billytcl commented 6 months ago

Yup -- it looks like only the ones with pi:Z are being duplicated.

Duplicate reads from resume analysis

Pre-resume (counts sorted highest first, so every pi:Z read appears only once):

(base) billylau@suzuki:/mnt/ix1/Projects_lite/20240305_BL_dorado_demux_test$ /mnt/ix1/Resources/tools/samtools/v1.15.1/samtools-1.15.1/samtools view mixed_pod5.bam -F 4 -F 256 -F 1024 -F 2048 | grep "pi:Z" | cut -f 1 | sort | uniq -c | sed 's/^[ ]*//;s/ /\t/' | sort -k1,1nr | head
1       000ab610-17d5-4a4f-9693-0a1fb3fa52b8
1       0023a9cd-5ac0-4383-aa63-590c3002d30f
1       003a4576-8ffc-4b1e-acc4-9e33a466e1ab
1       0094d6e8-09d2-4183-99ef-c0fa1544a8b0
1       00e263af-d33a-411e-8890-3986263f5b16
1       013917e5-3cb7-493e-b97a-2031d67b73e1
1       014b9feb-d204-442a-a7a5-c209f305dbc6
1       016d70f8-1077-4022-9fca-06f5c938c2c1
1       01ec257a-aa26-4a60-a1cd-0aea16e90efa
1       02240168-5e2d-42f3-a3ad-d6cebfac1a07

After resume, even the tail of the count-sorted list (the lowest counts) shows each pi:Z read appearing twice:

/mnt/ix1/Resources/tools/samtools/v1.15.1/samtools-1.15.1/samtools view mixed_pod5_2.bam -F 4 -F 256 -F 1024 -F 2048 | grep "pi:Z" | cut -f 1 | sort | uniq -c | sed 's/^[ ]*//;s/ /\t/' | sort -k1,1nr | tail
2       fd04024a-ce1b-4c84-907e-3408084c036b
2       fd05c084-92c0-4f3e-8bdd-edb7795c00c9
2       fd41d88b-094d-4595-9d54-3d5ef5eb2e13
2       fda08a02-424c-4e42-9716-0dbb9b0b769c
2       fe5bf267-9cfe-4338-8d7a-9e340188e60d
2       fe979f2c-4bc0-43bf-a285-5b900b00e314
2       fe9e1fcf-f26a-45b7-9047-280a0840c9c2
2       ffbb0e45-9257-474f-856e-90d5cf14b617
2       ffd32848-831a-45c3-b402-f7437e4949f0
2       ffd9c237-48cc-46b3-a315-f6f3c0831c59
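The samtools/grep pipeline above can also be expressed as a single pass that reports, for every primary read id appearing more than once, whether all of its copies carry a pi:Z tag. This is a sketch that reads SAM text (for example piped from `samtools view`); the script name `dup_check.py` is hypothetical. It skips the same flags as the pipeline (4, 256, 1024 and 2048, i.e. 0xD04 combined) and checks the optional tag columns rather than grepping the whole line.

```python
import sys
from collections import Counter
from typing import Dict, Iterable, Tuple

# Same filter as `-F 4 -F 256 -F 1024 -F 2048`: unmapped, secondary,
# duplicate-flagged and supplementary records are skipped.
SKIP_FLAGS = 0x4 | 0x100 | 0x400 | 0x800


def duplicated_primary_ids(sam_lines: Iterable[str]) -> Dict[str, Tuple[int, bool]]:
    """Map each repeated read id to (copies, every copy carries pi:Z)."""
    counts: Counter = Counter()
    with_pi: Counter = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        if int(fields[1]) & SKIP_FLAGS:
            continue
        qname = fields[0]
        counts[qname] += 1
        # Optional tags start at column 12 (index 11).
        if any(tag.startswith("pi:Z:") for tag in fields[11:]):
            with_pi[qname] += 1
    return {q: (n, with_pi[q] == n) for q, n in counts.items() if n > 1}


if __name__ == "__main__" and len(sys.argv) > 1:
    # "-" reads SAM text from stdin; otherwise the argument is a SAM text file.
    if sys.argv[1] == "-":
        dups = duplicated_primary_ids(sys.stdin)
    else:
        with open(sys.argv[1]) as handle:
            dups = duplicated_primary_ids(handle)
    all_pi = sum(1 for _, has_pi in dups.values() if has_pi)
    print(f"{len(dups)} duplicated read ids, {all_pi} with pi:Z on every copy")
```

Usage might look like `samtools view mixed_pod5_2.bam | python3 dup_check.py -`. If the parent-id theory holds, the two numbers printed should be equal.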

Here's the output from ont-dorado-server v7.0.8:

Before resume:

(base) billylau@suzuki:/mnt/ix1/Projects_lite/20240305_BL_dorado_demux_test$ /mnt/ix2/Experimental_tools/ont-dorado-server/bin/ont_basecall_client -p 5555 -i pod5_mixed/ -s pod5_mixed_dorado_server/ -c dna_r10.4.1_e8.2_400bps_5khz_modbases_5mc_sup_prom.cfg --recursive --compress_fastq --barcode_kits EXP-NBD196 --align_ref /mnt/ix1/Resources/GenomeRef/Homo_sapiens/Ensembl/GRCh38_no_alt/Sequence/WholeGenomeFasta/hs38_naa.fna --bam_out --min_qscore 7 --do_read_splitting --max_read_split_depth 4 --index --progress_stats_frequency 1000 --read_batch_size 200000
ONT basecalling software version 7.0.8+d4cb05e23, client-server API version 15.0.0
config file:        /mnt/ix2/Experimental_tools/ont-dorado-server/data/dna_r10.4.1_e8.2_400bps_5khz_modbases_5mc_sup_prom.cfg
input path:         pod5_mixed/
save path:          pod5_mixed_dorado_server/
chunk size:         2000
minimum qscore:     7
records per file:   4000
fastq compression:  ON
alignment file:     /mnt/ix1/Resources/GenomeRef/Homo_sapiens/Ensembl/GRCh38_no_alt/Sequence/WholeGenomeFasta/hs38_naa.fna
alignment type:     auto

Use of this software is permitted solely under the terms of the end user license agreement (EULA).
By running, copying or accessing this software, you are demonstrating your acceptance of the EULA.
The EULA may be found in /mnt/ix2/Experimental_tools/ont-dorado-server/bin

Found 1 input read file to process.
Init time: 149854 ms
[PROG_STAT_HDR] time elapsed(secs), time remaining (estimate), total reads processed, total reads (estimate), interval(secs), interval reads processed, interval bases processed
[PROG_STAT] 34.0831, 0, 4000, 4000, 34.0831, 7774, 1890902
Caller time: 34083 ms, Samples called: 26654932, samples/s: 782060
Finishing up any open output files.
Basecalling completed successfully.

You can see that resuming works properly here -- the --resume run does nothing:

(base) billylau@suzuki:/mnt/ix1/Projects_lite/20240305_BL_dorado_demux_test$ /mnt/ix2/Experimental_tools/ont-dorado-server/bin/ont_basecall_client -p 5555 -i pod5_mixed/ -s pod5_mixed_dorado_server/ -c dna_r10.4.1_e8.2_400bps_5khz_modbases_5mc_sup_prom.cfg --recursive --compress_fastq --barcode_kits EXP-NBD196 --align_ref /mnt/ix1/Resources/GenomeRef/Homo_sapiens/Ensembl/GRCh38_no_alt/Sequence/WholeGenomeFasta/hs38_naa.fna --bam_out --min_qscore 7 --do_read_splitting --max_read_split_depth 4 --index --progress_stats_frequency 1000 --read_batch_size 200000 --resume
ONT basecalling software version 7.0.8+d4cb05e23, client-server API version 15.0.0
config file:        /mnt/ix2/Experimental_tools/ont-dorado-server/data/dna_r10.4.1_e8.2_400bps_5khz_modbases_5mc_sup_prom.cfg
input path:         pod5_mixed/
save path:          pod5_mixed_dorado_server/
chunk size:         2000
minimum qscore:     7
records per file:   4000
fastq compression:  ON
alignment file:     /mnt/ix1/Resources/GenomeRef/Homo_sapiens/Ensembl/GRCh38_no_alt/Sequence/WholeGenomeFasta/hs38_naa.fna
alignment type:     auto

Use of this software is permitted solely under the terms of the end user license agreement (EULA).
By running, copying or accessing this software, you are demonstrating your acceptance of the EULA.
The EULA may be found in /mnt/ix2/Experimental_tools/ont-dorado-server/bin
Resuming basecall from previous logfile: pod5_mixed_dorado_server/ont_basecall_client_log-2024-03-06_14-21-29.log
Found 0 input read files to process.
Init time: 87 ms
[PROG_STAT_HDR] time elapsed(secs), time remaining (estimate), total reads processed, total reads (estimate), interval(secs), interval reads processed, interval bases processed
[PROG_STAT] 0.100182, 0, 0, 0, 0.100182, 0, 0
Caller time: 100 ms, Samples called: 0, samples/s: 0
Finishing up any open output files.
Basecalling completed successfully.

Read splitting analysis

I also think there may be an issue with the read splitting. If I take all the pass reads, cat them, and then count the reads, the total is almost double that of dorado standalone (7666 vs. 4551 primary records, about 1.7 times):

(base) billylau@suzuki:/mnt/ix1/Projects_lite/20240305_BL_dorado_demux_test/pod5_mixed_dorado_server$ find ./pass/ -name "*.bam" > bam_list.txt
(base) billylau@suzuki:/mnt/ix1/Projects_lite/20240305_BL_dorado_demux_test/pod5_mixed_dorado_server$ /mnt/ix1/Resources/tools/samtools/v1.15.1/samtools-1.15.1/samtools cat -b bam_list.txt -o merged.bam
(base) billylau@suzuki:/mnt/ix1/Projects_lite/20240305_BL_dorado_demux_test/pod5_mixed_dorado_server$ /mnt/ix1/Resources/tools/samtools/v1.15.1/samtools-1.15.1/samtools flagstat merged.bam 
8761 + 0 in total (QC-passed reads + QC-failed reads)
7666 + 0 primary
909 + 0 secondary
186 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
4324 + 0 mapped (49.36% : N/A)
3229 + 0 primary mapped (42.12% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

The raw number of mapped reads is about the same, which means there are a lot of extra strange reads here (vs. dorado standalone, which had 90%+ alignment). It makes me wonder whether:

billytcl commented 6 months ago

One more thing: to me it's weird that ont-dorado-server and dorado standalone give such different split read metrics, given that both of them use sup mode. Is the split implementation different between the two? In ont-dorado-server I also use --max_read_split_depth 4 in case there are a ton of split points, but I don't think this should affect things that much.

tijyojwad commented 6 months ago

Thanks for confirming the initial hypothesis!

I need to look more into why the splitting is showing such different results. I'll get back to you with what I find.

billytcl commented 5 months ago

Just saw that there is now a dorado 0.6. I see that the duplicate reads issue is now fixed -- how about the differences in read splitting?

tijyojwad commented 5 months ago

Hi @billytcl - I haven't had time to dig into that yet, but the next basecall server release will use the same splitting algorithm as dorado, so the two will be unified.