At the moment, the read group string is set by the "run" column. This means that the same library sequenced on multiple flow cells is treated as separate libraries when duplicates are marked, even though duplicate marking happens after the reads are merged. Correct usage would set a library string in the read group, so that the RG string carries the same library value for one library sequenced across multiple lanes. GATK and Sentieon should then both mark duplicates according to the LibraryName column, while the run column keeps its purpose of letting each run be processed separately before BAM merging.
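
A minimal sketch of what the proposed read group layout could look like, not the pipeline's actual code. The function name `build_read_group`, the `sample` argument, the `sample.run` form of the ID, and the default platform are all assumptions made for illustration. Only the `ID`, `SM`, `LB`, and `PL` tags themselves come from the SAM header convention. The idea is that the run still distinguishes the read groups, while the library value is shared:

```python
def build_read_group(sample: str, run: str, library: str,
                     platform: str = "ILLUMINA") -> str:
    """Build a tab-separated @RG header line (hypothetical helper).

    ID stays unique per run, so each run can still be processed
    separately before merging. LB is taken from the library name,
    so runs of the same library share one library value.
    """
    fields = {
        "ID": f"{sample}.{run}",  # assumed ID scheme: unique per sample and run
        "SM": sample,             # sample name
        "LB": library,            # library name, e.g. from the LibraryName column
        "PL": platform,           # sequencing platform (assumed default)
    }
    return "\t".join(["@RG"] + [f"{tag}:{value}" for tag, value in fields.items()])


# Same library sequenced in two runs: different ID, same LB.
rg_a = build_read_group("sampleA", "run1", "lib1")
rg_b = build_read_group("sampleA", "run2", "lib1")
```

With this layout the two read groups remain distinct, but both carry `LB:lib1`, which is the field a library-aware duplicate marker would be expected to group on after merging.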