magicDGS / ReadTools

A Universal Toolkit for Handling Sequence Data from Different Sequencing Platforms
https://magicdgs.github.io/ReadTools/
MIT License

Default barcode separators should be treated separately for split #525

Closed: magicDGS closed this issue 5 years ago

magicDGS commented 5 years ago

After fixing @robmaz's issue with the non-default + separator by using a regexp in the Java property, I found while testing today that there is a problem with the output (when running the tool that assigns read groups (RG) based on barcodes): joined barcodes come out in the form CCCCCCC\+CCCCCCC, which makes it impossible to reuse the same Java property to split them (splitting would now require \\+ instead). This is because the Java property is used both to split and to join the barcodes. We should find a solution for this.
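A minimal Java sketch of the asymmetry described above, assuming (as the comment says) that one property value is used as a regular expression when splitting and as a literal string when joining. The property name readtools.barcode.separator is hypothetical, not the name ReadTools actually uses, and the two helpers only stand in for ReadTools' real barcode code.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the split/join asymmetry. The property name is hypothetical.
public class SeparatorAsymmetry {

    // The separator is read from one property and used for BOTH operations.
    static String separator() {
        // Falls back to the SAM-recommended hyphen when the property is unset.
        return System.getProperty("readtools.barcode.separator", "-");
    }

    // String.split interprets its argument as a regular expression,
    // so the two characters \+ match a literal '+'.
    static List<String> split(String joined) {
        return Arrays.asList(joined.split(separator()));
    }

    // String.join inserts its argument verbatim (no regex interpretation),
    // so a regex-escaped separator leaks into the output as-is.
    static String join(List<String> barcodes) {
        return String.join(separator(), barcodes);
    }

    public static void main(String[] args) {
        System.setProperty("readtools.barcode.separator", "\\+");
        List<String> barcodes = split("AAAAAAA+CCCCCCC");
        System.out.println(barcodes);      // [AAAAAAA, CCCCCCC]
        String joined = join(barcodes);
        System.out.println(joined);        // AAAAAAA\+CCCCCCC
        // Re-splitting with the same property no longer recovers the barcodes.
        System.out.println(split(joined)); // [AAAAAAA\, CCCCCCC]
    }
}
```

The joined string carries the regex escape, so the same property value cannot split it back into the original barcodes.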

robmaz commented 5 years ago

Maybe always normalize the BC to the SAM-recommended hyphenated form? By the way, the SAM specification from May finally allows the BC tag in the @RG line. Would there be any advantage to using it instead of the optional BC:Z: tag on each read?

Cheers,
Rupert

(Email reply to Daniel Gómez-Sánchez's comment of Thu, 30 Aug 2018, 13:24.)
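To make Rupert's question concrete, here is a hedged Java sketch of the two places a dual barcode can be stored in SAM text: the BC field of the @RG header line and the optional per-read BC:Z: tag, both using the hyphen-joined form the SAM specification recommends for multiple barcodes. The class and method names are illustrative only; this is plain string building, not ReadTools or htsjdk code.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: plain SAM text, not a real library API.
public class BarcodePlacement {

    // SAM-recommended form: multiple barcodes concatenated with '-'.
    static String normalize(List<String> barcodes) {
        return String.join("-", barcodes);
    }

    // Option 1: the expected barcode, stored once in the @RG header line.
    static String readGroupLine(String readGroupId, List<String> barcodes) {
        return "@RG\tID:" + readGroupId + "\tBC:" + normalize(barcodes);
    }

    // Option 2: the barcode attached to every record as an optional BC:Z tag.
    static String perReadTag(List<String> observedBarcodes) {
        return "BC:Z:" + normalize(observedBarcodes);
    }

    public static void main(String[] args) {
        System.out.println(readGroupLine("rg1", Arrays.asList("AAAAAAA", "CCCCCCC")));
        System.out.println(perReadTag(Arrays.asList("AAAAAAA", "CCCCCCA")));
    }
}
```

One trade-off this makes visible: the @RG field records the expected barcode once per read group, while a per-read tag can also keep the barcode actually observed on each read (including sequencing errors), which the header cannot.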

magicDGS commented 5 years ago

I already have a PR to fix this (#526), and I am planning a patch release for you today. Since the SAM-recommended hyphen form is just a recommendation, the Java property exists to change it for the whole framework (e.g., if a lab's standard is to separate barcodes with +, setting the Java property everywhere is how they replace the recommendation with their own convention; that's why it is a Java property and not an argument). For release 2, my roadmap includes an argument to set the input barcode separator(s) while normalizing to the hyphen that ReadTools recommends internally; but that will have to wait a bit, as I have not yet had time to properly design the features for version 2 (which will be backwards-incompatible).
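The version-2 idea (configurable input separators, output always normalized to the hyphen) could look roughly like the following Java sketch. It assumes separators are supplied as literal strings and quoted internally, so users never have to write regex escapes; the class BarcodeNormalizer and its methods are hypothetical, not part of ReadTools.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical sketch: split with configurable separators, join with '-'.
public class BarcodeNormalizer {

    // Every joined barcode is written with the hyphen.
    static final String OUTPUT_SEPARATOR = "-";

    private final Pattern inputSeparators;

    // Separators are literals (e.g. "+") and are quoted, so no regex escapes
    // are needed. The hyphen is always accepted too, so normalized output
    // can be read back with the same configuration.
    BarcodeNormalizer(List<String> literalSeparators) {
        List<String> all = new ArrayList<>(literalSeparators);
        all.add(OUTPUT_SEPARATOR);
        this.inputSeparators = Pattern.compile(
                all.stream().map(Pattern::quote).collect(Collectors.joining("|")));
    }

    List<String> split(String joined) {
        return Arrays.asList(inputSeparators.split(joined));
    }

    String normalize(String joined) {
        return String.join(OUTPUT_SEPARATOR, split(joined));
    }

    public static void main(String[] args) {
        BarcodeNormalizer n = new BarcodeNormalizer(Arrays.asList("+"));
        String normalized = n.normalize("AAAAAAA+CCCCCCC");
        System.out.println(normalized);          // AAAAAAA-CCCCCCC
        System.out.println(n.split(normalized)); // [AAAAAAA, CCCCCCC]
    }
}
```

Because splitting uses a compiled pattern and joining uses a fixed literal, the two operations no longer share one value, which is the root of the problem reported in this issue.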

The @RG barcode is a separate issue (in fact, I made the change to the specs that added it). But the change in ReadTools might take longer, as it will require a substantial refactoring of the barcode-handling code.