hartwigmedical / hmftools

Various algorithms for analysing genomics data
GNU General Public License v3.0

GRIPSS - ref_genome_version HG38 #301

Closed: stsergbg closed this issue 2 years ago

stsergbg commented 2 years ago

Hi!

Are there plans to add HG38 as a "ref_genome_version" option for GRIPSS? As far as I understand, only V37, V38 and HG19 are available for now. I have many pipelines that use an hg38 reference with chrZZ notation, and there is also an hg38 PON file on the HMF website that clearly uses chrZZ notation. Moreover, since the PON files, known breakpoints and the actual genome FASTA file are all provided alongside this option, it is not clear what "ref_genome_version" is actually used for.

Thank you in advance, Sergey

abelhj commented 2 years ago

I had this same problem yesterday. As far as I can tell, GRIPSS only uses the reference version to determine the poly_g regions and the location of PMS2, so hopefully this is not a serious concern?

toddajohnson commented 2 years ago

I think that is already true, just not explicitly stated in the README. The ref_genome_version argument to GRIPSS seems to be parsed by a common class used across the HMF programs (hmf-common/src/main/java/com/hartwig/hmftools/common/genome/refgenome/RefGenomeVersion.java). Lines 27-38 parse the allowed strings into the V37, V38, or HG19 versions; V38 is assigned if ref_genome_version is one of V38, RG_38, 38, or HG38 (so it seems you should not use GRCh38 or lowercase hg38). V38 uses UCSC chromosome names (chr prefix, chrM instead of MT), like HG19. In my experience, the problem occurs when ref_genome_version is not passed at all: V37 is the default, and V37 does not use the chr prefix.
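
To illustrate the naming difference described above, here is a minimal Python sketch. It is not code from hmftools; the function names (`to_chr_style`, `to_plain_style`, `uses_chr_prefix`) are hypothetical, and it only covers the two rules mentioned in this comment (the chr prefix, and chrM versus MT).

```python
# Illustrative only: not taken from hmftools. Function names are hypothetical.

def to_chr_style(contig: str) -> str:
    """Convert a 37-style name ('1', 'X', 'MT') to chr-prefixed style."""
    if contig.startswith("chr"):
        return contig  # already chr-prefixed
    if contig == "MT":
        return "chrM"  # mitochondrial contig is named differently
    return "chr" + contig


def to_plain_style(contig: str) -> str:
    """Convert a chr-prefixed name ('chr1', 'chrM') to 37-style."""
    if not contig.startswith("chr"):
        return contig  # already without prefix
    if contig == "chrM":
        return "MT"
    return contig[len("chr"):]


def uses_chr_prefix(contigs) -> bool:
    """True if any contig name in the iterable carries the chr prefix."""
    return any(name.startswith("chr") for name in contigs)
```

If `uses_chr_prefix` is true for the contigs in your FASTA index, relying on the V37 default is likely to cause the mismatch described above, so it is worth passing ref_genome_version explicitly.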

abelhj commented 2 years ago

With GRIPSS v2.1, '38' does not work; 'V38' apparently does.

charlesshale commented 2 years ago

This is a bug: there is a second place in Gripss that expects only V37 or V38. I'll fix this for v2.2, due out in about a week.

In general as Todd pointed out, all HMF tools accept '37' or 'V37' for GRCh37 and '38' or 'V38' for GRCh38.

charlesshale commented 2 years ago

Gripss v2.2 has been released now with this fix:

https://github.com/hartwigmedical/hmftools/releases/tag/gripss-v2.2

stsergbg commented 2 years ago

So Gripss can now accept HG38 as ref_genome_version? Could you please update the README if that's the case?

charlesshale commented 2 years ago

All HMF tools work with GRCh37 and GRCh38. The ref_genome_version config accepts the following strings for each: 37, V37 or HG37, or 38, V38 or HG38.
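
As a rough illustration of the accepted strings listed above, the following Python sketch maps each one to V37 or V38. It is not the hmftools implementation (which is Java); the names `ACCEPTED_VERSIONS` and `normalise_ref_genome_version` are hypothetical. Two assumptions are made: matching is exact and case-sensitive, and a missing value falls back to V37, which is the default reported earlier in this thread.

```python
# Illustrative only: a sketch of the string mapping described in this thread,
# not the actual hmftools parsing code. All names here are hypothetical.
from typing import Optional

ACCEPTED_VERSIONS = {
    "37": "V37", "V37": "V37", "HG37": "V37",
    "38": "V38", "V38": "V38", "HG38": "V38",
}


def normalise_ref_genome_version(value: Optional[str] = None) -> str:
    """Map a ref_genome_version string to 'V37' or 'V38'."""
    if value is None:
        # Assumption: V37 is the default when the option is not supplied.
        return "V37"
    try:
        # Assumption: exact, case-sensitive match.
        return ACCEPTED_VERSIONS[value]
    except KeyError:
        raise ValueError(f"unsupported ref_genome_version: {value!r}") from None
```

With this sketch, `normalise_ref_genome_version("HG38")` and `normalise_ref_genome_version("V38")` give the same result, while a string outside the list raises an error instead of silently falling back to V37.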

The Gripss README lists the options as V37 or V38, but you can set it to HG38 if you prefer.

charlesshale commented 2 years ago

Gripss does support GRCh38 / HG38 already - if you supply -ref_genome_version V38, it will use HG38.
