marbl / CHM13

The complete sequence of a human genome

is it a good time for a reference genome switch? #76

Open seru71 opened 1 year ago

seru71 commented 1 year ago

Thank you for making this amazing set of resources available. This is not really an issue but a question about a strategic decision. If one were to start a cancer research project using short-read deep WGS (small and structural variant calling) for a few thousand tumor-normal pairs, how smart is it to use CHM13 as the reference at this point in time?

Sticking to GRCh38 is definitely a safe choice, but not very progressive. With CHM13 as a reference, I am especially looking forward to improved SV accuracy, but also fewer small-variant false positives. I realize that it will take some time before many reference datasets have CHM13-based releases, but if they eventually do, I am ok with liftover from GRCh38 in the meantime. What I would like to avoid is needing to remap to CHM13v2.1 or v3.0 at some point, or realizing that the human genomics world has decided to skip CHM13 and jump from GRCh38 straight to a pangenome reference.

Considering that you are probably the closest to the topic, I am wondering about your opinion. Are you aware of any such plans, or will CHM13v2.0 be THE human reference genome of choice in the upcoming years?

arangrhie commented 1 year ago

Hi @seru71, we have no plans to update or push another release for T2T-CHM13.

The Pangenome reference from year 1 of the HPRC is already available, including T2T-CHM13, and we expect to collect more haplotypes over time. We are in the process of generating a "diploid" T2T-HG002, which will eventually be included in the pangenome reference.

I'll let @aphillippy chime in.

seru71 commented 1 year ago

Hi @arangrhie, good to know that the current release is here to stay. I am wondering if we could also expect it to become the successor of GRCh38 in the coming years.

diekhans commented 1 year ago

Based on my knowledge, I seriously doubt that GRCh38 will be replaced with CHM13. There is still a large amount of work done on GRCh37, and the switch to GRCh38 was expensive. Bread-and-butter bioinformatics will not be inclined to move.

Sophisticated analysis will move toward pangenomes. Right now, CHM13 is a great supplement to GRCh38 as a step towards pangenomes.

all IMHO


arangrhie commented 1 year ago

Yes, T2T-CHM13 is not going to "replace" GRCh38. They will co-exist. The GRC has indefinitely postponed GRCh39 (see the announcement in the yellow box).

seru71 commented 1 year ago

I assume in the same way that GRCh37 and GRCh38 co-exist in many databases. But one will be the 'default' choice for the majority of human genome research, and I have been wondering whether in the coming years it is going to be GRCh38 or CHM13. Judging from the GRC announcement, their next reference is more likely to be a pangenome than CHM13 alone, so for now sticking with GRCh38 seems the most reasonable choice to me.

Thank you for your opinions.