liguowang / CrossMap

CrossMap is a python program to lift over genome coordinates from one genome version to another.
https://crossmap.readthedocs.io/en/latest/

Coordinate Conversion Before/After Variant Calling? #78

Open DayTimeMouse opened 1 month ago

DayTimeMouse commented 1 month ago

Hi,

Thanks for developing this nice tool.

I have two genomes, assembly1 and assembly2. My goal is to use these two genomes as reference genomes, align reads to each of them, and call variants. To effectively compare the variants obtained from the two genomes and identify their similarities and differences, I am considering two approaches:

1. Coordinate Conversion Before Variant Calling: Should I first convert the coordinates between the two genome assemblies before calling variants, and then perform the comparison?
2. Coordinate Conversion After Variant Calling: Alternatively, should I proceed without converting the genome assembly coordinates initially, call variants separately for each genome, and then convert the coordinates of the resulting VCF files before performing the comparison?

Which of these methods would be more reasonable?

When using the second method (coordinate conversion after variant calling), I have encountered an issue where many structural variations (SVs) are reported as unmapped. Could you please provide some advice on how to address this problem?

2024-10-18 08:46:32 [INFO]  Keep variants [reference_allele == alternative_allele] ...
2024-10-18 08:46:32 [INFO]  Updating contig field ...
2024-10-18 08:46:32 [INFO]  Lifting over ...
2024-10-18 08:46:33 [INFO]  Total entries: 1513
2024-10-18 08:46:33 [INFO]  Failed to map: 643

Best regards.
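
For reference, the two counters in the log above give the share of entries that failed to lift over. The snippet below is a minimal sketch that assumes the log keeps the `Total entries:` / `Failed to map:` wording shown here; the function name `liftover_failure_rate` is made up for illustration and is not part of CrossMap.

```python
import re


def liftover_failure_rate(log_text: str) -> float:
    """Return failed/total, read from a log in the format quoted above (assumed)."""
    total = re.search(r"Total entries:\s*(\d+)", log_text)
    failed = re.search(r"Failed to map:\s*(\d+)", log_text)
    if total is None or failed is None:
        raise ValueError("log does not contain both counters")
    return int(failed.group(1)) / int(total.group(1))


# The two relevant lines copied from the log in this post.
log = """\
2024-10-18 08:46:33 [INFO]  Total entries: 1513
2024-10-18 08:46:33 [INFO]  Failed to map: 643
"""

rate = liftover_failure_rate(log)
print(f"{rate:.1%} of entries failed to map")  # 643 / 1513 -> 42.5%
```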

liguowang commented 1 month ago

Hi, I think the second approach, "Coordinate Conversion After Variant Calling", is more reasonable. Regarding the SVs that failed to convert, did you try the "CrossMap region" command? It allows fuzzy conversion (i.e., a region from the input assembly does NOT have to map 100% to the target).

Please keep in mind that the conversion ratio is largely determined by the chain file.

Liguo
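
Since `CrossMap region` works on BED intervals rather than VCF records, trying the suggestion above means first turning each SV into an interval. The following is only a sketch under stated assumptions: it expects tab-delimited VCF lines, takes the interval end from an `END=` entry in INFO when present, and otherwise falls back to the REF allele length. The helper name `sv_record_to_bed` and the example record are hypothetical. The resulting BED file could then be passed to `CrossMap region`; its help output lists the option for the minimum fraction of bases that must map.

```python
def sv_record_to_bed(vcf_line: str) -> str:
    """Convert one tab-delimited VCF data line to a 4-column BED line."""
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, vid, ref = fields[0], int(fields[1]), fields[2], fields[3]
    info = fields[7] if len(fields) > 7 else "."

    # Assumption: the SV caller writes END=<int> in INFO; if it does not,
    # approximate the end from the length of the REF allele.
    end = None
    for item in info.split(";"):
        if item.startswith("END="):
            end = int(item[4:])
            break
    if end is None:
        end = pos + len(ref) - 1
    end = max(end, pos)  # guard against END values smaller than POS

    # VCF POS is 1-based and inclusive; BED start is 0-based, end is exclusive.
    return f"{chrom}\t{pos - 1}\t{end}\t{vid}"


# Hypothetical deletion record, not taken from the data in this thread.
example = "chr1\t1000\tsv1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=1500"
print(sv_record_to_bed(example))
```

Keeping the variant ID in the fourth BED column makes it possible to join the lifted intervals back to the original VCF records afterwards.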


DayTimeMouse commented 3 weeks ago

Hi liguowang,

I used CrossMap vcf to convert the genome coordinates, but the REF base has changed. For example, the REF of ID.2 was originally G and is C after conversion. There are many cases like this. Is this expected?

original:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  sample
chr1    414779  ID.2    G   A   15.5    PASS    .   GT:GQ:DP:AD:VAF:PL  1/1:12:43:28,15:0.348837:15,15,0
chr1    416197  ID.3    A   C   25.3    PASS    .   GT:GQ:DP:AD:VAF:PL  1/1:24:45:28,17:0.377778:25,32,0
chr1    895954  ID.4    G   T   17.1    PASS    .   GT:GQ:DP:AD:VAF:PL  1/1:6:41:24,17:0.414634:15,5,0
chr1    946700  ID.5    G   T   28  PASS    .   GT:GQ:DP:AD:VAF:PL  1/1:27:51:27,24:0.470588:28,34,0
chr1    1069530 ID.6    C   T   36.1    PASS    .   GT:GQ:DP:AD:VAF:PL  1/1:34:55:32,23:0.418182:36,38,0
chr1    1343590 ID.7    G   A   11.5    PASS    .   GT:GQ:DP:AD:VAF:PL  1/1:4:40:19,21:0.525:9,3,0
chr1    1484250 ID.8    C   A   16.1    PASS    .   GT:GQ:DP:AD:VAF:PL  1/1:10:49:28,21:0.428571:15,11,0

after converting:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  sample
chr1    2216210 ID.2    C   A   15.5    PASS    .   GT:GQ:DP:AD:VAF:PL  1/1:12:43:28,15:0.348837:15,15,0
chr1    2217628 ID.3    C   C   25.3    PASS    .   GT:GQ:DP:AD:VAF:PL  1/1:24:45:28,17:0.377778:25,32,0
chr1    2674841 ID.4    A   T   17.1    PASS    .   GT:GQ:DP:AD:VAF:PL  1/1:6:41:24,17:0.414634:15,5,0
chr1    2725591 ID.5    C   T   28  PASS    .   GT:GQ:DP:AD:VAF:PL  1/1:27:51:27,24:0.470588:28,34,0
chr1    2848747 ID.6    T   T   36.1    PASS    .   GT:GQ:DP:AD:VAF:PL  1/1:34:55:32,23:0.418182:36,38,0
chr1    3122698 ID.7    A   A   11.5    PASS    .   GT:GQ:DP:AD:VAF:PL  1/1:4:40:19,21:0.525:9,3,0
chr1    3263515 ID.8    T   A   16.1    PASS    .   GT:GQ:DP:AD:VAF:PL

Best regards.
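
To see how widespread the change is, the two tables above can be compared by variant ID. The sketch below uses only the first five columns of the records posted here. It lists the IDs whose REF differs after conversion and the IDs where the converted REF equals ALT. As far as I understand, `CrossMap vcf` reads REF from the target assembly's FASTA at the lifted position, so some REF changes between two different assemblies would be expected; that is an assumption to check against the CrossMap documentation. The helper name `parse_records` is made up for illustration.

```python
def parse_records(vcf_text: str) -> dict:
    """Map variant ID -> (REF, ALT) for whitespace-delimited VCF data lines."""
    records = {}
    for line in vcf_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        _chrom, _pos, vid, ref, alt = line.split()[:5]
        records[vid] = (ref, alt)
    return records


# First five columns of the records posted above.
original = parse_records("""\
chr1 414779 ID.2 G A
chr1 416197 ID.3 A C
chr1 895954 ID.4 G T
chr1 946700 ID.5 G T
chr1 1069530 ID.6 C T
chr1 1343590 ID.7 G A
chr1 1484250 ID.8 C A
""")
converted = parse_records("""\
chr1 2216210 ID.2 C A
chr1 2217628 ID.3 C C
chr1 2674841 ID.4 A T
chr1 2725591 ID.5 C T
chr1 2848747 ID.6 T T
chr1 3122698 ID.7 A A
chr1 3263515 ID.8 T A
""")

# IDs whose REF allele differs between the original and the converted file.
ref_changed = [vid for vid, (ref, _alt) in original.items()
               if vid in converted and converted[vid][0] != ref]
# IDs that are no longer variants relative to the target assembly.
ref_equals_alt = [vid for vid, (ref, alt) in converted.items() if ref == alt]

print("REF changed:", ref_changed)
print("REF == ALT after conversion:", ref_equals_alt)
```

In this excerpt every REF changes, and ID.3, ID.6 and ID.7 end up with REF equal to ALT. That matches the `Keep variants [reference_allele == alternative_allele]` line in the earlier log, which suggests such records are being kept in the output.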