liguowang / CrossMap

CrossMap is a python program to lift over genome coordinates from one genome version to another.
https://crossmap.readthedocs.io/en/latest/

Coordinate Conversion Before/After Variant Calling? #78

Open DayTimeMouse opened 1 week ago

DayTimeMouse commented 1 week ago

Hi,

Thanks for developing this nice tool.

I have two genomes, assembly1 and assembly2. My goal is to use these two genomes as reference genomes, align reads to each of them, and call variants. To effectively compare the variants obtained from the two genomes and identify their similarities and differences, I am considering two approaches:

Coordinate Conversion Before Variant Calling: Should I first convert the coordinates between the two genome assemblies before calling variants, and then perform the comparison?

Coordinate Conversion After Variant Calling: Alternatively, should I proceed without converting the genome assembly coordinates initially, call variants separately for each genome, and then convert the coordinates of the resulting VCF files before performing the comparison?

Which of these methods would be more reasonable?

When using the second method (coordinate conversion after variant calling), I have encountered an issue where many structural variations (SVs) are reported as unmapped. Could you please provide some advice on how to address this problem?

2024-10-18 08:46:32 [INFO]  Keep variants [reference_allele == alternative_allele] ...
2024-10-18 08:46:32 [INFO]  Updating contig field ...
2024-10-18 08:46:32 [INFO]  Lifting over ...
2024-10-18 08:46:33 [INFO]  Total entries: 1513
2024-10-18 08:46:33 [INFO]  Failed to map: 643

Best regards.

liguowang commented 1 week ago

Hi, I think the second approach, "Coordinate Conversion After Variant Calling", is more reasonable. Regarding the SVs that failed to convert, did you try the "CrossMap region" command? This command allows fuzzy conversion (i.e., a region from the input assembly does NOT have to map 100% to the target assembly).
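Since "CrossMap region" works on BED intervals rather than VCF records, the SV calls would first need to be turned into regions. The following is a minimal sketch of that step, not part of CrossMap itself: the function name `sv_vcf_to_bed` is hypothetical, and it assumes each SV record carries its end coordinate in the INFO/END field (records without END fall back to the length of REF).

```python
def sv_vcf_to_bed(vcf_lines):
    """Yield (chrom, start, end, id) BED-style tuples from VCF text lines.

    Assumption: SV records store their end position in INFO/END.
    VCF POS is 1-based; BED start is 0-based and BED end is exclusive,
    so start = POS - 1 and end = END.
    """
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, vid, ref = fields[0], int(fields[1]), fields[2], fields[3]
        # Parse INFO into a dict; flag entries (no "=") get an empty value.
        info = dict(
            kv.split("=", 1) if "=" in kv else (kv, "")
            for kv in fields[7].split(";")
        )
        if "END" in info:
            end = int(info["END"])
        else:
            # Fallback for records without END: span of the REF allele.
            end = pos + len(ref) - 1
        # Guard against zero-length intervals (e.g. insertions with END == POS - 1).
        yield chrom, pos - 1, max(end, pos), vid


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.2",
        "chr1\t1001\tsv1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=2000;IMPRECISE",
    ]
    for chrom, start, end, vid in sv_vcf_to_bed(example):
        print(f"{chrom}\t{start}\t{end}\t{vid}")
```

The resulting BED file could then be passed to "CrossMap region" together with the chain file. The exact arguments, including the option that sets the minimum fraction of a region that must map, are best checked with the command's help output for the installed version.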

Please keep in mind that the conversion ratio is largely determined by the chain file.
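One rough way to see how much the chain file limits conversion is to count how many bases of each source sequence are covered by aligned blocks. The sketch below is an illustration only, with a hypothetical function name. It assumes the standard UCSC chain layout, in which each header line is `chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id` and the first-named (target) sequence belongs to the assembly being converted from. If chains overlap on the source sequence, their bases are counted more than once.

```python
from collections import defaultdict


def chain_aligned_bases(chain_lines):
    """Return {source_seq: (aligned_bases, seq_size)} from chain-format lines.

    Sums the ungapped block sizes (first column of each data line) per
    source sequence. Overlapping chains are not de-duplicated.
    """
    aligned = defaultdict(int)
    sizes = {}
    current = None
    for line in chain_lines:
        parts = line.split()
        if not parts:
            # A blank line ends the current chain.
            current = None
            continue
        if parts[0].startswith("#"):
            continue
        if parts[0] == "chain":
            # Header line: field 2 is tName, field 3 is tSize.
            current = parts[2]
            sizes[current] = int(parts[3])
        elif current is not None:
            # Data line: "size dt dq", or just "size" for the last block.
            aligned[current] += int(parts[0])
    return {name: (aligned[name], sizes[name]) for name in sizes}


if __name__ == "__main__":
    demo = [
        "chain 1000 chr1 1000 + 0 500 chrA 900 + 0 490 1",
        "200 10 0",
        "290",
        "",
    ]
    for name, (covered, size) in chain_aligned_bases(demo).items():
        print(f"{name}: {covered}/{size} bases in aligned blocks")
```

Sequences with a low aligned fraction are where SVs are most likely to fail conversion, whichever CrossMap command is used.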

Liguo
