tanghaibao / jcvi

Python library to facilitate genome assembly, annotation, and comparative genomics
BSD 2-Clause "Simplified" License
745 stars 187 forks source link

BioNano data revisited. #37

Closed JohnUrban closed 3 years ago

JohnUrban commented 7 years ago

Hi,

This tool seems to be capable of great feats if one has the right data (someone recently showed me some numbers). I do not have genetic map information, but I do have optical map information from BioNano. I might soon have more than one type of optical map (e.g. from different enzymes). I saw in #8, which was open and closed in March, someone already asked about bionano data and the response was that it could be possible to use with AllMaps, but was not officially supported. I was wondering if there has been any more exploration of using BioNano data with AllMaps by the developers or any users who might be reading this, and if so, how did it go?

best,

John

tanghaibao commented 7 years ago

@JohnUrban you are not alone in that regard - I was asked a few times. The most important step is to convert the maps (.cmap and .xmap) to the .bed format which translate coordinates on the scaffolds to the coordinates on the map. I don't have an assembly project that has corresponding BioNano data yet but would like to guide any one who wishes to work on this.

wyim-pgl commented 7 years ago

There's third party program converting xmap to bed.

I think you can give a try but you need to round the xmap coordinate.

JohnUrban commented 7 years ago

Thanks for getting back to me. I will look into this, give it a try, and would be happy to share my experience. I will likely get to this in January or shortly thereafter.

tangerzhang commented 7 years ago

Hi @JohnUrban I have a script which might be helpful. https://github.com/tangerzhang/my_script/blob/master/bionano2Allmaps.pl This script is dependent on Irys-Scaffolding (https://github.com/i5K-KINBRE-script-share/Irys-scaffolding). Additionally, as @ascendo mentioned, you might need modify the script from line 55 to 60 to round xmap coordinates.

wyim-pgl commented 7 years ago

Oh what a nice @JohnUrban.

Do you have any dual enzyme version?

Regards,

tangerzhang commented 7 years ago

Just modified my script and it should support any dual enzymes. Separate the two enzymes by comma. For example: perl bionano2allmaps.pl -x xmap -i genome.fasta -e BspQI,BbvCI

wyim-pgl commented 7 years ago

Very helpful. Thanks.

JohnUrban commented 7 years ago

Thanks for all the tips everyone. As research sometimes goes, I have not yet got around to trying AllMaps with the BioNano data. It is something that I will definitely do in the future, but am unsure as to when at the moment. Nonetheless, since 4 months or so has passed since I posted this, I wonder if any of you have made any progress with BioNano + AllMaps? Any procedural advice starting with the raw optical maps?

wyim-pgl commented 7 years ago

No I don't

splaisan commented 7 years ago

Hi @tangerzhang, I tried your script for single color and it failed for me because the xmap-ID's are not always part of the list of cmapIDs present in the key file (see row5 below).

Scaffold ID,scaffold position,LG,genetic position
000000F_018_pilon,6205.0,15,6205.0
000000F_018_pilon,625774.0,15,625774.0
000060F_pilon,7993.0,34,7993.0
000060F_pilon,1182657.0,34,1182657.0
,47489.0,34,1229903.3
000133F_001_pilon,496741.0,34,1514727.3
000133F_001_pilon,2385.0,34,2009083.3

This is because some of the Xmap records result from clipping and splitting of original contigs and have their own new and arbitrary numbering. This classification is a nightmare for me and I am desperately seeking a way to navigate through these integers. Did you use your code already on real data? Best, Stephane

PS: also, by adapting your code to dual colour, it does not work anymore for single because the fa2cmap_multi_color.pl script wants even number of enzymes as -e argument. It would have been better to keep both versions.

yeban commented 5 years ago

Hi,

When I try to convert the csv output of @tangerzhang's script to bed format I get an empty bed file. Steps below:

$ perl bionano2Allmaps.pl -x 2019-02-25-ref_to_ctg.xmap -i ref.fa -e BspQI

$ head bionano.map.csv
Scaffold ID,scaffold position,LG,genetic position
tig00000089,5853.0,1,43549.0
tig00000514,751.0,1,2724805.8
tig00021472,1716890.0,1,3026798.8
tig00021472,1772020.0,1,3084540.3
tig00000514,496686.0,1,3252202.4
tig00002993,21364.0,1,3282895.9
tig00021674,88638.0,1,3362164.4
tig00002993,157676.0,1,3430115.5
tig00021674,1014.0,1,3449498.2

$ python -m jcvi.assembly.allmaps merge bionano.map.csv -o bionano.bed
16:52:21 [allmaps] A total of 0 markers written to `bionano.bed`.
16:52:21 [allmaps] Weights file written to `weights.txt`.

Could this have something to with rounding the xmap coordinates as mentioned in the comments above? How can I approach it?

Thanks

yeban commented 5 years ago

@tanghaibao wrote:

The most important step is to convert the maps (.cmap and .xmap) to the .bed format which translate coordinates on the scaffolds to the coordinates on the map. I don't have an assembly project that has corresponding BioNano data yet but would like to guide any one who wishes to work on this.

I have a genome assembly, genetic maps, and bionano optical maps. I would like to use ALLMAPS to combine both the maps to superscaffold my assembly. Please, could you guide me on creating ALLMAPS compatible BED file from BioNano cmap and xmap files? My previous comment outlines what I have tried so far.

yeban commented 5 years ago

When I try to convert the csv output of @tangerzhang's script to bed format I get an empty bed file.

Modified the output of @tangerzhang's script so that the second column (scaffold position) is integer, and not decimal. ALLMAPS's merge command is now able to parse the CSV file and output BED.


Another quick, related question: I am using both genetic and optical maps to scaffold my assembly with ALLMAPS. Since the distance is specified in different units in both the maps, do I need to rescale one of the maps before running ALLMAPS or can ALLMAPS handle that by itself?

tanghaibao commented 5 years ago

@yeban

The default mode is using marker rank (order within the map), this behavior is specified with the --distance option in when you run python -m jcvi.assembly.allmaps path. So rescaling is not needed.

However, ideally, the number of markers in the maps should be comparable for ALLMAPS to work as expected. If one map has way more markers than the other it tends to dominate the order even if you have set the weight to be equal to start with.

Haibao

yeban commented 5 years ago

That's helpful. Thanks!

yeban commented 5 years ago

@tanghaibao

I have a quick question about weights concerning your comment above. In the genetic maps, I have approximately 13,000 markers, but only about 1000 in the Bionano optical maps. How would you set the weights so that the default rank mode works as expected? I set the weights for genetic and optical maps to 1 and 10, respectively, because optical maps have 10-times less marker. But does it even make sense?