graph-genome / MatrixTubeMap

Uses a matrix to show coverage of multiple genomic sequences, built as a prototype hack of Sequence Tubemap
MIT License

Arabidopsis 1001G data in MatrixTubemap #13

Closed josiahseaman closed 4 years ago

josiahseaman commented 4 years ago
ekg commented 4 years ago

Here's how to take a GFA file and make the tubemap bin input:

odgi build -g x.gfa -o - | odgi sort -i - -o x.og
odgi bin -i x.og -w $bin_width -j | gzip >x.og.$bin_width.json.gz

josiahseaman commented 4 years ago

Actual commands used:

cmake -H. -Bbuild && cmake --build build -- -j 3

./odgi build -g sixref_Chr4.gfa -o - | ./odgi sort -i - -o sixref_Chr4.og
bin_width=100000
./odgi bin -i sixref_Chr4.og -w $bin_width -j | gzip > sixref_Chr4.og.$bin_width.json.gz
bin_width=10000
./odgi bin -i sixref_Chr4.og -w $bin_width -j > sixref_Chr4.og.$bin_width.json
bin_width=1000
./odgi bin -i sixref_Chr4.og -w $bin_width -j > sixref_Chr4.og.$bin_width.json
bin_width=100
./odgi bin -i sixref_Chr4.og -w $bin_width -j > sixref_Chr4.og.$bin_width.json

node.exe cli.js -j data\sixref_Chr4.og.10000.json --bin_length=10000 --tiles=1 --tile_json=data\sixref_Chr4.og.10000.tile.json
node.exe cli.js -j data\sixref_Chr4.og.1000.json --bin_length=1000 --tiles=1 --tile_json=data\sixref_Chr4.og.1000.tile.json
node.exe cli.js -j data\sixref_Chr4.og.100.json --bin_length=100 --tiles=1 --tile_json=data\sixref_Chr4.og.100.tile.json
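
The binning commands above differ only in bin width, so they could be generated by a small loop. This is a minimal sketch, not the commands that were actually run: it only prints the `odgi bin` invocations (using the `-i`, `-w`, and `-j` flags quoted in this thread) so they can be reviewed first. The `ODGI` and `GRAPH` variables and the `bin_cmd` helper are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Sketch: generate one `odgi bin` command per bin width.
# ODGI and GRAPH are hypothetical variables; adjust them to your setup.
ODGI=./odgi
GRAPH=sixref_Chr4.og

bin_cmd() {
    # Print (do not run) the odgi bin command for the bin width given as $1.
    printf '%s bin -i %s -w %s -j > %s.%s.json\n' \
        "$ODGI" "$GRAPH" "$1" "$GRAPH" "$1"
}

# Bin widths taken from the commands in this comment.
for bin_width in 10000 1000 100; do
    bin_cmd "$bin_width"
done
```

Piping the output into `sh` would run the commands; that assumes `./odgi` and `sixref_Chr4.og` exist in the current directory.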

josiahseaman commented 4 years ago

image Real Arabidopsis data is working! I will definitely need #14 for this data, because it's very detailed and doesn't all fit into view easily.

image Bin size = 100. transform="translate(0,0) scale(0.0038,2)"

image This is all of Chr4 at bin size 1k. transform="translate(0,0) scale(0.038,1.9)"

image Bin size 10k.

image Interestingly, file size doesn't scale linearly with bin size across these files. This tells me that at small bin sizes (100.json), there are long runs of contiguous ids covered.
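
To check the file-size observation on another machine, the byte counts of the binned JSON files can be listed side by side. This is a minimal sketch using only standard tools; `report_sizes` is a hypothetical helper name, and the file names are assumed to match the outputs of the commands earlier in this thread.

```shell
#!/bin/sh
# Sketch: print "<bytes><TAB><file>" for each binned JSON file that exists.
report_sizes() {
    for f in "$@"; do
        if [ -f "$f" ]; then
            # wc -c may pad its output with spaces on some systems; strip them.
            printf '%s\t%s\n' "$(wc -c < "$f" | tr -d '[:space:]')" "$f"
        fi
    done
}

# Files that do not exist are silently skipped.
report_sizes sixref_Chr4.og.10000.json sixref_Chr4.og.1000.json sixref_Chr4.og.100.json
```

If the sizes grew in proportion to the number of bins, each tenfold decrease in bin width would give roughly a tenfold larger file; a smaller ratio would be consistent with the observation above.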

josiahseaman commented 4 years ago

Currently waiting on Sebastian for more data.