Arabidopsis 1001G data in MatrixTubemap

josiahseaman commented 4 years ago

[x] Identify data source: six_ref.GFA
[x] Install / Compile ODGI
[x] Process through ODGI
[x] Process through Tubeify
[x] Import into tubemap.js
[x] Rendered and browsing (performant)
[ ] Links verified correct
[x] reasonable size 5 MB (larger binning for whole?)
[x] Sebastian's data converted
[ ] Sebastian's data visualized

ekg commented 4 years ago

Here's how to take a GFA file and make the tubemap bin input:

odgi build -g x.gfa -o - | odgi sort -i - -o x.og
odgi bin -i x.og -w $bin_width -j | gzip >x.og.$bin_width.json.gz
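To produce the bin input at several bin widths in one pass, the second command can be wrapped in a loop. This is only a sketch: the `bin_output_name` and `run_bin` helpers, the `GRAPH` and `DRY_RUN` variables, and the list of widths are illustrative assumptions, while the `odgi bin` flags are the ones shown above. It defaults to a dry run that prints each command rather than executing it.

```shell
#!/usr/bin/env bash
# Sketch: run (or just print) the odgi bin step at several bin widths.
# GRAPH and DRY_RUN are hypothetical knobs, not part of odgi itself.
set -eu

graph="${GRAPH:-x.og}"      # sorted graph produced by `odgi sort`
DRY_RUN="${DRY_RUN:-1}"     # 1 = print commands only, 0 = execute them

bin_output_name() {
  # $1 = graph file, $2 = bin width
  printf '%s.%s.json.gz\n' "$1" "$2"
}

run_bin() {
  # $1 = graph file, $2 = bin width
  local graph="$1" width="$2" out
  out="$(bin_output_name "$graph" "$width")"
  if [ "$DRY_RUN" = "1" ]; then
    printf 'odgi bin -i %s -w %s -j | gzip > %s\n' "$graph" "$width" "$out"
  else
    odgi bin -i "$graph" -w "$width" -j | gzip > "$out"
  fi
}

# Example widths only; pick whatever resolutions you want to browse.
for width in 100 1000 10000; do
  run_bin "$graph" "$width"
done
```

Setting `DRY_RUN=0 GRAPH=my.og` would execute the real pipeline, assuming `odgi` is on the `PATH`.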

josiahseaman commented 4 years ago

Actual commands used:

cmake -H. -Bbuild && cmake --build build -- -j 3

./odgi build -g sixref_Chr4.gfa -o - | ./odgi sort -i - -o sixref_Chr4.og
bin_width=100000
./odgi bin -i sixref_Chr4.og -w $bin_width -j | gzip > sixref_Chr4.og.$bin_width.json.gz
bin_width=10000
./odgi bin -i sixref_Chr4.og -w $bin_width -j > sixref_Chr4.og.$bin_width.json
bin_width=1000
./odgi bin -i sixref_Chr4.og -w $bin_width -j > sixref_Chr4.og.$bin_width.json
bin_width=100
./odgi bin -i sixref_Chr4.og -w $bin_width -j > sixref_Chr4.og.$bin_width.json

node.exe cli.js -j data\sixref_Chr4.og.10000.json --bin_length=10000 --tiles=1 --tile_json=data\sixref_Chr4.og.10000.tile.json
node.exe cli.js -j data\sixref_Chr4.og.1000.json --bin_length=1000 --tiles=1 --tile_json=data\sixref_Chr4.og.1000.tile.json
node.exe cli.js -j data\sixref_Chr4.og.100.json --bin_length=100 --tiles=1 --tile_json=data\sixref_Chr4.og.100.tile.json

josiahseaman commented 4 years ago

Real Arabidopsis data is working! I will definitely need #14 for this data, because it's very detailed and doesn't all fit into view easily.

Bin size 100. transform="translate(0,0) scale(0.0038,2)"

This is all of Chr4 at bin size 1k. transform="translate(0,0) scale(0.038,1.9)"

Bin size 10k.

Interestingly, file size doesn't scale linearly with bin size across these files. This tells me that at small bin sizes (100.json), there are long runs of contiguous ids covered.
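One quick way to check the scaling is to list the byte size of each bin file next to its bin width. The sketch below assumes the files follow the `data/sixref_Chr4.og.<width>.json` naming used in the commands earlier in this thread; `size_bytes`, `report_sizes`, and the `PREFIX` variable are hypothetical helpers introduced here for illustration.

```shell
#!/usr/bin/env bash
# Sketch: print "<bin width><TAB><bytes>" for each bin JSON file.
# PREFIX is a hypothetical override for the file name prefix.
set -eu

size_bytes() {
  # Byte count of file $1; strip the padding some wc implementations add.
  wc -c < "$1" | tr -d ' '
}

report_sizes() {
  # $1 = file name prefix, remaining args = bin widths
  local prefix="$1"; shift
  local width file
  for width in "$@"; do
    file="$prefix.$width.json"
    if [ -f "$file" ]; then
      printf '%s\t%s\n' "$width" "$(size_bytes "$file")"
    else
      printf '%s\tmissing\n' "$width"
    fi
  done
}

report_sizes "${PREFIX:-data/sixref_Chr4.og}" 100 1000 10000
```

If size were proportional to the number of bins, each tenfold increase in bin width would shrink the file roughly tenfold; a much smaller drop would be consistent with the observation above.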

josiahseaman commented 4 years ago

Currently waiting on Sebastian for more data.

graph-genome / MatrixTubeMap

Arabidopsis 1001G data in MatrixTubemap #13