vgteam / sequenceTubeMap

displays multiple genomic sequences in the form of a tube map
MIT License

Add Visualization Examples #412

Closed by ducku 4 months ago

ducku commented 5 months ago

I included some chunks created with prepare_local_chunk.sh. However, it seems that some haplotypes are added that didn't exist in the original. I'm not sure what causes this. I added the original files in a visualization_examples folder.

Here are some pictures: chr7_124051614_124054114HCC1395_ref0_500-1500: ![chr7_124051614_124054114HCC1395_ref0_500-1500](https://github.com/vgteam/sequenceTubeMap/assets/23585956/5158b757-3e8a-4399-b3b3-a36681a7c1ee)

chr1_86645908_86646408_chr1_1-500: [image: chr1_86645908_86646408_chr1_1-500]

chr5_73149742_73150242_ch5-1-500: [image: chr5_73149742_73150242_ch5-1-500]

adamnovak commented 4 months ago

I think the extra haplotypes are coming in because, when you view the GBZ with the tube map, it runs vg chunk on it and deduplicates locally identical haplotype paths. But if you use prepare_local_chunk.sh (and not prepare_chunks.sh), it runs the graph through vg convert, and all the haplotypes in the graph become their own paths, even if they are locally identical.
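
As a toy illustration of what "locally identical" means here (this is not how vg chunk implements it), suppose each haplotype in a region is written as a name plus the node walk it takes through that region. Haplotypes whose walks match inside the region collapse to one displayed path when deduplicated, but each stays separate if every haplotype is turned into its own path. The two-column file layout and the `dedup_walks` helper are assumptions made up for this sketch.

```shell
# dedup_walks FILE
# FILE holds tab-separated lines: haplotype_name<TAB>node_walk
# (a hypothetical format used only for this illustration).
# Prints the first haplotype seen for each distinct walk.
dedup_walks() {
    awk -F'\t' '!seen[$2]++' "$1"
}
```

With three haplotypes where two share the same walk, only two lines come out, which mirrors why the deduplicated view shows fewer paths than the converted one.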

adamnovak commented 4 months ago

I'm re-pulling the chunks by running:

cd exampleData

rm -Rf lancet_chunk_example_*
rm -f lancet_chunk_examples.bed

../scripts/prepare_chunks.sh -x visualization_examples/chr1_86645908_86646408.giraffe.gbz -r chr1:1-500 -g visualization_examples/normal.chr1_86645908_86646408.sorted.gam -d "Example STM_DataShare_Nov07_2023 chr1_86645908_86646408" -o lancet_chunk_example_1 | sed 's/chr1\t1\t500/chr1\t86645908\t86646407/g' >>lancet_chunk_examples.bed
cat lancet_chunk_example_1/regions.tsv | awk '{$2 += 86645908 ; $3 += 86645908 ; print }' > lancet_chunk_example_1/regions.tsv.new
mv lancet_chunk_example_1/regions.tsv.new lancet_chunk_example_1/regions.tsv

../scripts/prepare_chunks.sh -x visualization_examples/chr5_73149742_73150242.giraffe.gbz -r chr5:1-500 -g visualization_examples/normal.chr5_73149742_73150242.sorted.gam -d "Example STM_DataShare_Nov07_2023 chr5_73149742_73150242" -o lancet_chunk_example_2 | sed 's/chr5\t1\t500/chr5\t73149742\t73150241/g' >>lancet_chunk_examples.bed
cat lancet_chunk_example_2/regions.tsv | awk '{$2 += 73149742 ; $3 += 73149742 ; print }' > lancet_chunk_example_2/regions.tsv.new
mv lancet_chunk_example_2/regions.tsv.new lancet_chunk_example_2/regions.tsv

../scripts/prepare_chunks.sh -x visualization_examples/chr7_124051614_124054114__HCC1395.giraffe.gbz -r ref0:1201-1300 -g visualization_examples/Tumor_HCC1395.chr7_124051614_124054114.sorted.gam -p '{"mainPalette": "reds", "auxPalette": "reds"}' -g visualization_examples/Normal_HCC1395.chr7_124051614_124054114.sorted.gam -p '{"mainPalette": "blues", "auxPalette": "blues"}' -d "Example Insertion stm_gfa_test_data_jul21_2023 chr7_124051614_124054114__HCC1395" -o lancet_chunk_example_3 | sed 's/ref0\t1201\t1300/ref0\t124052814\t124052914/g' >>lancet_chunk_examples.bed
cat lancet_chunk_example_3/regions.tsv | awk '{$2 += 124051614 ; $3 += 124051614 ; print }' > lancet_chunk_example_3/regions.tsv.new
mv lancet_chunk_example_3/regions.tsv.new lancet_chunk_example_3/regions.tsv

I think this displays the right genome coordinates, given the apparent start coordinates of the subgraphs being extracted from. It also gives roughly the same result as actually loading the source subgraphs and requesting the specified regions.
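
The coordinate-shifting step repeated for each example could be wrapped in a small helper. This is a minimal sketch, assuming regions.tsv is tab-separated with start and end in columns 2 and 3, as the awk commands imply. `shift_regions` is a hypothetical name, not one of the repository scripts. It sets both the input and output field separators to a tab, because awk's default output separator is a single space and a modified line would otherwise be rejoined with spaces.

```shell
# shift_regions OFFSET FILE
# Adds OFFSET to columns 2 and 3 of a tab-separated FILE, in place.
# Assumption (not confirmed by the thread): FILE is tab-delimited and
# should remain so after the edit.
shift_regions() {
    offset="$1"
    file="$2"
    awk -F'\t' -v OFS='\t' -v off="$offset" \
        '{ $2 += off; $3 += off; print }' "$file" > "$file.new" &&
        mv "$file.new" "$file"
}
```

For the first example this would be called as `shift_regions 86645908 lancet_chunk_example_1/regions.tsv`.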