graph-genome / component_segmentation

Read in ODGI Bin output and identify co-linear components
Apache License 2.0

v12: Zooming: separate directory for each bin_width #12

Closed: josiahseaman closed this issue 4 years ago

josiahseaman commented 4 years ago

Assigned to: Thomas Townsley.

This is a slice of https://github.com/graph-genome/Schematize/issues/33.

Detail: Each zoom level should get its own subdirectory inside the "graph_name" subdirectory, and bin2file.json will sit in "graph_name" with an index to all the files. I'd say the sequence only needs to be listed once, for bin_width=1, so it can sit next to bin2file.json, and only the width=1 zoom layer needs to reference "fasta_file" entries.
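A minimal sketch of the proposed layout, assuming a hypothetical helper output_paths and hypothetical names: each zoom subdirectory is named after its bin width, and the shared sequence file is called seq.fa. The issue leaves the actual naming open, so treat these only as placeholders.

```python
from pathlib import Path


def output_paths(root, graph_name, bin_widths):
    """Return hypothetical output locations for one graph genome.

    Assumptions (not from the issue): zoom subdirectories are named after
    their bin width, and the shared sequence file is called "seq.fa".
    """
    graph_dir = Path(root) / graph_name
    return {
        # One index for all zoom levels, at the top of the graph_name directory.
        "index": graph_dir / "bin2file.json",
        # Sequence stored once, next to the index, for the bin_width=1 layer.
        "sequence": graph_dir / "seq.fa",
        # Each zoom level gets its own subdirectory, in ascending width order.
        "zoom_dirs": {w: graph_dir / str(w) for w in sorted(bin_widths)},
    }


layout = output_paths("out", "b1phi1", [100, 1, 10])
for width, directory in layout["zoom_dirs"].items():
    print(width, directory.as_posix())
```

For widths [100, 1, 10] this prints out/b1phi1/1, out/b1phi1/10 and out/b1phi1/100, in ascending order, while bin2file.json and seq.fa both sit directly in out/b1phi1.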

josiahseaman commented 4 years ago

I just added w1, w10, and w100 for the b1phi1 dataset to the repo, so we now have a full zoom stack to test against. These files should be batch processed based on their name similarity, and each should get its own output subfolder. They will all be referenced by a single bin2file.json in their core b1phi1 directory (figure out the naming). The full example format is listed in #16 v12.
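Batching by name similarity could look like the sketch below. It assumes a hypothetical naming convention of <prefix>_w<bin_width>.json (the run.B1phi1_w1.json pattern used later in this thread) and a hypothetical group_by_graph helper; files that don't match the pattern are skipped.

```python
import re
from collections import defaultdict

# Hypothetical naming convention: "<prefix>_w<bin_width>.json".
WIDTH_PATTERN = re.compile(r"^(?P<prefix>.+)_w(?P<width>\d+)\.json$")


def group_by_graph(filenames):
    """Group zoom-layer files by shared prefix: {prefix: {width: filename}}."""
    groups = defaultdict(dict)
    for name in filenames:
        match = WIDTH_PATTERN.match(name)
        if match is None:
            continue  # not a zoom-layer file; skip it
        groups[match["prefix"]][int(match["width"])] = name
    # Sort each group by numeric width so w10 comes after w2, not before it.
    return {prefix: dict(sorted(layers.items())) for prefix, layers in groups.items()}


files = ["b1phi1_w100.json", "b1phi1_w1.json", "b1phi1_w10.json", "other_w1.json", "notes.txt"]
print(group_by_graph(files))
```

Each resulting group would then map onto one output subfolder per width plus a single shared index.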

josiahseaman commented 4 years ago

Side note: We may eventually want to switch to powers of 4 or powers of 2 as our zoom stack step size; 10x steps are just easy to type at the moment. This is part of the impetus for batching: the big genomes could end up with many zoom layers, not just 3.
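To compare step sizes, here is a small sketch that lists every bin width from 1 up to a chosen maximum for a given base. zoom_stack is a hypothetical helper, and a maximum width of 100 simply mirrors the current w1/w10/w100 stack.

```python
def zoom_stack(max_width, base=2):
    """Return bin widths 1, base, base**2, ... up to and including max_width."""
    if base < 2:
        raise ValueError("base must be at least 2")
    widths = []
    width = 1
    while width <= max_width:
        widths.append(width)
        width *= base
    return widths


print(zoom_stack(100, base=10))  # [1, 10, 100]
print(zoom_stack(100, base=4))   # [1, 4, 16, 64]
print(zoom_stack(100, base=2))   # [1, 2, 4, 8, 16, 32, 64]
```

Smaller bases give more layers for the same maximum width, so larger genomes with wider top-level bins multiply the number of files to process.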

mandosoft commented 4 years ago

Working this one!

josiahseaman commented 4 years ago

Input: the user provides a directory to look in and a file prefix that identifies all files from the same graph genome.

/path/data/run.B1phi1_w1.json
/path/data/run.B1phi1_w2.json
/path/data/run.B1phi1_w4.json
/path/data/run.B1phi1_w8.json

These should be found by globbing /path/data/run.B1phi1*
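A minimal sketch of that input step, assuming a hypothetical find_zoom_layers helper and the _w<width>.json suffix from the example paths above. It uses the standard glob module and sorts the result by numeric width.

```python
import glob
import os
import re


def find_zoom_layers(directory, prefix):
    """Glob <directory>/<prefix>* and map each bin width to its file path.

    Assumes the hypothetical "_w<width>.json" suffix from the example paths.
    """
    # Escape both parts so characters like "[" in names aren't treated as wildcards.
    pattern = os.path.join(glob.escape(str(directory)), glob.escape(prefix) + "*")
    layers = {}
    for path in glob.glob(pattern):
        match = re.search(r"_w(\d+)\.json$", path)
        if match:
            layers[int(match.group(1))] = path
    # Numeric sort: plain string sorting would put w10 before w2.
    return dict(sorted(layers.items()))
```

Calling find_zoom_layers("/path/data", "run.B1phi1") would return the w1, w2, w4 and w8 files keyed by width. Note that a bare prefix glob like run.B1phi1* would also match a file such as run.B1phi10_w1.json, so a stricter pattern may be needed if graph names can share prefixes.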