phac-nml / biohansel

Rapidly subtype microbial genomes using single-nucleotide variant (SNV) subtyping schemes
Apache License 2.0

Output list of tiles that didn't match #59

Open dankein opened 5 years ago

dankein commented 5 years ago

Would it be possible to generate an additional output for troubleshooting that is a list of tiles where there were no positive or negative matches for each sample?

glabbe commented 5 years ago

That is a good suggestion!

mgopez commented 5 years ago

Sounds like a great idea. I'll have a look and see how we can implement this in the tool. Is that okay with you @peterk87?

peterk87 commented 5 years ago

Hi @dankein

What do you think of a matrix (tiles by samples) of tile absence/presence?

Something like

Samples   tile 1  tile 2  tile 3  ...  tile X
sample 1  1       1       0       ...  1
sample 2  0       1       0       ...  0
sample 3  1       0       1       ...  1

Presence/absence could be represented with 1/0 or true/false or something else.
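
For illustration only, here is a minimal Python sketch of how such a matrix could be built. It is not biohansel code: the function names and the `(sample, tile)` pair format are assumptions made up for this example, and the data simply reproduces the three complete columns of the table above.

```python
# Hypothetical sketch: build a samples-by-tiles presence/absence matrix.
# The (sample, tile) pair format is an assumption, not biohansel's real output.

def presence_absence_matrix(samples, tiles, matches):
    """Return one row per sample, with 1 where the tile matched and 0 otherwise."""
    matched = set(matches)
    return [[1 if (s, t) in matched else 0 for t in tiles] for s in samples]


def format_matrix(samples, tiles, matrix):
    """Render the matrix as tab-separated text with a header row."""
    lines = ["\t".join(["Samples"] + list(tiles))]
    for sample, row in zip(samples, matrix):
        lines.append("\t".join([sample] + [str(v) for v in row]))
    return "\n".join(lines)


samples = ["sample 1", "sample 2", "sample 3"]
tiles = ["tile 1", "tile 2", "tile 3"]
# Each pair means "this tile was found in this sample".
matches = [
    ("sample 1", "tile 1"),
    ("sample 1", "tile 2"),
    ("sample 2", "tile 2"),
    ("sample 3", "tile 1"),
    ("sample 3", "tile 3"),
]

matrix = presence_absence_matrix(samples, tiles, matches)
print(format_matrix(samples, tiles, matrix))
```

Passing the sample list explicitly means a sample with no matches at all still gets a row of zeros, which is probably the case most worth seeing when troubleshooting.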

dankein commented 5 years ago

Hi @peterk87, good idea. I like this if it can accommodate both positive and negative tiles. Either representing positive and negative tiles separately, or recording a match to either a positive or a negative tile as a 1, would work. I think this would make troubleshooting / QCing a new tile set easier.

My original thought was a bit simpler: a mirror of match_results, maybe with just the file_path, ref position, and sample columns, for the tiles that didn't match.
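
A rough Python sketch of that simpler listing, to make the idea concrete. Everything here is hypothetical: the `(sample, ref position, is_positive)` record shape and the function name are assumptions for illustration, not the actual match_results layout. A position counts as unmatched only when neither its positive nor its negative tile was found in the sample.

```python
# Hypothetical sketch: list positions with no positive or negative tile match.
# The (sample, refposition, is_positive) record shape is an assumption.

def unmatched_positions(samples, positions, matches):
    """Return (sample, refposition) pairs where no tile of either kind matched."""
    # Ignore the positive/negative flag: a match of either kind counts.
    seen = {(sample, pos) for sample, pos, _is_positive in matches}
    return [(s, p) for s in samples for p in positions if (s, p) not in seen]


samples = ["A", "B"]
positions = [100, 200, 300]
matches = [
    ("A", 100, True),   # positive tile matched at position 100
    ("A", 200, False),  # negative tile matched at position 200
    ("B", 300, True),
]

for sample, pos in unmatched_positions(samples, positions, matches):
    print(sample, pos, sep="\t")
```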