pangenome / pggb

the pangenome graph builder
https://doi.org/10.1038/s41592-024-02430-3
MIT License

wfmash -Y option #253

Closed: SAMtoBAM closed this issue 1 year ago

SAMtoBAM commented 1 year ago

Hi there,

Just a simple question:

Is the use of the -Y option for wfmash encouraged?

-Y, --exclude-delim C       skip mappings between sequences with the same name prefix before
                            the given delimiter character [default: all-vs-all and !self]

It seems like a simple solution for avoiding within-genome/sample/strain mapping, rather than just avoiding self-mapping (the default), which, if I have understood correctly, treats each path, essentially each contig, as its own 'self'. Is there something I am missing?

Thanks!

AndreaGuarracino commented 1 year ago

Yes, your interpretation of self-mapping and of the -Y behavior is correct.

For example, if you follow the PanSN-spec and have sequences named HG002#1#ctg1 and HG002#1#ctg2, then -Y '#' means 'skip mappings between sequences of the same haplotype', so mappings between HG002#1#ctg1 and HG002#1#ctg2 are skipped.

We do not encourage its use; it always depends on the type of analysis and on which kinds of mappings you would like to have and analyze.
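To make the difference between the default (all-vs-all, no self-mapping) and -Y '#' concrete, here is a minimal Python sketch of the pair filtering described above. It is not wfmash code: the helper names `name_prefix` and `keep_pair` are hypothetical, and it assumes the prefix is everything before the last occurrence of the delimiter, which matches the haplotype-level grouping (HG002#1) in the example but is not confirmed by the help text quoted in this thread.

```python
from typing import Optional


def name_prefix(name: str, delim: str = "#") -> str:
    """Return the part of `name` before the LAST `delim` (assumption),
    or the whole name if the delimiter does not occur."""
    head, sep, _ = name.rpartition(delim)
    return head if sep else name


def keep_pair(query: str, target: str, delim: Optional[str] = None) -> bool:
    """Decide whether a query/target pair would be considered for mapping.

    delim=None models the default: everything except exact self-pairs.
    delim='#' models -Y '#': also drop pairs sharing the same name prefix.
    """
    if query == target:
        return False  # self-mapping is skipped in both modes
    if delim is None:
        return True
    return name_prefix(query, delim) != name_prefix(target, delim)


# Illustrative PanSN-style names; HG002#2#ctg1 is added here for contrast.
names = ["HG002#1#ctg1", "HG002#1#ctg2", "HG002#2#ctg1"]

default_pairs = [(q, t) for q in names for t in names if keep_pair(q, t)]
y_pairs = [(q, t) for q in names for t in names if keep_pair(q, t, "#")]

print(len(default_pairs), "pairs by default")
print(len(y_pairs), "pairs with -Y '#'")
```

In this sketch the two contigs of HG002#1 can still be paired with each other by default, but not with the delimiter set; pairs between HG002#1 and HG002#2 are kept in both modes.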

SAMtoBAM commented 1 year ago

Thanks for the response. I guess where I thought its use would be encouraged is in the same cases where the number of haplotypes for -n is suggested to be the true number. Is there something different about the two approaches in terms of desired outcomes?

AndreaGuarracino commented 1 year ago

Actually, I use -Y often. However, I've never thoroughly examined the practical (non-theoretical) differences between specifying it and not specifying it. @ekg, are there any reasons I can't think of right now as to why we don't push the use of -Y?

AndreaGuarracino commented 1 year ago

Better late than never. With this commit, -Y "#" is now used by default!