maxrossi91 / moni

MONI: A Pangenomic Index for Finding MEMs
MIT License
37 stars, 9 forks

Feature Request: add a flag to `moni mems` to get SAM format output #10

Closed: AndreaGuarracino closed this issue 1 month ago

AndreaGuarracino commented 2 months ago

Hi, would it be possible to add a flag to moni mems (something like -f/--format sam) to get the MEMs in SAM format? I see you've recently added

  -e, --extended-output
                        output MEM occurrence in the reference (default: False)

so it sounds like something doable.

maxrossi91 commented 1 month ago

This shouldn't be too difficult. However, I don't know when I will have time to implement it.

maxrossi91 commented 1 month ago

Well, thinking about it a little bit more, it is not that simple. What would the output format look like? You would have one row per MEM, marked as a secondary alignment (except the first one), with a CIGAR string that has either a soft or a hard clip on either side of the MEM and the MEM itself reported as M.
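To make the proposed layout concrete, here is a minimal Python sketch of turning one read's MEMs into SAM rows: one row per MEM, FLAG 0x100 (secondary) on every row except the first, and a CIGAR of clip, `M`, clip. The `Mem` record, the function names, MAPQ 255 ("unavailable"), and the restriction to forward-strand matches are all hypothetical assumptions for illustration, not MONI's actual code or output.

```python
from dataclasses import dataclass

FLAG_SECONDARY = 0x100  # SAM FLAG bit for a secondary alignment


@dataclass
class Mem:
    """Hypothetical MEM record (forward strand only, 0-based offsets)."""
    read_start: int  # offset of the MEM in the read
    length: int      # MEM length in bases
    ref_name: str    # reference sequence name (RNAME)
    ref_pos: int     # offset of the MEM occurrence in the reference


def mem_to_sam(read_name, seq, qual, mem, primary, hard_clip=False):
    """Format one MEM as a SAM line: clip the bases outside the MEM, report the MEM as M."""
    left = mem.read_start
    right = len(seq) - mem.read_start - mem.length
    clip = "H" if hard_clip else "S"
    cigar = ""
    if left:
        cigar += f"{left}{clip}"
    cigar += f"{mem.length}M"
    if right:
        cigar += f"{right}{clip}"
    if hard_clip:
        # Hard-clipped bases are not stored in SEQ/QUAL.
        seq = seq[left:left + mem.length]
        qual = qual[left:left + mem.length]
    flag = 0 if primary else FLAG_SECONDARY
    fields = [
        read_name, str(flag), mem.ref_name,
        str(mem.ref_pos + 1),  # SAM POS is 1-based
        "255",                 # MAPQ: unavailable
        cigar, "*", "0", "0",  # RNEXT, PNEXT, TLEN: no mate
        seq, qual,
    ]
    return "\t".join(fields)


def mems_to_sam(read_name, seq, qual, mems, hard_clip=False):
    """One SAM row per MEM; only the first row is primary."""
    return [mem_to_sam(read_name, seq, qual, m, i == 0, hard_clip)
            for i, m in enumerate(mems)]


rows = mems_to_sam("r1", "ACGTACGTAC", "IIIIIIIIII",
                   [Mem(2, 5, "chr1", 99), Mem(0, 10, "chr2", 0)])
for row in rows:
    print(row)
```

With soft clips, every row repeats the full read sequence and base qualities, which is why the reads' SEQ and QUAL would have to travel with the MEMs through the pipeline.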

As moni mems is structured right now, this might not be as easy, since in the case of SAM output it would require changing the encoding of the intermediate files to carry over both the sequence and the base qualities of the reads.

I need to think a little bit more about it.

AndreaGuarracino commented 1 month ago

Would the PAF format be easier?

maxrossi91 commented 1 month ago

Yeah, maybe PAF would have been easier: a less cluttered output that avoids the redundant SEQ and base quality (QUAL) fields.

I have implemented a quick version of the feature. Please give it a try.

AndreaGuarracino commented 1 month ago

Thx a lot! Could you please make a pre-built binary? I still can't compile MONI.

As for PAF, it would allow a much lighter output than SAM. We are interested in applying MONI to huge pangenomes, so PAF output would be helpful. If I were able to compile the source, I could work on that myself.
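The lightness comes from PAF storing only coordinates: its 12 mandatory tab-separated columns hold no sequence or base qualities. A minimal Python sketch of one PAF line per MEM follows, assuming forward-strand matches and that the reference length is known; `MemHit` and `mem_to_paf` are hypothetical names, not part of MONI.

```python
from dataclasses import dataclass


@dataclass
class MemHit:
    """Hypothetical MEM occurrence; all offsets are 0-based."""
    read_start: int  # MEM start in the read
    length: int      # MEM length in bases
    ref_name: str    # reference (target) name
    ref_len: int     # reference (target) length
    ref_pos: int     # MEM start in the reference


def mem_to_paf(read_name, read_len, hit):
    """Format one MEM as the 12 mandatory PAF columns (0-based, half-open intervals)."""
    q_end = hit.read_start + hit.length
    t_end = hit.ref_pos + hit.length
    cols = [
        read_name, read_len, hit.read_start, q_end,
        "+",         # relative strand: forward-only assumption
        hit.ref_name, hit.ref_len, hit.ref_pos, t_end,
        hit.length,  # residue matches: every base of an exact match
        hit.length,  # alignment block length (no gaps in a MEM)
        255,         # mapping quality: 255 means missing
    ]
    return "\t".join(map(str, cols))


print(mem_to_paf("r1", 10, MemHit(2, 5, "chr1", 1000, 99)))
```

Each line stays roughly the same size however long the read is, whereas a soft-clipped SAM row repeats the whole read's SEQ and QUAL for every MEM (unless they are replaced by `*`).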

maxrossi91 commented 1 month ago

I made a full release and bumped the version to v0.2.2.

Re: compilation, I am able to compile using gcc 9.3.0; please try that and see if it works for you.

I agree the PAF format might be lighter, and probably easier to implement. Unfortunately, the moni mems code is quite messy and could use some refactoring at this point, but I don't have much spare time to dedicate to it.

Happy to take PRs if you want to give it a try.

Closing this issue for now. Let me open another feature request for the PAF format.