Closed tsnorri closed 1 year ago
Here’s a suggestion:
program_info_t
outside sam_file_header
and rename it to e.g. sam_file_program_info
typedef sam_file_program_info program_info_t
to sam_file_header
, possibly with [[deprecated("Please use sam_file_program_info instead")]]
Hi @tsnorri,
thanks for reaching out and sorry for the delay.
You are absolutely right. This copy should be way easier. Let me investigate a little and then get back to you. Thanks in advance for volunteering to help!
This copy (for anyone following the thread):
#include <seqan3/io/sam_file/all.hpp>
int main()
{
seqan3::sam_file_input in{"in.sam"};
seqan3::sam_file_output out{"out.sam"};
auto it = in.begin(); // cache first record which also reads the header
out.header().program_infos = in.header().program_infos; // error :(
}
Hi @tsnorri,
after a quick dive into the code, we agree with your suggestion.
The tasks would be:
program_info_t
from the class scope to the outer scope (right above the seqan3::sam_file_head
in the same file) and rename it to sam_file_program_info_t
(within the seqan3 namespace)using program_info_t = sam_file_program_info_t;
within the seqan3::sam_file_header
class[ ] Add a unit test in test/unit/io/sam_file/sam_file_output_test.cpp
somehting like : (not tested)
TEST(header, copy_program_info_t)
{
std::string comp =
R"(@HD VN:1.6 SO:unknown GO:none
@SQ SN:ref LN:26
@PG ID:prog1 PN:cool_program
@CO This is a comment.
read1 41 ref 1 61 1S1M1D1M1I ref 10 300 ACGT !##$ AS:i:2 NM:i:7
)";
seqan3::sam_file_input fin{std::istringstream{comp}, seqan3::format_sam{}};
seqan3::sam_file_output fout{std::ostringstream{}, seqan3::format_sam{}};
auto it = in.begin(); // cache first record which also reads the header
fout.header().program_infos = fin.header().program_infos;
}
It would be great if you can tackle this. Otherwise just drop a note that you have no time and we can work on this.
Best, Svenja
Hi @smehringer and sorry about the delay! I finally had some time to fix the issue; the pull request above should have everything needed.
Best, Tuukka
Actually scratch that. I just realised that the test I wrote did not actually test the intended functionality. I’ll re-refine it.
I had noticed earlier but forgotten that the solution to this may not be the one that I first thought was the obvious one. If ref_ids_type
in the destination is ref_info_not_given
, then calling header()
will not work, which does make sense. Consequently the easiest way to append a program_info_t
is to copy the source header, modify it, and finally reset the header pointer in the records. I think that worked even without decoupling program_info_t
, though? Modifying the header object of the output file seemed to be the intended approach at first but I guess either is fine.
In any case, now program_info_t
does not depend on sam_file_header
’s template parameter and there is a test that checks that copying is possible.
When processing alignments in such a way that both the input and the output are SAM/BAM files, one would typically like to append a
@PG
record to the SAM file headers and, to that end, copy the existing@PG
records. Currently this is not very easy under some circumstances assam_file_header::program_info_t
depends onsam_file_header
’s template parameterref_ids_type
(i.e. when the input and output files have a different type forref_ids_type
).Suggested solutions include:
program_info_t
not depend onref_ids_type
by e.g. moving it outsidesam_file_header
program_info_t
.(I’m not sure what would be the best approach but once one is chosen, I could volunteer to contribute a patch.)