seqan / seqan3

The modern C++ library for sequence analysis. Contains version 3 of the library and API docs.
https://www.seqan.de
Other
411 stars 82 forks source link

Unsupported SAM header tag in @HD: pb #3215

Closed notestaff closed 8 months ago

notestaff commented 12 months ago

Platform

Question

Is there a way to suppress the message Unsupported SAM header tag in @HD: pb when reading PacBio bam files? Relatedly, the documentation says https://github.com/seqan/seqan3/blob/57b9924a93b74a44319f8f9e93fa4292559c91aa/include/seqan3/io/sam_file/detail/format_sam_base.hpp#L272, but that's no longer the case, right? (And the current behavior of not throwing an error seems correct.)

eseiler commented 12 months ago

Hey @notestaff,

once again, thank you for finding flaws in our documentation!

It should definitely not say that it throws. It's probably a leftover from when we amended the alignment file parsing to be a bit less strict.

Do you have an example SAM/BAM file? Just the header section and one record/entry should be enough. We can also just add the pb header to a file ourselves, but it's better to have an example of what you are using.

In general, pb is not part of the SAM-specification's @HD tag, so we emit a warning.

What would be a solution you are looking for? It would be possible to redirect std:cerr while parsing the file, such that the warning is not emitted.

Some option to ignore/warn/error would also be possible, but would require code restructuring.

notestaff commented 9 months ago

The first header line reads @HD VN:1.6 SO:unknown pb:5.0.0 What would be useful is an option to suppress warnings about unsupported SAM header tag. Simplest might be to make this a compilation option, e.g. if SEQAN3_SUPPRESS_UNSUPPORTED_HEADER_TAG_WARNING is defined to 1 then don't print the warning. Or this could be part of the configuration options when opening a SAM file. Best would of course be to have a way to specify additional recognized tags. Thanks!

eseiler commented 8 months ago

It's now possible to ignore the warnings: https://docs.seqan.de/seqan3/main_user/structseqan3_1_1sam__file__input__options.html#a0ddbdbedd6183dc583d9195917a6ef26

notestaff commented 8 months ago

Great, thanks a lot! Just in time for external release of our seqan3-based tool :)

notestaff commented 8 months ago

But, this fix suppresses ALL warnings -- a way to suppress specific ones would be better.

Also, the SAM spec says "you can freely add new tags for further data fields. Tags containing lowercase letters are reserved for local use". So I'm not sure this should be a warning at all, at least by default, especially for lowercase tags like in our example.

eseiler commented 8 months ago

But, this fix suppresses ALL warnings -- a way to suppress specific ones would be better.

It's currently the only warning we emit ("Unsupported SAM header tag in [...]").

It's also a bit more flexible than just on/off in that you can postprocess the warning, e.g.,

void filter()
{
    auto fin = get_sam_file_input();
    std::ostringstream stream{};
    fin.options.stream_warnings_to = std::addressof(stream);
    auto it = fin.begin();
    for (auto && warning : stream.view() | std::views::split('\n'))
    {
        if (std::string_view sv{warning.data(), warning.size()}; !sv.empty() && !sv.ends_with("pb"))
            std::cerr << sv << '\n';
    }
}

Would be nicer to overload operator<<, but then we would need either some customisation point object...

Also, the SAM spec says "you can freely add new tags for further data fields. Tags containing lowercase letters are reserved for local use". So I'm not sure this should be a warning at all, at least by default, especially for lowercase tags like in our example.

I did miss this one sentence in the spec multiple times :) Before this was a warning, it was an exception. I'm not sure if the warning is meant as "it's not valid, but we just continue" or "we don't support that" (in contrast to alignment tags).