samtools / htslib

C library for high-throughput sequencing data formats
Other
783 stars 447 forks source link

bam_cigar_opchr - minor issue #1789

Closed kullrich closed 3 weeks ago

kullrich commented 3 weeks ago

Including htslib/sam.h causes some minor warning:

MISRA 5-0-11: The plain char type shall only be used for the storage and use of character values

Since the return type of bam_cigar_opchr is not explicitly specified, I could resolve this by:

#include "htslib/sam.h"
#undef bam_cigar_opchr
#define bam_cigar_opchr(c) (static_cast<char>(BAM_CIGAR_STR "??????" [bam_cigar_op(c)]))

Maybe you can change the corresponding line in sam.h to account for this.

Best regards

Kristian

kullrich commented 3 weeks ago

Related question is, if samtools is attempting to be MISRA compliant?

whitwham commented 3 weeks ago

Related question is, if samtools is attempting to be MISRA compliant?

No, samtools is not attempting to be MISRA compliant.

kullrich commented 3 weeks ago

OK, thanks. So I will close the issue.

jkbonfield commented 3 weeks ago

Since the return type of bam_cigar_opchr is not explicitly specified, I could resolve this by:

#include "htslib/sam.h"
#undef bam_cigar_opchr
#define bam_cigar_opchr(c) (static_cast<char>(BAM_CIGAR_STR "??????" [bam_cigar_op(c)]))

Samtools is also plain C rather than C++, although a caster could be done in a classic C fashion. As previously commented though, we target a standard C90 (or maybe C99) language standard.

jmarshall commented 2 weeks ago

The OP's original problem is similar to the case of MSVC: while HTSlib does not (currently) support being built by MSVC, it does try to avoid producing spurious warnings for MSVC projects that #include HTSlib's public headers.

Similarly HTSlib's public headers would ideally be usable from MISRAble projects without spurious warnings.

The OP has provided very little to go on, and it is hard to see how the proposed patch would address any actual misuse of char inside the bam_cigar_opchr() macro. To have any chance of getting this ameliorated, the OP would need to provide an example of user code using bam_cigar_opchr() that triggers the warning, and ideally mention what tools generate this warning — especially if they can be used by others to reproduce the problem.

kullrich commented 2 weeks ago

Hi @jmarshall,

I have just used JetBrains CLion software and activated the MISRA code inspection, that is it.

Since I was not aware that samtools is not aiming to be MISRA compliant.

I am sorry to have opened this issue at all.

Best regards

jkbonfield commented 2 weeks ago

What's the background of MISRA? Looking it up, it claims to be a Motor Industry standard, but I'm guessing it's jumped ship somewhere along the way if people are thinking that is appropriate for general software. I confess I'd never heard of it until this post.

Is it something that is enabled by default on many compilers? It doesn't appear to be supported natively by gcc or clang, but many of the non-default warnings look like they'd cover most of the rules, eg -Wcast-qual. However enabling just that one option adds over 1000 new warnings to htslib, of which ~600 appear to orignate in header files. It's possible that's a red-herring and not part of MISRA though. My google searches were a bit quick and dirty I'll admit.

If you do a "make -k" and count the total number of warnings with your MISRA compiler, how many is it complaining about? We don't have the resources to make huge changes like this, particularly at present with multiple other projects on the go that are competing for the dev team time.

Edit: of course, header warnings get duplicated many times over. It looks like 17 such unique warnings, 4 of them are in sam.h:

sam.h:1470:18: warning: cast discards 'const' qualifier from pointer target type [-Wcast-qual]
    r |= kputsn_((char*)key, 2, ks) < 0;

sam.h:1478:21: warning: cast discards 'const' qualifier from pointer target type [-Wcast-qual]
        r |= kputw(*(int8_t*)s, ks) < 0;

sam.h:1566:29: warning: cast discards 'const' qualifier from pointer target type [-Wcast-qual]
                r |= kputw(*(int8_t*)s, ks) < 0;

sam.h:1574:30: warning: cast discards 'const' qualifier from pointer target type [-Wcast-qual]
                r |= kputuw(*(uint8_t*)s, ks) < 0;
jmarshall commented 2 weeks ago

It's a highly opinionated coding standard with a long list of hilarious rules helpfully listed by JetBrains here. Generally I think a MISRA linter is something that you would pay for rather than find in GCC or Clang.

Thanks for the pointer to JetBrains's MISRA checker. I have been able to reproduce the message you saw.

Clearly trying to avoid MISRA diagnostics within HTSlib's headers is not realistic, but in JetBrains CLion for example the diagnostics are shown as annotations on individual source lines or tokens, so any arising within the headers would be hidden within them. Judging by your report, this one OTOH seems to be one of the few that leaks out to be located in user code and visible there.

Personally I think this is a false positive. I have reported it as CPP-39087 — it will be interesting to see if they agree.

kullrich commented 2 weeks ago

Hi @jkbonfield,

coming from Europe I stumbled about MISRA and this regulation in the context of medical devices, see here for the regulation:

https://health.ec.europa.eu/medical-devices-new-regulations/getting-ready-new-regulations_en

Which defines software under specific circumstances on its own as a medical device.

Anyhow, as mentioned before, I am sorry to have opened this discussion about MISRA compliance.

Best regards

jkbonfield commented 2 weeks ago

No need to apologise. It's good to broaden my horizon, even though in this case it's something I wouldn't want to touch.

It does rather sound similar to old rules NASA had for a while (maybe still does?) on writing robust code for embedded devices, but it's outside of our remit.

Some are indeed just never going to fly: "Dynamic memory allocation shall not be used". Restrictions like this just trade one problem for another problem. It guarantees you can't run out of memory, but equally so it also guarantees you have arbitrary maximum sizes for various inputs. That may be applicable where you have 100% control over input data, but bioinformatics has zero control over such things.

jmarshall commented 5 days ago

JetBrains have updated the upstream bug report to say that the issue has been fixed, with the fix to first appear in an upcoming build of CLion 2024.2 EAP.