pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUILD: C code coverage #58164

Open tdhock opened 6 months ago

tdhock commented 6 months ago

Hi, I am doing mutation testing of pandas with @agroce.

I see five C source files that are maintained by pandas. Is that right?

_libs/src/datetime/date_conversions.c
_libs/src/datetime/pd_datetime.c
_libs/src/parser/tokenizer.c
_libs/src/parser/pd_parser.c
_libs/src/parser/io.c
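
One way to double-check a list like this is to enumerate the `.c` files in a local checkout. The following is only a sketch: the helper name `list_c_sources` is made up for illustration, and which directory to pass (for example the `pandas/` package directory of a clone) is an assumption about the local layout. It also finds vendored C files, so the result may be longer than the pandas-maintained set.

```python
from pathlib import Path


def list_c_sources(root):
    """Return sorted POSIX-style relative paths of all .c files under root."""
    root = Path(root)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.c"))


if __name__ == "__main__":
    # Assumption: run from the package directory of a pandas checkout.
    for path in list_c_sources("."):
        print(path)
```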

I expected that pandas devs would be interested in the code coverage of these files, but the current coverage report (https://app.codecov.io/gh/pandas-dev/pandas/tree/main/pandas?search=&displayType=list) computes no coverage for C files, only for .py files.

How can the build/test scripts be modified so that C code coverage could be computed?
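
For context, a generic route with GCC or Clang (not something this thread confirms for pandas) is to compile and link the C extensions with `--coverage`, run the test suite, and then summarize the resulting `.gcda` data with a tool such as gcov. The sketch below only prepares an environment for such a build. The helper name `coverage_build_env` is hypothetical, and it is an assumption that the pandas build picks up `CFLAGS` and `LDFLAGS` from the environment.

```python
import os

# GCC and Clang both accept --coverage to instrument code for gcov-style reports.
COVERAGE_FLAG = "--coverage"


def coverage_build_env(base_env=None):
    """Return a copy of the environment with --coverage in CFLAGS and LDFLAGS."""
    env = dict(os.environ if base_env is None else base_env)
    for var in ("CFLAGS", "LDFLAGS"):
        flags = env.get(var, "").split()
        if COVERAGE_FLAG not in flags:  # avoid adding the flag twice
            flags.append(COVERAGE_FLAG)
        env[var] = " ".join(flags)
    return env
```

The returned mapping could be passed as `env=` to `subprocess.run` when invoking the build. Lowering optimization (for example `-O0`) usually makes line-level results easier to read.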

Or is there a reason why code coverage for C files is not interesting?

I searched the issue tracker for similar keywords, and the only related issue I saw is about coverage for Cython: https://github.com/pandas-dev/pandas/pull/54453

Thanks in advance.

jbrockmendel commented 6 months ago

> Or is there a reason why code coverage for C files is not interesting?

No, there is not.

Related, for Cython: #25600

agroce commented 6 months ago

Our data also suggest (though we're not sure) that the C code may be considerably less well covered than much of the rest, perhaps partly because of the lack of visibility. We infer this from the mutation score, which is a lot lower here (~50% vs. 70%+) than for the rest of the code.
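
For readers unfamiliar with the metric: a mutation score is commonly defined as the fraction of generated mutants that the test suite detects ("kills"). Here is a minimal sketch of that arithmetic. The function name and the counts in the example are hypothetical, not figures from this analysis, and some tools also exclude equivalent or uncompilable mutants from the denominator.

```python
def mutation_score(killed, total):
    """Fraction of mutants detected by the tests (killed / total)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= killed <= total:
        raise ValueError("killed must be between 0 and total")
    return killed / total


# Hypothetical counts, for illustration only.
print(mutation_score(50, 100))  # 0.5
```

A lower score means more mutants survive, which is consistent with (but does not prove) weaker test coverage of that code.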