jackwasey / icd

Fast ICD-10 and ICD-9 comorbidities, decoding and validation in R. NB use main instead of master for default branch.
https://jackwasey.github.io/icd/
GNU General Public License v3.0
240 stars 60 forks source link

undefined behavior found by CRAN memtests, not replicable #88

Closed jackwasey closed 8 years ago

jackwasey commented 8 years ago

I can't replicate this with the following environment. There are some possible llvm/libc++ bugs relating to this, which may have now been fixed, or the bug was from a dependency, e.g. Catch in testthat, or Rcpp. Either way, currently nothing to fix, since I can't replicate with most recent versions of R and clang.

CC = clang -fsanitize=address,undefined -fno-sanitize=float-divide-by-zero,vptr,function -fno-omit-frame-pointer
CFLAGS = -pipe -Wall -pedantic -g -O2 -mtune=native $(LTO)
CPICFLAGS = -fpic
CPPFLAGS = -I/usr/local/include
CXX = clang++ -fsanitize=address,undefined -fno-sanitize=float-divide-by-zero,vptr,function -stdlib=libc++ -fno-omit-frame-pointer -I/usr/local/include/c++/v1 -I/usr/local/include
CXXCPP = $(CXX) -E
CXXFLAGS = -pipe -Wall -pedantic -g -O2 -mtune=native $(LTO)
CXXPICFLAGS = -fpic
CXX1X = clang++ -fsanitize=address,undefined -fno-sanitize=float-divide-by-zero,vptr,function -stdlib=libc++ -fno-omit-frame-pointer -I/usr/local/include/c++/v1 -I/usr/local/include
CXX1XFLAGS = -pipe -Wall -pedantic -g -O2 -mtune=native
CXX1XPICFLAGS = -fpic
CXX1XSTD =  -std=c++11

R version 3.3.0 RC (2016-04-25 r70549) -- "Supposedly Educational"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

clang version 3.9.0 (trunk 267734)
Target: x86_64-unknown-linux-gnu
Thread model: posix

Original report from CRAN below, in case it disappears:


R Under development (unstable) (2016-04-21 r70529) -- "Unsuffered Consequences"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> # Copyright (C) 2014 - 2016  Jack O. Wasey
> #
> # This file is part of icd.
> #
> # icd is free software: you can redistribute it and/or modify
> # it under the terms of the GNU General Public License as published by
> # the Free Software Foundation, either version 3 of the License, or
> # (at your option) any later version.
> #
> # icd is distributed in the hope that it will be useful,
> # but WITHOUT ANY WARRANTY; without even the implied warranty of
> # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> # GNU General Public License for more details.
> #
> # You should have received a copy of the GNU General Public License
> # along with icd. If not, see <http:#www.gnu.org/licenses/>.
> 
> # this is common code to all the tests, each of which runs test_check with a different filter:
> 
> library("icd")
Welcome to the "icd" package for finding comorbidities and interpretation of ICD-9 and ICD-10 codes. Suggestions and contributions are welcome at https://github.com/jackwasey/icd .

Available options are:
options("icd.warn_deprecated" = TRUE) which will warn if deprecated functions from the old "icd9" package are used.
See the vignettes for examples.

Please cite this package if you find it useful in your published work.
citation(package = "icd")

> library("testthat", warn.conflicts = FALSE, quietly = TRUE)
> library("magrittr", warn.conflicts = FALSE, quietly = TRUE)
> 
> # we now rely on a testthat version with backwards-incompatible changes, the whole of the expectation setup has changed,
> # and there are many deprecations. For now, only run tests if testthat version is high enough:
> #
> if (packageVersion("testthat") < package_version("0.11.0.9000")) {
+   message("testthat version is less than 0.11.0.9000, so not running Catch tests. Consider using:
+           devtools::install_github('hadley/testthat')")
+ }
> 
> # when covr runs tests, it installs the package with source, and sources the files in the test directory.
> # This means that /data-raw would be absent, so it should be in inst/data-raw for testing.
> 
> icd:::setup_test_check()
> icd:::show_test_options()
$icd.do_slow_tests
NULL

$icd.warn_deprecated
NULL

> # http://stackoverflow.com/questions/406230/regular-expression-to-match-line-that-doesnt-contain-a-word
> icd:::my_test_check("current-((?!comorbid).)*$", "Running current tests without comorbid")
Running current tests without comorbid
Will skip slow tests
attributes: ..............
billable code lists: ..S.............
generate defined child codes for ICD-10-CM: .........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
S3 class functions: ..................SSS..........SS...............................................................
class conflicts: ...................SSS
class updates: SS....
condense ranges of codes to parents and orphans: /usr/local/clang-trunk/bin/../include/c++/v1/__tree:966:16: runtime error: downcast of address 0x7ffd1a1c7b78 with insufficient space for an object of type 'std::__1::__tree_node<std::__1::basic_string<char>, void *>'
0x7ffd1a1c7b78: note: pointer points here
 fd 7f 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  c0 7d 1c 1a fd 7f 00 00  69 00 00 00
              ^ 
SUMMARY: AddressSanitizer: undefined-behavior /usr/local/clang-trunk/bin/../include/c++/v1/__tree:966:16 in 
./usr/local/clang-trunk/bin/../include/c++/v1/__tree:959:16: runtime error: downcast of address 0x7ffd1a1d3478 with insufficient space for an object of type 'std::__1::__tree_node<std::__1::basic_string<char>, void *>'
0x7ffd1a1d3478: note: pointer points here
 60 60 00 00  40 b5 0f 00 60 60 00 00  0b 00 00 00 00 00 00 00  c0 36 1d 1a fd 7f 00 00  50 00 00 00
              ^ 
SUMMARY: AddressSanitizer: undefined-behavior /usr/local/clang-trunk/bin/../include/c++/v1/__tree:959:16 in 
.........................................................................................
icd9 type conversions: .........................................................................................................................................................OMP: Warning #96: Cannot form a team with 24 threads, using 2 instead.
OMP: Hint: Consider unsetting KMP_ALL_THREADS and OMP_THREAD_LIMIT (if either is set).
....
icd10 conversions: .....
C++: .
function examples: 
explain ICD-9: code to human-readable: ....................................................................................S.....................S
filtering on ICD-10 validity: ...
filtering on POA: ...............................
guesss ICD type and version: ..............................
icd9cm_hierarchy was parsed as expected: .........................
sub-types of ICD-9 code: ............................................
compare ordered long to wide methods: ......
basic ICD-9 manipulations: ................................................................................................................................................
OpenMP thread and chunk parameters: ............................................................................................................................................................................................................................................................
icd10 WHO parse: S
icd10 fixed width parse: S
icd10 XML parse: S....................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
RTF ICD-9: ....................................................................................................................................................................................................................................................................................................................................................................................................................
RTF parsing: ............
RTF tests, sources optionally downloaded when required: 
ICD-10 ranges: ...........
ICD-10 ranges - defined children: .......................
icd9 ranges: ....................................................................................................................................................................................
reshaping wide to long: ............
SSS....SAS format file data interpretation: .................
Charlson and comorbidity counting: ...........................................
dictionary compiles for spelling check: .
trim, strim, other utils: .............................................................
ICD-10 codes are valid, defined, billable: ................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
icd9 validation: ........................................................................................................................................................................................................................................................................................................................................................
WHO ICD-10 codes were fetched: S

Skipped ------------------------------------------------------------------------
1. ICD-9-CM billable codes package data is recreated (@test-current-billable.R#35) - skipping test because flat file ICD-9-CM sources not available for version:  32

2. well ordered class lists are created (@test-current-class.R#32) - this is time consuming, and we should probably tolerate mixed order anyway

3. well ordered class lists short/decimal ICD combos are created (@test-current-class.R#38) - this is time consuming, and we should probably tolerate mixed order anyway

4. warn if changing ICD decimal to short or vice versa (@test-current-class.R#51) - not sure I want this behaviour

5. warn if combining mixed ICD sub-version types (@test-current-class.R#75) - hold off this for now

6. error if combining mixed ICD version types, e.g. ICD-9 vs ICD-10 (@test-current-class.R#81) - this is nice to have, but adds weight to 'c'

7. conflicting ICD type classes can be found (@test-current-class.R#242) - working on this

8. conflicting short vs decimal class asssignment (@test-current-class.R#254) - working on this

9. long vs wide data conflict identified (@test-current-class.R#260) - WIP

10. update data frame class for simple cases (@test-current-class.R#274) - obsolete

11. fail to update data frame class with conflicting cols (@test-current-class.R#292) - obsolete

12. explain icd9GetChapters bad input (@test-current-explain.R#220) - todo

13. icd9 descriptions is parsed correctly (@test-current-explain.R#267) - skipping test because flat file ICD-9-CM sources not available for version:  32

14. icd10 WHO recreated exactly (@test-current-parse-icd10.R#23) - not implemented yet

15. icd10 2016 flat file details are okay (@test-current-parse-icd10.R#30) - skipping test because flat file ICD-10-CM source not available

16. icd10 sub-chapters are recreated exactly (@test-current-parse-icd10.R#61) - skipping test because XML file ICD-10-CM source not available

17. generating uranium data is identical to saved (@test-current-sample-data.R#19) - uranium pathology data must be downloaded with fetch_uranium_pathology

18. ICD-10 codes in uranium data are okay (@test-current-sample-data.R#30) - reinstate this test once ICD-10 WHO codes are available for comparison.
       Uranium Pathology data is not ICD-10-CM, but ICD-10 WHO.

19. generating vermont data is identical to saved (@test-current-sample-data.R#43) - vermont data must be downloaded with fetch_vermont_dx

20. some tricky looking codes (@test-current-who-icd10.R#4) - don't have WHO codes to test. Run function to download them.

DONE ===========================================================================
> 
> proc.time()
   user  system elapsed 
693.455  11.572 694.360 
/data/gannet/ripley/R/test-clang/testthat/include/testthat/vendor/catch.h:3206:25: runtime error: member call on address 0x7fdc87e97628 which does not point to an object of type 'std::__1::basic_ios<char>'
0x7fdc87e97628: note: object is of type 'std::__1::ios_base'
 dc 7f 00 00  68 53 4f 94 dc 7f 00 00  02 10 00 00 00 00 00 00  06 00 00 00 00 00 00 00  00 00 00 00
              ^~~~~~~~~~~~~~~~~~~~~~~
              vptr for 'std::__1::ios_base'
SUMMARY: AddressSanitizer: undefined-behavior /data/gannet/ripley/R/test-clang/testthat/include/testthat/vendor/catch.h:3206:25 in 
kevinushey commented 8 years ago

FWIW, I believe this error is being caused by testthat + Catch; likely it was due to the creation of the r_ostream object (used so that output would be sent back to R). I think ASAN might have been picky about me inheriting from std::streambuf vs std::basic_streambuf<char>, but still need to verify that.

jackwasey commented 8 years ago

Thanks for picking this up. Will keep a lookout in case it pops up again.