Open bockthom opened 5 hours ago
I have already implemented a fix for that, see the linked patch: 0001-Ensure-correct-data-type-of-NA-timestamps.patch
However, we also should properly test that, to make sure that the patch properly works in various situations and does not break anything.
Description
project.data$get.data.cut.to.same.date(data.sources = c("issues", "mails", "commits"))
sometimes fails if some of the data sources are empty, ending up in the following error message:This error occurs if the issue data is the empty data source, but not if the mail data is the empty data source, which is an unexpected behavior.
After some debugging, I noticed that the error is caused by wrong data types when converting strings to POSIXct dates: Usually, if the date is not a POSIXct object, it is converted to POSIXct when splitting the data: https://github.com/se-sic/coronet/blob/24005e44aaf602789168f69dbcd5cc0a077c0194/util-split.R#L69-L70 However, this fails if the bins (i.e., the timestamps used for cutting) are not a string but a numeric (such as UNIX timestamps). Therefore, we need to look at where the timestamps for cutting come from: They are extracted in the function
extract.timestamps
: https://github.com/se-sic/coronet/blob/24005e44aaf602789168f69dbcd5cc0a077c0194/util-data.R#L724-L758If no data is available for the specified data source, the corresponding timestamps are set to
NA
. For whatever reason, this converts the data frame used for cutting into a numeric data frame of UNIX timestamps if the first row containsNA
values. Otherwise, if there are already POSIXct objects in there, the insertedNA
values are interpreted to be POSIXct objects.Suggested Fix
Instead of setting the timestamps to
NA
in case of missing data sources, set the timestamps toas.POSIXct(NA)
. This way, we ensure that the data type is POSIXct even if theNA
values are the very first ones that are inserted into the timestamps data frame.Versions
This affects, at least, coronet version 4.4 in combination with R version 4.4.1. Also earlier coronet versions or R versions may be affected by this bug.