Closed corneliusroemer closed 2 months ago
Given the timing, the release of #1454 yesterday was my initial suspicion. That PR moved this code around, and I thought maybe the conditions got messed up in the change to augur/filter/_run.py.
A deeper inspection shows that it's unrelated and I believe the timing is just coincidence. The error is an approximation issue that can be fixed by addressing #1588:
400 / 406
# 0.9852 <- "exact" target_group_size is less than 1 which will pass the assertion
augur.filter.subsample._calculate_fractional_sequences_per_group(400, [1,]*406)
# 1.0254 <- "approximated" target_group_size is greater than 1 which fails the assertion
Interesting, thanks for the quick investigation!
Yesterday's scheduled run was successful. I downloaded the relevant log file for the failing run and the flanking succeeding runs
nextstrain build --aws-batch --attach <batch job id> --download 'logs/subsample_global_2m_north_america_recent.txt' ~/tmp
and confirmed that this is due to the approximation issue:
✅ 2024-08-20:
WARNING: Asked to provide at most 400 sequences, but there are 412 groups.
Sampling probabilistically at 0.9522 sequences per group, meaning it is possible to have more than the requested maximum of 400 sequences after filtering.
❌ 2024-08-23:
WARNING: Asked to provide at most 400 sequences, but there are 406 groups.
Sampling probabilistically at 1.0254 sequences per group, meaning it is possible to have more than the requested maximum of 400 sequences after filtering.
✅ 2024-08-25:
WARNING: Asked to provide at most 400 sequences, but there are 432 groups.
Sampling probabilistically at 0.9033 sequences per group, meaning it is possible to have more than the requested maximum of 400 sequences after filtering.
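To make the comparison across the three runs explicit, here is a minimal sketch that recomputes the exact ratio (max sequences / number of groups) and sets it next to the rate printed in each log. The group counts and logged rates are copied from the excerpts above; the `compare` helper and the "exceeds 1" flag are illustrative names of mine, not part of augur, and the flag only mirrors the "greater than 1 fails the assertion" condition described earlier in the thread.

```python
# Values copied from the three log excerpts above: date -> (groups, logged rate).
MAX_SEQUENCES = 400
LOGGED_RUNS = {
    "2024-08-20": (412, 0.9522),
    "2024-08-23": (406, 1.0254),
    "2024-08-25": (432, 0.9033),
}


def compare(max_sequences, runs):
    """Return (date, groups, exact rate, logged rate, logged rate > 1) per run.

    Hypothetical helper for illustration only; it does not call augur.
    """
    rows = []
    for date, (n_groups, logged_rate) in runs.items():
        exact_rate = round(max_sequences / n_groups, 4)
        rows.append((date, n_groups, exact_rate, logged_rate, logged_rate > 1))
    return rows


rows = compare(MAX_SEQUENCES, LOGGED_RUNS)
for date, n_groups, exact_rate, logged_rate, exceeds_one in rows:
    print(f"{date}: groups={n_groups} exact={exact_rate} "
          f"logged={logged_rate} exceeds_1={exceeds_one}")
```

In all three runs the exact ratio stays below 1 because there are more groups than requested sequences; only the logged (approximated) rate for the 406-group run goes above 1.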
The ncov run errored last night. It seems to be related to filter/subsample?
Here's the log: https://github.com/nextstrain/ncov/actions/runs/10521511438/job/29152346864#step:5:1