omnibs opened this issue 3 years ago.
Shake is meant to detect that the database is corrupted, wipe it, and rebuild from scratch automatically. Do you ever see that behaviour? Can you give an example of the kind of thing you are writing to the cache? e.g. is it a record? How did you obtain a Binary instance for it? Are you removing/adding fields?
> Do you ever see that behaviour?
I don't think we ever saw it. Is it a new-ish feature? We're pinned to this revision https://github.com/ndmitchell/shake/tree/6297471621582d87d1e983dd9e04ed02c62beb8f
Does shake output anything different when it detects the database is corrupted and decides to wipe it?
> Can you give an example of the kind of thing you are writing to the cache? e.g. is it a record? How did you obtain a Binary instance for it? Are you removing/adding fields?
I'm having a hard time reproducing this again, but I think it was related to one of our oracles. We have a few of them returning everything from records to union types, and they get their `Binary` instances from an empty `instance Binary [type]` declaration (i.e. the `Generic`-based default), like this one:
```haskell
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE TypeFamilies #-}

import Control.Monad (void)
import Development.Shake
import Development.Shake.Classes
import GHC.Generics (Generic)

-- FormatterType, FormatterVersion, formatterFor and check are not shown here.

data TestFileFormatted = TestFileFormatted FormatterType FilePath
  deriving (Show, Eq, Generic)

instance Hashable TestFileFormatted
instance Binary TestFileFormatted
instance NFData TestFileFormatted

type instance RuleResult TestFileFormatted = IsFormatted

testFileFormattedOracle :: TestFileFormatted -> Action IsFormatted
testFileFormattedOracle (TestFileFormatted formatterType file) = do
  -- Take a dependency on the version of the formatter, so we rerun this rule
  -- when it changes.
  void $ askOracle (FormatterVersion formatterType)
  need [file]
  let formatter = formatterFor formatterType
  check formatter file

data IsFormatted
  = Formatted
  | Unformatted String
  deriving (Eq, Generic, Show)

instance Hashable IsFormatted
instance Binary IsFormatted
instance NFData IsFormatted
```
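For context on the `Binary` question: an empty `instance Binary T` on a type that derives `Generic` uses the `binary` package's generic default. The standalone sketch below (a copy of `IsFormatted`, no Shake required; `encodedBytes` and `roundTrips` are hypothetical helpers) shows what that encoding contains: a constructor tag followed by the fields in declaration order, with no type name, field names or version stored in the bytes. Nothing in the encoded value itself records which shape of the type wrote it.

```haskell
{-# LANGUAGE DeriveGeneric #-}

import Data.Binary (Binary, decode, encode)
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word8)
import GHC.Generics (Generic)

-- A standalone copy of IsFormatted from the snippet above.
data IsFormatted
  = Formatted
  | Unformatted String
  deriving (Eq, Generic, Show)

-- The empty instance falls back to binary's Generic-based put/get.
instance Binary IsFormatted

-- Raw bytes for a value: a one-byte constructor tag, then the fields in order.
encodedBytes :: IsFormatted -> [Word8]
encodedBytes = BL.unpack . encode

-- Decoding with the same type definition that encoded the value round-trips.
roundTrips :: IsFormatted -> Bool
roundTrips x = decode (encode x) == x
```

In GHCi, `encodedBytes Formatted` is just the tag byte `[0]`, while `encodedBytes (Unformatted "ab")` is the tag `1`, an 8-byte list length, and the two characters.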
When I run into this again I'll try to collect more info and share here. Any pointers on what data I should collect are appreciated!
I've run into another incarnation of this problem, where Shake doesn't detect that the database is corrupted and we get runtime errors decoding `Binary` instances. I'm not sure how related the two are, but this is the stack trace of the second kind of error I get:
```
at apply1, called at src/Development/Shake/Internal/Rules/Oracle.hs:159:32 in shake-0.19-Kan5k6lRCGEH1JkmyaVtkS:Development.Shake.Internal.Rules.Oracle
* Depends on: OracleQ (AllServices ())
at error, called at libraries/binary/src/Data/Binary/Get.hs:351:5 in binary-0.8.6.0:Data.Binary.Get
* Raised the exception:
Data.Binary.Get.runGet at position 2316: not enough bytes
```
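The last line of that trace is the message `binary`'s pure `runGet` produces when a decoder asks for more bytes than the input holds. A minimal standalone reproduction (plain `binary`, no Shake; all names here are hypothetical), contrasting `runGet`, which throws via `error`, with `runGetOrFail`, which returns the failure as a value:

```haskell
import Control.Exception (ErrorCall (..), evaluate, try)
import Data.Binary.Get (getWord64be, runGet, runGetOrFail)
import qualified Data.ByteString.Lazy as BL
import Data.List (isInfixOf)
import Data.Word (Word64)

-- Four bytes of input, but a Word64 needs eight.
truncatedInput :: BL.ByteString
truncatedInput = BL.pack [0, 0, 0, 1]

-- runGet reports failure with `error`, so it only surfaces when the result
-- is forced; evaluate forces it and try catches the ErrorCall.
viaRunGet :: IO (Either String Word64)
viaRunGet = do
  result <- try (evaluate (runGet getWord64be truncatedInput))
  pure $ case result of
    Left (ErrorCall msg) -> Left msg
    Right w -> Right w

-- runGetOrFail returns the failure (with its byte offset) as a value.
viaRunGetOrFail :: Either String Word64
viaRunGetOrFail = case runGetOrFail getWord64be truncatedInput of
  Left (_, offset, msg) -> Left ("offset " ++ show offset ++ ": " ++ msg)
  Right (_, _, w) -> Right w

-- True when a failure message is binary's "not enough bytes" error.
isNotEnoughBytes :: Either String a -> Bool
isNotEnoughBytes (Left msg) = "not enough bytes" `isInfixOf` msg
isNotEnoughBytes (Right _) = False
```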
This is the relevant code:
```haskell
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE TypeFamilies #-}

import Control.Monad (void)
import Development.Shake
import Development.Shake.Classes
import qualified Dhall
import GHC.Generics (Generic)

-- Elm, Haskell, Codedeploy and Kubernetes are not shown here.

rules :: Rules ()
rules =
  versioned 6 $ void $ addOracleCache allServicesOracle

newtype AllServices = AllServices ()
  deriving (Show, Typeable, Eq, Hashable, Binary, NFData)

type instance RuleResult AllServices = [Service]

-- The body of allServicesOracle is not shown here.
allServicesOracle :: AllServices -> Action [Service]

data Service
  = Service
  { serviceName :: String,
    builders :: Builders,
    codedeploy :: [Codedeploy],
    kubernetes :: Maybe Kubernetes,
    localPort :: Maybe Dhall.Natural
  }
  deriving (Generic, Show, Typeable, Eq)

instance Binary Service
instance Hashable Service
instance NFData Service
instance Dhall.Interpret Service

data Builders
  = Builders
  { elm :: Maybe Elm,
    haskell :: Maybe Haskell,
    ruby :: Bool
  }
  deriving (Generic, Show, Typeable, Eq)

instance Binary Builders
instance Hashable Builders
instance NFData Builders
instance Dhall.Interpret Builders
```
This was the relevant change:
```diff
 data Builders
   = Builders
   { elm :: Maybe Elm,
-    haskell :: Maybe Haskell
+    haskell :: Maybe Haskell,
+    ruby :: Bool
   }
   deriving (Generic, Show, Typeable, Eq)
```
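One plausible reading of why this change produced `not enough bytes` rather than a clean rebuild: if bytes written with the old `Builders` layout are later decoded with the new `Generic` instance, the decoder expects one more field than was stored. The sketch below reproduces that on the `binary` side only, with simplified stand-ins (`BuildersV1`, `BuildersV2` and `decodeOldAsNew` are hypothetical names, and `Maybe Int` replaces `Maybe Elm`/`Maybe Haskell`); it makes no claim about how Shake itself stores oracle results.

```haskell
{-# LANGUAGE DeriveGeneric #-}

import Data.Binary (Binary, decodeOrFail, encode)
import GHC.Generics (Generic)

-- Stand-in for Builders before the change.
data BuildersV1 = BuildersV1
  { elmV1 :: Maybe Int,
    haskellV1 :: Maybe Int
  }
  deriving (Generic, Show, Eq)

instance Binary BuildersV1

-- Stand-in for Builders after the change: one extra field at the end.
data BuildersV2 = BuildersV2
  { elmV2 :: Maybe Int,
    haskellV2 :: Maybe Int,
    ruby :: Bool
  }
  deriving (Generic, Show, Eq)

instance Binary BuildersV2

-- Encode with the old layout, then decode the same bytes with the new one,
-- as would happen if a value cached before the change were read after it.
decodeOldAsNew :: BuildersV1 -> Either String BuildersV2
decodeOldAsNew old = case decodeOrFail (encode old) of
  Left (_, _, msg) -> Left msg
  Right (_, _, new) -> Right new
```

Here the decoder runs out of input when it reaches `ruby`. When the old record sits inside a larger value such as `[Service]`, the new decoder may instead consume bytes that belong to later fields, so the failure could surface further into the input; that part is an assumption, not something the trace confirms.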
We always thought this was expected behavior and instructed folks to bump the `versioned` number on the rule whenever this happens.
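The general idea behind a version stamp (not necessarily Shake's exact mechanism) is that a mismatch is detected before the payload is decoded, so the new layout never parses bytes written by the old one. A minimal sketch, where `writeVersioned` and `readVersioned` are hypothetical helpers and `Nothing` stands for "treat as a cache miss and recompute":

```haskell
import Data.Binary (Binary, decodeOrFail, encode)
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word32)

-- Hypothetical helper: store a version number next to the encoded payload.
writeVersioned :: Binary a => Word32 -> a -> BL.ByteString
writeVersioned version value = encode (version, encode value)

-- Hypothetical helper: only decode the payload if the version matches.
readVersioned :: Binary a => Word32 -> BL.ByteString -> Maybe a
readVersioned expected bytes =
  case decodeOrFail bytes of
    Right (_, _, (version, payload))
      | version == expected ->
          case decodeOrFail payload of
            Right (_, _, value) -> Just value
            Left _ -> Nothing
    _ -> Nothing
```

The weak point is the one described above: the check only helps if someone remembers to bump the number whenever a type's layout changes.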
I have been scratching my head over this issue for the last two days now:
```
Error when reading Shake database _build/.shake.database
Reading from ByteString, insufficient left
CallStack (from HasCallStack):
error, called at src/General/Binary.hs:50:86 in shake-0.18.5-5KeTQg6YWjFdK6sR2J2BI:General.Binary
All files will be rebuilt
```
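That output matches the fallback the maintainer described earlier: the database fails to decode, so it is discarded and everything is rebuilt. A minimal sketch of that detect-and-fall-back pattern, assuming a toy database type (`Database` and `loadDatabase` are hypothetical; Shake's real format and checks differ):

```haskell
import Data.Binary (decodeOrFail, encode)
import qualified Data.ByteString.Lazy as BL
import qualified Data.Map.Strict as Map

-- Hypothetical stand-in for a build database: output path -> stored value.
type Database = Map.Map String Int

-- Decode the database; on any failure, report why and start from empty,
-- which forces everything to be rebuilt.
loadDatabase :: BL.ByteString -> (Database, Maybe String)
loadDatabase bytes = case decodeOrFail bytes of
  Right (rest, _, db)
    | BL.null rest -> (db, Nothing)
    | otherwise -> (Map.empty, Just "trailing bytes after database")
  Left (_, offset, msg) ->
    (Map.empty, Just ("at offset " ++ show offset ++ ": " ++ msg))
```

A report like this carries only a byte offset, which on its own does not say which entry or rule was affected.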
Here's what I am doing:
I have a copy of the shake database that causes this error, if that helps.
I can reproduce this error predictably now:
Is there any way to get more debugging information? Which field within the DB is corrupted, exactly? Is there anything being stored in the DB that is related to the underlying OS/machine (that could cause a corruption by simply stopping/starting the underlying machine)?
I tried running my Shakefile with the `--trace` option, and it seems to read the Shake DB in some sort of "chunks". Is there any way to print which chunk is causing this error?
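On the question of which chunk fails: if a file is a sequence of length-prefixed chunks, a reader can split it first and then decode each chunk separately, reporting the index of the first bad one. The sketch below assumes a made-up framing (a 32-bit big-endian length followed by that many bytes) and a made-up per-chunk type; it does not describe Shake's actual database layout, and `frame`, `getChunks`, `decodesCleanlyAs` and `firstBadChunk` are all hypothetical.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}

import Data.Binary (Binary, decodeOrFail, encode)
import Data.Binary.Get (Get, getLazyByteString, getWord32be, isEmpty, runGetOrFail)
import Data.Binary.Put (putLazyByteString, putWord32be, runPut)
import qualified Data.ByteString.Lazy as BL
import Data.List (findIndex)
import Data.Proxy (Proxy (..))

-- Hypothetical framing: 32-bit big-endian length, then the payload bytes.
frame :: BL.ByteString -> BL.ByteString
frame payload =
  runPut (putWord32be (fromIntegral (BL.length payload)) >> putLazyByteString payload)

-- Split the whole input into chunk payloads; fails if a chunk is cut short.
getChunks :: Get [BL.ByteString]
getChunks = do
  done <- isEmpty
  if done
    then pure []
    else do
      len <- getWord32be
      chunk <- getLazyByteString (fromIntegral len)
      (chunk :) <$> getChunks

-- Does a chunk decode as exactly one value of type a, with nothing left over?
-- The annotation on `value` only fixes which type to decode.
decodesCleanlyAs :: forall a. Binary a => Proxy a -> BL.ByteString -> Bool
decodesCleanlyAs _ chunk = case decodeOrFail chunk of
  Right (rest, _, value) -> BL.null rest && const True (value :: a)
  Left _ -> False

-- Left: the framing itself is broken. Right Nothing: every chunk decodes.
-- Right (Just i): chunk i (0-based) is the first one that fails.
firstBadChunk :: (BL.ByteString -> Bool) -> BL.ByteString -> Either String (Maybe Int)
firstBadChunk ok bytes = case runGetOrFail getChunks bytes of
  Left (_, offset, msg) -> Left ("framing error at offset " ++ show offset ++ ": " ++ msg)
  Right (_, _, chunks) -> Right (findIndex (not . ok) chunks)
```

For example, with `good = frame (encode (7 :: Int))` and `bad = frame (BL.pack [1, 2, 3])`, `firstBadChunk (decodesCleanlyAs (Proxy :: Proxy Int)) (BL.concat [good, good, bad])` gives `Right (Just 2)`.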
Can reading environment variables using `unsafePerformIO` have anything to do with this? For example:
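```haskell
import qualified System.Environment as Env
import System.IO.Unsafe (unsafePerformIO)

envBranch :: String
envBranch = unsafePerformIO $ Env.getEnv "CI_COMMIT_REF_NAME"
```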
That said, I'm not sure whether this value ends up in the Shake DB at all.
Update: It seems that the underlying disk of the VM was actually getting corrupted, because I was using an unsafe method to create VM snapshots. Since fixing the snapshotting process (i.e. shutting down the VM before taking a snapshot), I don't think I have encountered this error with Shake again.
We run into this error with some frequency, and so far we believed it to be caused by this sequence:
1) write something to the cache (like an oracle),
2) change that thing's format (like the oracle's data type),
3) run Shake again, at which point the mismatch throws this error.
But we ran into one scenario today where we gave up: we added `versioned` to just about everything we could think of, and ended up nuking the cache. I'm not sure whether the steps I outlined are the only way this happens or just some of the possible scenarios, but we could use some help from Shake itself in figuring out what to do when we hit this error.