
Generalize the edited-file check across separate Jobs #1594

Open · ag-eitilt commented 2 months ago

Split from #1589: if a user edits a file output by Wake, from outside of Wake, and then reruns the same Job which originally generated it, Wake detects the difference in content hash and panics rather than overwriting the changes:

$ wake -x 'makePlan "test" Nil "echo -n test > test.txt" | runJob' ; echo edit >> test.txt ; wake -x 'makePlan "test" Nil "echo -n test > test.txt" | runJob'
Job 9428
PANIC: The hashcode of output file 'test.txt' has changed from 928b20366943e2afd11ebc0eae2e53a93bf177a4fcf35bcc64d503704e65e202 (when wake last ran) to d319334a830d32f8ab3b4c9d641360c834f712a36bfa6909277217569ea4ca33 (when inspected this time). Presumably it was hand edited. Please move this file out of the way. Aborting the build to prevent loss of your data.
$ grep edit test.txt
testedit

However, if the file is instead overwritten by a different, second Job, the hash check is bypassed and any edits are silently lost:

$ rm -f test.txt
$ wake -x 'makePlan "test" Nil "echo -n test > test.txt" | runJob' ; echo edit >> test.txt ; wake -x 'makePlan "test" Nil "echo -n test with different contents > test.txt" | runJob'
Job 9515
Job 9528
$ grep edit test.txt
$ echo $?
1

Our determination from the meeting is that this is a known limitation: the check deliberately iterates only over the output files from the Job's own previous run rather than the much larger space of every file in the database, letting it run in O(n) rather than O(n^2) -- or, more accurately, O(nm), where n is the number of files the Job outputs and m > n is the total number of files the database tracks.
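
For illustration only, a minimal Python sketch of that complexity argument, using a toy in-memory model rather than Wake's actual schema; JobRecord, content_hash, and all other names below are hypothetical:

import hashlib
from dataclasses import dataclass, field

def content_hash(path: str) -> str:
    # Hash the file's current on-disk contents.
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

@dataclass
class JobRecord:
    # path -> hash recorded when this Job last ran (toy model, not Wake's schema)
    outputs: dict[str, str] = field(default_factory=dict)

def check_same_job(job: JobRecord) -> None:
    # The current check: O(n) over the rerun Job's own prior outputs.
    for path, old_hash in job.outputs.items():
        if content_hash(path) != old_hash:
            raise RuntimeError(f"PANIC: '{path}' was hand edited")

def check_all_jobs(jobs: list[JobRecord], new_outputs: list[str]) -> None:
    # The naive generalization: compare each new output against every file
    # recorded by every Job, i.e. O(nm) with m the database-wide total.
    for path in new_outputs:
        for job in jobs:
            for old_path, old_hash in job.outputs.items():
                if old_path == path and content_hash(path) != old_hash:
                    raise RuntimeError(f"PANIC: '{path}' was hand edited")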

It would still definitely be desirable to run the hash check here, but to our knowledge the combined table of files maintained by the database doesn't contain sufficient information to run it, and we definitely wouldn't want to iterate through every Job to get every one of their files. Marked backlog until someone gets a chance to look into the database specifically, either for a better algorithm over the current schema, or for a database-breaking change otherwise queued up that adding the information required for a better algorithm can ride in on the coattails of.
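
If such a schema change did land, one possible shape for the generalized check would be a single path-indexed record of the most recent hash written to each output path, so each new output costs one lookup instead of a scan over Jobs. Again a rough Python sketch under that assumption; latest_hash_by_path and record_outputs are hypothetical names, not anything Wake currently maintains:

import hashlib

def content_hash(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

# Hypothetical global index: most recent recorded hash per output path.
latest_hash_by_path: dict[str, str] = {}

def record_outputs(outputs: dict[str, str]) -> None:
    # Update the index whenever any Job finishes writing its outputs.
    latest_hash_by_path.update(outputs)

def check_any_job(new_outputs: list[str]) -> None:
    # Generalized edited-file check: one indexed lookup per new output,
    # regardless of which Job originally wrote the file.
    for path in new_outputs:
        old_hash = latest_hash_by_path.get(path)
        if old_hash is not None and content_hash(path) != old_hash:
            raise RuntimeError(f"PANIC: '{path}' was hand edited")

The database equivalent would presumably be an index over (path, hash) on whatever table records output files, keeping the per-output cost at O(log m) without ever iterating Jobs.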