radiocosmology / alpenhorn

Alpenhorn is a service for managing an archive of scientific data.
MIT License

Can't re-check file copies marked "has_file=M" #133

Open cubranic opened 4 years ago

cubranic commented 4 years ago

I think I ran into a loophole where file copies are marked "M" and there is no way to restore them short of hand-editing the database.

This is what happened:

  1. transport disk (TD) at UBC; all files on it are good and marked as "has_file=Y" in the database
  2. copy request from the TD to cedar
  3. on cedar, alpenhornd starts pulling the files, but I run out of disk quota, and bbcp returns an error when it tries to write them.
  4. alpenhorn detects an error during copying and marks the file copy on the TD as "has_file=M".
  5. there is no way to reset that to "Y" using the client utilities, so I have to write SQL to do it and resume the copying after fixing my quota issues.
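For reference, the manual workaround in step 5 boils down to one SQL UPDATE. The sketch below runs it against an in-memory SQLite database. The table name `archivefilecopy`, its columns, and the node ids are hypothetical stand-ins, not alpenhorn's confirmed schema, so check them against your own database before running anything similar.

```python
import sqlite3


def reset_suspect_copies(conn: sqlite3.Connection, node_id: int) -> int:
    """Mark every has_file='M' copy on one node as 'Y' again.

    Hypothetical schema: archivefilecopy(file_id, node_id, has_file).
    Returns the number of rows changed.
    """
    cur = conn.execute(
        "UPDATE archivefilecopy SET has_file = 'Y' "
        "WHERE node_id = ? AND has_file = 'M'",
        (node_id,),
    )
    conn.commit()
    return cur.rowcount


# Toy database: node 1 plays the transport disk, node 2 plays cedar.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE archivefilecopy (file_id INTEGER, node_id INTEGER, has_file TEXT)"
)
conn.executemany(
    "INSERT INTO archivefilecopy VALUES (?, ?, ?)",
    [(1, 1, "M"), (2, 1, "M"), (3, 1, "Y"), (1, 2, "N")],
)
print(reset_suspect_copies(conn, node_id=1))  # → 2
```

Setting the rows straight to "Y" trusts that the source files really are intact; re-verifying them, as the comments below propose, is the safer route.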

Commands I tried:

ketiltrout commented 4 years ago

This happens all the time with our UBC-to-cedar copying, too. At least with alpenhorn 1, the solution is to run an alpenhorn daemon on the machine at UBC that hosts the TDs (jingle, in our case). The only thing that particular daemon ever does is wait for the cedar daemon to mark TD files as has_file='M' and then run a verify on them to mark them either X or Y.

Other than that (i.e. the vast, vast majority of the time), it just spins there doing nothing.
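A minimal sketch of the decision that verify makes for one suspect copy, assuming the database stores an MD5 checksum per file. The function name and arguments are hypothetical, not alpenhorn's API, and this sketch treats a missing file as bad ("X"); the real daemon may handle that case differently.

```python
import hashlib
import tempfile
from pathlib import Path


def verify_copy(path: Path, expected_md5: str, chunk_size: int = 1 << 20) -> str:
    """Return the has_file state to record for one copy.

    'Y' if the file exists and its MD5 matches the expected checksum,
    'X' otherwise (including, in this sketch, when the file is missing).
    """
    if not path.is_file():
        return "X"
    digest = hashlib.md5()
    with path.open("rb") as f:
        # Hash in chunks so large archive files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return "Y" if digest.hexdigest() == expected_md5 else "X"


# Usage: a good copy verifies as 'Y'.
with tempfile.TemporaryDirectory() as d:
    p = Path(d) / "a.dat"
    p.write_bytes(b"hello")
    print(verify_copy(p, hashlib.md5(b"hello").hexdigest()))  # → Y
```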

That said, a manual verify should also be checking has_file="M" entries.

ketiltrout commented 4 years ago

Really, verify should check all file copies that don't have has_file='N', i.e. all supposedly extant files, whether they're OK (Y), uncertain (M), or already known to be bad (X), and then report any differences from the database.
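The proposed rule can be sketched as a pure function over a node's records: select every copy whose state is not "N", re-check it, and report only the copies whose observed state disagrees with the database. The dictionary layout and the `check` callback are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple


def verify_node(
    copies: Dict[str, str],
    check: Callable[[str], str],
) -> List[Tuple[str, str, str]]:
    """Re-check every copy not marked 'N' and report changes.

    copies maps a file name to its recorded has_file state.
    check returns the state observed on disk ('Y' or 'X').
    Returns (name, recorded, observed) for each disagreement.
    """
    changes = []
    for name, recorded in sorted(copies.items()):
        if recorded == "N":
            continue  # not expected on this node: scan's job, not verify's
        observed = check(name)
        if observed != recorded:
            changes.append((name, recorded, observed))
    return changes


db = {"a": "Y", "b": "M", "c": "X", "d": "N"}
on_disk = {"a": "Y", "b": "Y", "c": "Y", "d": "Y"}
print(verify_node(db, on_disk.__getitem__))
# → [('b', 'M', 'Y'), ('c', 'X', 'Y')]
```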

jrs65 commented 4 years ago

Yeah, I agree the condition for verify should probably be has_file!="N".

ketiltrout commented 4 years ago

Then it's complementary to scan, which is meant to find files that aren't expected to be on that node (i.e. it only updates absent entries and entries with has_file=N).
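Read together, the last few comments split a node's file copies between the two operations by their has_file state. A small sketch of that division of labour, where `None` stands for a file with no database entry on the node; the function name is hypothetical.

```python
from typing import Optional


def responsible_operation(has_file: Optional[str]) -> str:
    """Which operation should look at a copy, per the proposal in this issue.

    None means the node has no database entry for the file at all.
    """
    if has_file is None or has_file == "N":
        return "scan"    # file not expected here: scan may discover it
    if has_file in ("Y", "M", "X"):
        return "verify"  # file supposedly present: verify re-checks it
    raise ValueError(f"unknown has_file state: {has_file!r}")


for state in (None, "N", "Y", "M", "X"):
    print(state, responsible_operation(state))
```

Because every state falls into exactly one branch, no copy is left that neither operation will ever revisit, which is the gap this issue ran into with "M".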