luvit / lit

Toolkit for developing, sharing, and running luvit/lua programs and libraries.
http://lit.luvit.io/
Apache License 2.0

"No such hash" lit install reports #301

Open · Bilal2453 opened this issue 2 years ago

Bilal2453 commented 2 years ago

I've been running into this error for a while now, but as a small example, trying to install the module Bilal2453/vips results in an error like the one in the attached screenshot.

This package was published normally through lit publish .; my SSH key and auth seem to be valid. The repo of that package is here.

I've tried to dig deeper into this without much luck, since the error seems to happen on the lit side, but I was able to pin down which files are causing this through the Lit REST API:

  1. Taking a look at https://lit.luvit.io/trees/b694fbf90abc511decc13d2f7cb2be37c160b538/Bilal2453/vips/v1.1.10/libs lists a bunch of files in the libs folder.
  2. A GET request for the file with hash f4a9d19dd2ff3420f24348f958926fce38ea8b56 works fine.
  3. A GET request for the file with hash 213f714766b96c2c9121025845132442eb4c5540 triggers this error.

There are other files that trigger it and others that don't; I'm not quite sure what the corrupted hashes have in common.
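
For reference, a rough sketch of that probing loop (in Python, not the exact requests I made; the per-blob URL pattern below is only a placeholder, use whatever links the tree listing above actually gives):

```python
import urllib.request
import urllib.error

# Hashes taken from the tree listing above.
hashes = [
    "f4a9d19dd2ff3420f24348f958926fce38ea8b56",  # served fine
    "213f714766b96c2c9121025845132442eb4c5540",  # triggers the error
]

for h in hashes:
    url = "https://lit.luvit.io/blobs/" + h  # placeholder endpoint, not lit's documented API
    try:
        with urllib.request.urlopen(url) as res:
            print(h, "->", res.status)
    except urllib.error.HTTPError as e:
        print(h, "-> HTTP error", e.code)
    except urllib.error.URLError as e:
        print(h, "-> request failed:", e.reason)
```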

Bilal2453 commented 2 years ago

Good to note: currently all of my packages have the same issue, so there is a chance it is not per module but per user. There is also the chance that I am doing something wrong, although I can't see how that would be.

Bilal2453 commented 2 years ago

After some thinking and talking to contributors, this might have been caused by a bad local litdb on my end. Just when I wanted to publish one of my packages, Lit reported an IO error related to my hard disk (it is dying). I suspect what happened might have been:

  1. A bad local litdb that Lit only failed to read now.
  2. The Lit server side does not do any validation of the data/hashes it receives (see the sketch below).
  3. Lit managed to upload the bad litdb data to upstream all this time.
  4. I asked Tim to delete the packages and re-uploaded them, although that did not help. This might have been caused by the aggressive caching Lit does server side.

I am not sure if that is indeed the cause, but it is one possible scenario. Sadly, the only way to test it is by... deleting the packages, restarting the upstream server, and then trying to republish them. That sounds like too much trouble for one user.
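
To make point 2 above concrete, here is a minimal sketch (not lit's actual code, and assuming lit hashes blobs the same way git does, which the .litdb.git layout suggests) of the check the server could run on every object it receives:

```python
import hashlib

def git_blob_hash(data: bytes) -> str:
    # SHA-1 over "blob <size>\0" + contents, the scheme git uses for blob objects.
    header = b"blob " + str(len(data)).encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()

def validate_object(claimed_hash: str, data: bytes) -> bool:
    # Recompute the hash of the received body and refuse to store the object
    # if it doesn't match what the client claims it is.
    return git_blob_hash(data) == claimed_hash

body = b"return { _VERSION = '1.1.10' }\n"  # made-up blob contents for illustration
print(validate_object(git_blob_hash(body), body))   # True: hash matches the contents
print(validate_object("213f714766b96c2c9121025845132442eb4c5540", body))  # False: mismatched/corrupted upload
```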

Bilal2453 commented 2 years ago

I've been trying to find a way to test the previous theory other than restarting the whole upstream, since that's pretty problematic, and I came up with the idea of forcing a rehash of all the malformed files and then publishing that.

I tested that method on Bilal2453/vips, and it seems that was indeed the issue; it now works after the workaround.

For anyone else who is having the same issue, here is a rough outline of how I did it:

  1. Patch Lit to not sync on publish. This is important since otherwise it will error before even trying to publish, which is indeed what I ran into before opening this issue and didn't notice, because I had no idea what the problem was. Just comment this line out and re-build Lit.
  2. Edit all the malformed files. Any edit to the contents will do; I simply added an empty comment at the top of each file to force the rehash (see the small illustration after this list).
  3. Delete your .litdb.git. Make a backup of it if you wish, although that shouldn't matter.
  4. Use the patched Lit to re-publish your package, making sure you bump the version. After those steps my package seemed to behave itself.
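
As a small illustration of why step 2 forces a rehash (again assuming git-style blob hashing): any change to the contents, even a leading empty comment, produces a completely different hash, so the upstream can no longer serve the old, corrupted object for that file. The file path below is hypothetical:

```python
import hashlib

def git_blob_hash(data: bytes) -> str:
    header = b"blob " + str(len(data)).encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()

original = open("libs/init.lua", "rb").read()  # hypothetical file from the package
patched = b"--\n" + original                   # the "empty comment at the top" trick

print("before:", git_blob_hash(original))
print("after: ", git_blob_hash(patched))       # different hash, so a fresh object gets published
```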

@squeek502 Should we close this? Or perhaps suggest that Lit account for a corrupted local litdb.git upstream and warn the user about it. Another suggestion could be adding an option to force a publish without syncing, removing the need to patch a local Lit, as an emergency way of escaping this loop.

I don't know if we should call this a bug, since it is basically Lit never checking the hashes it receives, so perhaps more of a missing feature?

squeek502 commented 2 years ago

Seems like some sort of validation on either the client or server's end (or both?) would be nice. It's been a while since I've looked at the Lit internals though so I don't have any suggestions for what that might look like at the moment.

Bilal2453 commented 2 years ago

Indeed, it sounds like some validation is missing, at the very least server side. Honestly I'm not quite sure where exactly either; I didn't dig that far. All I know is that somewhere the hash being used ends up mismatched in the git db (another theory was that the lit git implementation somehow gets out of sync).

And not being able to force a publish, and/or somehow delete the package AND clear its cache server side, adds to this issue, leaving the user stuck in this loop of no hash errors.

Bilal2453 commented 2 years ago

Oops, that close was accidental.

Bilal2453 commented 2 years ago

Not sure what to say, but this is not as straightforward as I thought.

After publishing the problematic version, 1.1.10, I published 1.1.11 of the mentioned package, and that version behaved just fine, until I published 1.1.12, which fixed some bugs in the previous one, both from a different, new drive; and this time 1.1.12 had the same no hash problem.
This time it affected 3 files and a binary, all of which have unique hashes not present in any of the previous versions (intentionally, to make sure no conflicts happen with the cache), yet it still got corrupted, with different files this time.

I have also spotted a pattern: each time I publish one of these "corrupted" modules, I go through this weird scenario after publishing the problematic version. (In the following example, I published a new version 1.1.13 to fix the second faulty one, 1.1.12.)

  1. Publish is done with a fresh new litdb.git. Everything seems alright. (Note: the lit-bin used here is modified following the previously explained method.) (screenshot attached)

  2. Verify the version was indeed published to lit by querying it (a rough query sketch follows this list): (screenshot attached) You'll notice it is not the new one I just published... what happened? You can verify that here too: https://lit.luvit.io/packages/Bilal2453/vips/v1.1.13. Let's ignore that and still try to install it.

  3. Try to install the new version anyway, making sure to delete .litdb.git before doing so: (screenshot attached) It errors with a hash from v1.1.12, so it is definitely using the older version... but we just published the new one.

  4. Trying to re-publish (I haven't done so yet) tells you that the tag was already published and there are no new changes. But querying lit says the latest version is indeed 1.1.13, uploaded a day ago (or whenever the first publish was).
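
For step 2 above, this is roughly how the server can be queried to see what it is advertising (plain GETs against the package URLs; I'm only printing the status and a snippet of the raw body, since the response format itself isn't the point here):

```python
import urllib.request

# Package pages on the upstream server; the versioned URL is the one linked above.
urls = [
    "https://lit.luvit.io/packages/Bilal2453/vips",
    "https://lit.luvit.io/packages/Bilal2453/vips/v1.1.13",
]

for url in urls:
    with urllib.request.urlopen(url) as res:
        body = res.read()
        print(url)
        print("  status:", res.status, "bytes:", len(body))
        print("  snippet:", body[:120])
```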

The no hash problem always seems to start after the previous pattern repeats. This has happened to me many times now, and I only connected it to this bug because I hadn't been paying much attention to the no hash problem, thinking it was a single-user issue. I haven't done the last step yet; I hope the current state is easier to debug server side.

Edit: I just did the last step of re-publishing and indeed got a new no hash problem, even though I forced a rehash of almost everything. (screenshot attached)