jupyter-server / jupyter_server

The backend—i.e. core services, APIs, and REST endpoints—to Jupyter web applications.
https://jupyter-server.readthedocs.io
BSD 3-Clause "New" or "Revised" License
489 stars 308 forks source link

Change md5 to hash and hash_algorithm, fix incompatibility #1367

Closed Wh1isper closed 12 months ago

Wh1isper commented 12 months ago

Resolves #1366 https://github.com/mwouts/jupytext/issues/1165 https://github.com/ploomber/ploomber/issues/1144

Sorry for the breaking changes to the API. This should allow downstream optional support for md5.

Wh1isper commented 12 months ago

@LoicGrobol @fcollonval @marr75

Please review if this fixes most of the incompatibilities.

blink1073 commented 12 months ago

In hindsight based on @bollwyvl's comment, I agree we should move to sha256, which is FIPS-compliant.

Wh1isper commented 12 months ago

@blink1073 Okay, I think I can change all the md5 to sha256. Do you want me to open another pr?

blink1073 commented 12 months ago

We can re-use this one.

blink1073 commented 12 months ago

Thanks! I think we should add jupytext as one of our downstream tests. Would you like to try that? If not I can do it in a follow up PR.

Wh1isper commented 12 months ago

@blink1073

I think we should add jupytext as one of our downstream tests. Would you like to try that?

Yeah, I'll give it a shot.

Wh1isper commented 12 months ago

Downstream tests for jupytext passed: https://github.com/jupyter-server/jupyter_server/actions/runs/6965772158/job/18954954991?pr=1367

fcollonval commented 12 months ago

Thanks @Wh1isper to work quickly on a fix an @blink1073 to review the code.

I started something yesterday but did not get the chance to finish. Basically I was proposing to return two attributes hash and hash_algorithm to have something generic as security wise what is now secure may not be in the future.

My code was ready I was only looking at how to test this without explicitly adding a dependency like jupytext that may evolve in the future.

If you give 3h (it's still very early for me 😅) I can contribute to this one.

Wh1isper commented 12 months ago

@fcollonval 😄 Good morning, I've just finished lunch being in GMT+8.

Returning hash and hash_algorithm is a good idea, it requires us to have a HashManager and then choose a default hash_algorithm. I wonder if it would be possible to allow the client to choose the hash_algorithm, which might be useful (or maybe not).

Wh1isper commented 12 months ago

I'd like to try to add hash and hash_algorithm, please review this PR later.

Wh1isper commented 12 months ago

Ready for review.

fcollonval commented 12 months ago

@Wh1isper I pushed in https://github.com/fcollonval/jupyter_server/pull/1 a bunch of improvements. Feel free to cherry-pick them.

fcollonval commented 12 months ago

I would also strongly suggest removing the downstream test against Jupytext because it is not an official Jupyter subproject and we don't have a say on it.

To check for no regression, my PR contains an appropriate ContentsManager mock up.

blink1073 commented 12 months ago

I think it is fair to test against a handful of popular community projects. I've seen that pattern used by other projects like mypy and ruff.

davidbrochart commented 12 months ago

The file hash is actually something we could take advantage of in collaborative editing. We currently compare timestamps to check if the server was the one who saved the file last, and if not then we get the new content from disk. Having a file hash would be more reliable. Thanks for the idea and for you work @Wh1isper!

Wh1isper commented 12 months ago

@fcollonval Thank you. I've learned a lot from that!

To pass the typing test, I added a number of # type: ignore.

Wh1isper commented 12 months ago

😀 I'd like to ask you to review again.

Thank you all!

blink1073 commented 12 months ago

Thanks again @Wh1isper! I'll make a new release early next week.

Wh1isper commented 12 months ago

My pleasure. Thank you all :D