loris-imageserver / loris

Loris IIIF Image Server
Other
208 stars 87 forks source link

ImageInfo architecture #356

Open bcail opened 6 years ago

bcail commented 6 years ago

Currently, the ImageInfo class serves 2 purposes:

  1. It's the IIIF info that's returned to the user. https://github.com/loris-imageserver/loris/blob/development/loris/webapp.py#L511
  2. It's information about an image that Loris uses internally. https://github.com/loris-imageserver/loris/blob/development/loris/webapp.py#L637 (src_img_fp, src_format).

Should these purposes be split out into different classes, or separated somehow? Am I missing something?

Thoughts?

bcail commented 6 years ago

I'm currently seeing errors in our Loris instance where the src_format is '', and that's causing a KeyError here: https://github.com/loris-imageserver/loris/blob/development/loris/webapp.py#L688.

bcail commented 6 years ago

Looks like ImgInfo.to_dict() gives you the IIIF info. ImgInfo.to_json() dumps that to_dict data as json if cache param is False, and adds in extra Loris-specific specific information if cache param is True. That's how ImgInfo handles its dual purposes.

I think I'll try refactoring it some, to make things clearer and hopefully fix the issue I'm seeing.

alexwlchan commented 6 years ago

Should these purposes be split out into different classes, or separated somehow?

If I had the time, I’d split it into two classes – one for internal logic, one for display logic. The latter moves into webapp.py, the former remains in img_info.py. No code is shared between them, just data.

The internal class is what’s left of the old class, and splitting out the display logic makes that class very easy to test – because it only ever has to pass around data in-memory, rather than also try to read stuff from files on disk. Data-only functions are usually easier to test and fuzz.

jpstroop commented 6 years ago

Have a look at this:

https://github.com/jpstroop/loris-redux/blob/master/loris/info/jp2_extractor.py

bcail commented 6 years ago

Thanks for your comments, @alexwlchan and @jpstroop.

I think the current issue I have is with the cache. src_format is now in the ImageInfo object, which means it could come from the ImageInfo cache. However, I have old objects that were already cached without the src_format information, and I think that's why I'm getting some KeyErrors on src_format==''.

One solution is to require everyone to purge their ImageInfo cache when they upgrade. Or we could add something to the code to try to force-override objects in the cache with the new information, although that would complicate things in the code. Or we could look at separating src_format from the ImageInfo that would come back from the cache.

Thoughts on any of this? Is it an option to have everyone purge their ImageInfo cache when they upgrade, or would another solution be better? @jpstroop @alexwlchan @giorgiosironi @azaroth42

bcail commented 6 years ago

I made a change (in the img_info branch), so _get_info checks for whether the Info data from the cache has src_format or not - if not, it gets the src_format and sets the cache as if nothing was in the cache at all. Seems to be working for me so far.

giorgiosironi commented 6 years ago

It could be difficult for users to coordinate cache purging with a deploy of the new minor version. I would favor a layer of backward compatibility in the code (to be removed in a future version). Could probably wrap the lines in https://github.com/loris-imageserver/loris/compare/img_info#diff-7b8af9e99ca9db588d607bd09f065ef7R522 with come ----begin---- ----end---- comments specifying an estimate of when they could be removed.

bcail commented 6 years ago

Thanks, @giorgiosironi. I could see the next version of Loris being a major version, with the new Auth code. If we release a version 3 of Loris with the new auth code, then maybe we should remove this compatibility layer in version 4?

Also, it'd be great to get a release with Python 3 support (#324). I guess if we get that in soon, we could make a 3.0 release with the new auth and py3 support, or if it takes a while, we could do a release and then another (4.0?) release with the py3 support.

Thoughts, anyone?

giorgiosironi commented 6 years ago

On removing compatibility in 4.x: makes sense to me, shouldn't expect compatibility of a cache between distant major versions. On Python 3: if a 3.x version supports both Python 2 and 3, personally I'll use it to upgrade the Python version of the servers running it.