openpreserve / jpylyzer

JP2 (JPEG 2000 Part 1) validator and properties extractor. Jpylyzer was specifically created to check that a JP2 file really conforms to the format's specifications. Additionally jpylyzer is able to extract technical characteristics.
http://jpylyzer.openpreservation.org/

Running jpylyzer on streams and/or via Jython #95

Closed anjackson closed 2 years ago

anjackson commented 7 years ago

Dev Effort

TBC

Description

I'm looking at doing something like this on the JP2s in our store. They are behind an HTTP interface, and distributing binaries across our (out of date) cluster is a bit of a pain, so I was wondering about compiling it under Jython.

However, jpylyzer depends on mmap memory-mapped files, which are not supported by Jython. But really this just raises the question of whether it makes sense to operate on streams at all, i.e. if I should cache files locally for analysis anyway, then I may as well use mmap and distribute a pre-compiled binary with the computation (like the example linked to above).

anjackson commented 7 years ago

BTW, I ended up copying jpylyzer and running it as a Hadoop streaming job, which works well. To make this simpler, I ended up patching the code so it could run on an in-memory file rather than forcing me to download it:

https://github.com/anjackson/blitter/blob/master/streaming/jpylyzer/jpylyzer.py#L290-L318

Are pull requests along these lines acceptable?

Also, are you planning to support distribution via pip and https://pypi.python.org/pypi ? It would be handy to be able to pull jpylyzer in as a simple dependency when creating derivative projects.
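
For readers following along, the in-memory approach mentioned above can be sketched roughly as follows. This is not the patch itself and not jpylyzer's API: `read_all` and `starts_with_jp2_signature` are hypothetical names, and the only format detail relied on is the 12-byte JP2 signature box.

```python
import io

# JP2 signature box: length 12, type 'jP  ', content 0x0D0A870A.
JP2_SIGNATURE = bytes.fromhex("0000000C6A5020200D0A870A")


def read_all(stream):
    """Read a binary stream (stdin, HTTP response, BytesIO) fully into bytes.

    Hypothetical helper: the resulting bytes object supports the same
    slicing and len() operations that a read-only mmap object does.
    """
    return stream.read()


def starts_with_jp2_signature(data):
    """Return True if the buffer begins with the JP2 signature box."""
    return data[:12] == JP2_SIGNATURE


if __name__ == "__main__":
    # Simulate a file that arrived over a stream rather than from local disk.
    fake_stream = io.BytesIO(JP2_SIGNATURE + b"...remaining boxes...")
    data = read_all(fake_stream)
    print(starts_with_jp2_signature(data))
```

The trade-off is memory: the whole file is held in RAM, which is usually acceptable for a streaming job that handles one image at a time.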

bitsgalore commented 7 years ago

Hi Andy,

Sorry for not getting back earlier. I'll need to give this an in-depth look, but that might take a while as I don't have the time for that any time soon. (Also I'm super paranoid of anything that changes Unicode behavior so I really want to test that on different platforms first).

One thing that caught my attention in your pull request is that you moved the mmap import statement inside the fileToMemoryMap function. I'm not in favor of that, but that could be changed easily.

As for using pip/PyPI: yes, that crossed my mind as well. Related to that, the current Debian package setup needs an overhaul too: all modern Linux distros have Python 2.7, so using bytecode to create executables seems overkill.

anjackson commented 7 years ago

Thanks Johan, no worries. None of this is urgent, nor set in stone.

bitsgalore commented 5 years ago

Hi @anjackson, is this still something you'd like me to have a look at? At the time you submitted a pull request for this, but as it resulted in CI errors and you indicated yourself that it was untested, I never got round to having a proper look at it. I'd be happy to have another look if you still need this; if not I'll close the issue. (The reason I'm bringing this up now is that we just did a triage session with OPF for all open jpylyzer issues.)

bitsgalore commented 4 years ago

See this pull request by @tledoux https://github.com/openpreserve/jpylyzer/pull/154

To be included in jpylyzer 2.1 (possibly not as a command-line option, but rather by checking whether mmap is available and not using it if it isn't).
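
A minimal sketch of what such an availability check could look like, assuming a module-level guarded import and a plain `read()` fallback. The function name `file_to_buffer` is hypothetical, and this is not necessarily how the linked pull request implements it.

```python
import os
import tempfile

try:
    import mmap  # not available on every Python implementation (e.g. Jython)
except ImportError:
    mmap = None


def file_to_buffer(path):
    """Return the file's contents as an mmap object if possible, else bytes.

    Hypothetical helper: both return types support slicing and len(),
    so calling code does not need to know which one it received.
    """
    with open(path, "rb") as f:
        if mmap is not None:
            try:
                return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
            except (ValueError, OSError):
                # For example, an empty file cannot be memory-mapped.
                f.seek(0)
        return f.read()


if __name__ == "__main__":
    # Write a small temporary file and read it back through the helper.
    fd, tmp_path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as out:
        out.write(b"hello jp2")
    buf = file_to_buffer(tmp_path)
    print(bytes(buf[:5]))
    if hasattr(buf, "close"):
        buf.close()
    os.remove(tmp_path)
```

Keeping the import at module level with a guarded fallback avoids moving it into the function body, while still letting the code run where mmap is missing.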

bitsgalore commented 2 years ago

Done: https://github.com/openpreserve/jpylyzer/pull/154