Open AbdealiLoKo opened 8 years ago
Another possibility mentioned by @drtrigon is that we could have multiple packages. For example, file-metadata-audio
, file-metadata-video
, etc.
There are 2 methods I can see of doing this:
file_metadata
) for all of them as it'd cause an import clash. Good point. We anyway may want to split the package a bit. However we have to be aware the fact that some of the libraries might sneak into other parts later as well (e.g. scikit-image).
Another 3rd possibility is the same as for the "Plugin mechanism" but without the plugin mechanism just divide the code up into separate packages. file-metadata will use all (actually be a kind of container or meta-package for them) but other projects will may be just use some.
There are two problems here:
The latter hurt much more, and are easier to fix. If we fix the former, and video2commons rewrites their backend to use file-metadata, then we can optimise the deps specifically for their needs, and then decide how to package/release/maintain it to include only those deps. It would be great to know what features it needs.
Improving the install time by splitting the dependencies doesnt help the gsoc project very much, as it needs all deps to work properly, and could make poor quality edits if it tries to work without some libraries by downgrading the analysis. That must be avoided at all cost. Optional dependencies is the most beneficial approach, as it allows catimages to run sooner, and provide runtime warnings/errors to users which can be more helpful than errors during setup.py/pip.
An easier way to reduce this installation pain is to improve upstream with wheels and debs, etc. That does also provide speed up to the catimages users installation also.
Improving the latter (runtime startup) does help the gsoc project, but only slightly as typically catimages runs for a long time, and in batch or continuous mode looking at recentchanges, so a minute or two of startup isnt very problematic. However if deps are only loaded when needed, some deps wont be needed for certain batches. E.g. processing only one file type.
It would be great to know what features it needs.
Stream analysis. More specifically FFProbeMixin
If splitting / optional dependencies can't be done I'll just fork that part off or wait till after gsoc :)
Some parts I don't get while reading that FFProbeMixin:
executable = executable or which('avprobe') or which('ffprobe') or download_ffmpeg()
If I remember right we need a recent ffmpeg cause old ones miss json output format, @AbdealiJK can you falsify my statement?
Am 21. Juni 2016 15:22:35 MESZ, schrieb zhuyifei1999 notifications@github.com:
Some parts I don't get while reading that FFProbeMixin:
- why does it seem to prioritize downloaded ffmpeg over system-provided ffmpeg? It's almost like preferring to run arbitrary code from elsewhere
- for variable executable after which() calls, why is it not using absolute path? eg. it could be
executable = executable or which('avprobe') or which('ffprobe') or download_ffmpeg()
You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/AbdealiJK/file-metadata/issues/44#issuecomment-227437588
Dr. Trigon
Improving the install time by splitting the dependencies doesnt help the gsoc project very much, as it needs all deps to work properly, and could make poor quality edits if it tries to work without some libraries by downgrading the analysis. That must be avoided at all cost. Optional dependencies is the most beneficial approach, as it allows catimages to run sooner, and provide runtime warnings/errors to users which can be more helpful than errors during setup.py/pip.
Yes, exactly!
Improving the latter (runtime startup) does help the gsoc project, but only slightly as typically catimages runs for a long time, and in batch or continuous mode looking at recentchanges, so a minute or two of startup isnt very problematic. However if deps are only loaded when needed, some deps wont be needed for certain batches. E.g. processing only one file type.
What about having a worker deamon running to avoid start up, you would just have to connect? Would that be possible?
Dr. Trigon
A worker daemon is possible, but I dont see the benefit over delayed imports of dependencies, wrt to startup time. It is better that file-metadata does its task in one thread, and the calling program put it into a thread (or many) if it is appropriate for that program. Anyway, if startup time is bad, we need to first see what is the cause with real timing data.
(note I am not requesting any timing data atm; MVP is the priority atm)
I wanted to mention a daemon for video2commons not file-metadata of course... ;)
Oh video2commons indeed use worker daemons. Flask on the frontend, celery on the backend. The backend barely suffers any start up delays due to having good redundancy and no NFS dependency, but the frontend is already way too slow, and any code update require a restart (unless in debug mode, which kind of allows arbitrary code execution).
And the frontend loads backend code (and vise versa) because, well, it's celery and they are in the same package.
@AbdealiJK: Do we currently profile the code (e.g. list each function calls timing)? Is this included in unittest/coverage? (on travis?)
Am 21. Juni 2016 17:54:39 MESZ, schrieb John Vandenberg notifications@github.com:
A worker daemon is possible, but I dont see the benefit over delayed imports of dependencies, wrt to startup time. It is better that file-metadata does its task in one thread, and the calling program put it into a thread (or many) if it is appropriate for that program. Anyway, if startup time is bad, we need to first see what is the cause with real timing data.
You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/AbdealiJK/file-metadata/issues/44#issuecomment-227485154
Dr. Trigon
No, we do not profile the code right now
This was raised by @zhuyifei1999 on IRC.
file-metadata is very heavy right now. If someone just wants to use video/audio analysis, they need to install scikit-image, and a whole lot of other things that are not necessary. I think the best bet would be to make use of
extras_require
in setuptools which allows us to add additional package dependencies likefile-metadata[audio]
.More suggestions are welcome :)