Open GoogleCodeExporter opened 8 years ago
I don't have any specific performance metrics, but 7 seconds does seem like a
long time. The speed of FITS can vary based on the types of files that are being
analyzed and which tools are being invoked. If you know certain tools are not
useful for the files that you are processing you can disable them, either
through the API or by commenting them out of the fits.xml configuration file.
In my experience Jhove and the NLNZ Metadata Extractor usually take the longest
amount of time and read in much more data from the file than the other tools.
Original comment by spencer_...@harvard.edu
on 5 Apr 2011 at 2:47
I am also interested in this issue.
I haven't looked a lot through the code, so I am sorry if my questions seem
obvious to you but could you please elaborate more on the following:
As far as I know, Jhove needs a lot of time for its configuration, but as soon
as that is done, extracting data from different files does not take so long.
So maybe, if there were a way to invoke FITS on a set of files instead of a
single file, it would increase performance.
Thanks in advance!
P.
Original comment by PePet...@gmail.com
on 5 Apr 2011 at 4:14
That's a good point. If you are using the FITS command line to process files
one at a time, it's going to result in each tool being initialized for every
file that is processed. If you're using the Java API the tools get initialized
once and then you can pass it as many files as you want processed.
A feature to call FITS against a directory of files, without the
re-initialization happening, would be a good enhancement.
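To illustrate the difference described above, here is a minimal Java sketch.
It does not use the FITS API: SlowTool is a hypothetical stand-in for any tool
with a costly start-up, and the init counter only exists to make the cost
visible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a tool with costly start-up; not part of FITS.
class SlowTool {
    static int initCount = 0;

    SlowTool() {
        initCount++; // pretend this loads configuration, modules, and so on
    }

    String examine(String path) {
        return "examined:" + path;
    }
}

class BatchSketch {
    // One-at-a-time style: a fresh tool, and a fresh start-up cost, per file.
    static List<String> perFile(List<String> paths) {
        List<String> out = new ArrayList<>();
        for (String p : paths) {
            out.add(new SlowTool().examine(p));
        }
        return out;
    }

    // Batch style: pay the start-up cost once, then reuse the same instance.
    static List<String> batch(List<String> paths) {
        SlowTool tool = new SlowTool();
        List<String> out = new ArrayList<>();
        for (String p : paths) {
            out.add(tool.examine(p));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList("a.tif", "b.pdf", "c.wav");

        SlowTool.initCount = 0;
        perFile(paths);
        System.out.println("per-file initializations: " + SlowTool.initCount);

        SlowTool.initCount = 0;
        batch(paths);
        System.out.println("batch initializations: " + SlowTool.initCount);
    }
}
```

With three files the first style initializes three times and the second only
once, which is the saving a directory mode would bring to the command line.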
Original comment by spencer_...@harvard.edu
on 5 Apr 2011 at 5:27
Hello from the archivematica project.
We've noticed this potential improvement as well.
Something I'd like to point out is that there are some similarities with
clamscan. Each time it's initialized, it loads its rule set, so scanning
individual files takes a long time with clamscan. Their solution, which I like,
was to create a daemon (clamd) that acts as a local server to scan the files
and holds the rules in memory. The command line call to clamdscan uses the same
parameters as clamscan, and sends the request to the daemon. This way, the user
sees very little change in their implementation.
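A rough Java sketch of that daemon idea, under stated assumptions: ScanDaemon
and ScanClient are hypothetical names, the one-line-per-request protocol is
invented for illustration, and nothing here is FITS or ClamAV code. The daemon
does its set-up once and then answers requests over a loopback socket.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical daemon: initializes once, then serves one request per connection.
class ScanDaemon implements AutoCloseable {
    private final ServerSocket server;
    int initCount = 0;

    ScanDaemon() {
        initCount++; // stands in for the costly one-time tool set-up
        try {
            // Port 0 asks the OS for any free port; bind to loopback only.
            server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Thread worker = new Thread(this::serve);
        worker.setDaemon(true);
        worker.start();
    }

    int port() {
        return server.getLocalPort();
    }

    private void serve() {
        while (!server.isClosed()) {
            try (Socket s = server.accept();
                 BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
                 PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true)) {
                String path = in.readLine();     // request: one path per line
                out.println("examined:" + path); // reply: one result line
            } catch (IOException e) {
                // accept() fails once close() is called; the loop then exits.
            }
        }
    }

    @Override
    public void close() {
        try {
            server.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Hypothetical thin client: same call shape as a direct scan, but it only
// forwards the path to the already-initialized daemon.
class ScanClient {
    static String request(int port, String path) {
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port);
             PrintWriter out = new PrintWriter(
                 new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
             BufferedReader in = new BufferedReader(
                 new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8))) {
            out.println(path);
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        try (ScanDaemon daemon = new ScanDaemon()) {
            System.out.println(request(daemon.port(), "a.tif"));
            System.out.println(request(daemon.port(), "b.pdf"));
        }
    }
}
```

The trade-off is an extra long-running process to manage, in exchange for
callers that keep a one-file-per-call interface.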
Original comment by josephPe...@gmail.com
on 21 Jul 2011 at 8:26
FITS now includes a -r option to recursively process directories of files.
Each tool is also invoked in a separate thread which has improved performance.
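For readers curious what running each tool in its own thread can look like,
here is a minimal Java sketch. It is not the FITS implementation: ParallelTools
is a hypothetical name, and each "tool" is just a task that returns a string.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: run every tool against one file, one thread per tool.
class ParallelTools {
    static Map<String, String> runAll(List<String> toolNames, String path) {
        // One thread per tool; at least one thread so an empty list is valid.
        ExecutorService pool =
            Executors.newFixedThreadPool(Math.max(1, toolNames.size()));
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String name : toolNames) {
                // Stand-in for the real work a tool would do on the file.
                Callable<String> task = () -> name + " examined " + path;
                futures.add(pool.submit(task));
            }
            // Collect results in the original tool order.
            Map<String, String> results = new LinkedHashMap<>();
            for (int i = 0; i < toolNames.size(); i++) {
                results.put(toolNames.get(i), futures.get(i).get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> tools = new ArrayList<>();
        tools.add("toolA");
        tools.add("toolB");
        System.out.println(runAll(tools, "a.tif"));
    }
}
```

With this shape the wall-clock time per file is closer to that of the slowest
tool than to the sum of all tools, provided the tools do not contend for the
same resources.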
Original comment by spencer_...@harvard.edu
on 25 Apr 2012 at 1:15
Original issue reported on code.google.com by
david.wa...@gmail.com
on 4 Apr 2011 at 9:03