keeps / roda-in

Tool to create Submission Information Packages (SIP)
http://rodain.roda-community.org
GNU Lesser General Public License v3.0

Metadata extraction at SIP creation time #319

Open beepsoft opened 7 years ago

beepsoft commented 7 years ago

Hi,

Would it be possible/feasible to add metadata extraction to roda-in at SIP creation time?

For users who are not archivists and know little about metadata but would still like to create meaningful SIPs, I think it would be quite useful if roda-in itself provided Tika, ExifTool, etc. plugins (just like RODA) to extract metadata from the files to be included in the SIP. When someone adds files and directories to the SIP, there could be an option to run automatic metadata extraction for those files. If the user enables it, roda-in would extract as much metadata as possible and generate the appropriate metadata description (EAD 2002, Dublin Core, etc.) for each file where possible.
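As a rough illustration of the kind of output this would produce, here is a minimal Python sketch. It is not roda-in code and does not use Tika or ExifTool: it reads only what the filesystem and the file extension reveal, and the function names `extract_basic_metadata` and `to_dublin_core`, as well as the `metadata` wrapper element, are hypothetical choices made for this example.

```python
import mimetypes
import os
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Namespace of the 15 Dublin Core elements (title, format, date, ...).
DC_NS = "http://purl.org/dc/elements/1.1/"


def extract_basic_metadata(path):
    """Collect the little that can be learned without a real extraction tool.

    Hypothetical helper: a real plugin would call Tika, ExifTool, etc. here.
    """
    stat = os.stat(path)
    mime, _ = mimetypes.guess_type(path)  # guess from the file extension only
    modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    return {
        "title": os.path.basename(path),
        "format": mime or "application/octet-stream",
        "date": modified.strftime("%Y-%m-%d"),
    }


def to_dublin_core(metadata):
    """Serialize a dict of Dublin Core element names to a simple XML string.

    The unqualified <metadata> root is an assumption for this sketch, not a
    schema that roda-in or RODA prescribes.
    """
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")
    for name, value in metadata.items():
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")
```

Calling `to_dublin_core(extract_basic_metadata(path))` for each file added to a SIP would give a starting description that the user could then correct or extend by hand.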

What do you think about it?

hsilva-keep commented 7 years ago

Dear @beepsoft, it would be possible, and we had a very similar strategy for file characterization in roda-in version 1.x, but we have ditched that type of approach because:

  1. we found that integrating those tools made the app very heavy (dropping them took the app size from 200/300MB to 20MB in version 2.x)
  2. we have changed the app's objectives: in version 2.x the app must be able to produce massive amounts of SIPs, in different package formats, with high versatility in terms of metadata schemas (not prescriptive), in a few clicks.

And this approach marries very well with the RODA repository, because all the other tasks are done on the repository side, e.g. creating preservation metadata with technical metadata.

So, as far as I'm aware, we have no plans to add that type of functionality. Nevertheless, thanks for the suggestion.

beepsoft commented 7 years ago

Dear @hsilva-keep,

Thanks for the clarification! Do you have a roadmap or an estimate of when 2.x will be released?

Thanks!

hsilva-keep commented 7 years ago

At the moment, and aside from the final/official release, the latest release is stable and fully functional. We also don't have any urgent/needed functionality to develop in the next couple of weeks/months.

luis100 commented 7 years ago

@beepsoft the idea of extra features in RODA-in is interesting and, as @hsilva-keep said, it has been tried in the past but had drawbacks. To do it right, I foresee we need the following:

Having this, I imagine several types of plugins: