Implement RIF-CS export

silviapfeiffer commented 12 years ago

jangari commented 12 years ago

I can supply you with some kind of xml file showing what collection level metadata categories we'd like to go into which rif-cs xml elements, if that'll be helpful.

silviapfeiffer commented 12 years ago

Sure!

jangari commented 12 years ago

@silviapfeiffer, I may have to have a talk to you about how the feed will work. I'm not totally sure how harvested rif-cs will function, but will it work like a single file that gets updated gradually with new collections and new changes, perhaps those up to one week old or something? Will ANDS simply update already existing records with updated metadata? If so, will values append or overwrite?

I've asked a rif-cs expert these questions as well, but I thought you might also know.

jangari commented 12 years ago

And so, do you want me to give you separate example files for collection, party/individual, party/institution, party/funding body and activity?

silviapfeiffer commented 12 years ago

It will just work like a RSS feed, I would think.

ANDS feeds are at the collection level and we have examples from Stuart's ANDS feed that we will work with. E.g. http://researchdata.ands.org.au/linguistic-field-recordings-and-photographs-1975-1995-by-linda-barwick-luna-and-burgos-ilocos-sur-1993-and-vigan-ilocos-sur-1995

If you have any changes to those fields, please share.

silviapfeiffer commented 12 years ago

So, we've been spending some time going through all the ANDS stuff and we've found that ANDS takes OAI-PMH as well as a RIF-CS feed. Is there any particular reason we need to provide RIF-CS and can't just give them our already implemented OAI-PMH feed?

silviapfeiffer commented 12 years ago

So, I think I understand this better now. OAI-PMH is just the protocol, but the format of the XML feed can be adapted. So, our existing feed is for items. I will just add another feed for collections in RIF-CS format and we should be set. :-)

jangari commented 12 years ago

The example collections should be a good enough demonstration, but with the addition of institution information (which is currently handled differently in the examples). Where we currently do (paraphrased xml; can't quite remember all the particulars right now):

<collection>
...
     <relatedObject>
       <key>http://nla.gov.au/nla.party-593909</key>
       <relation type="isOutputOf" />
     </relatedObject>
</collection>

(where that key identifies the University of Melbourne)

We would actually have:

<registryObjects>
<registryObject>
<collection>
...
     <relatedObject>
       <key>paradisec.org.au/institutions/345</key>
       <relation type="isOutputOf" />
     </relatedObject>
</collection>
</registryObject>
<registryObject>
  <party type="group">
    <key>paradisec.org.au/institutions/345</key>
    <name>The University of Melbourne</name>
    <identifier>http://nla.gov.au/nla.party-593909</identifier>
  </party>
...
</registryObject>
</registryObjects>

silviapfeiffer commented 12 years ago

I'm trying to test my new OAI-PMH RIF-CS data source with ANDS.

I've found that on the published site this is the data source: http://azoulay.arts.usyd.edu.au/rif-cs/paradisec_rif-cs_harvest.xml

I've found that on the demo site, this is the data source: http://dl.dropbox.com/u/767553/paradisec-rifcs.xml

I'm going to experiment on the demo site with the new feed.

silviapfeiffer commented 12 years ago

I need help to resolve why ANDS is not accepting our feed in their demo environment.

jangari commented 12 years ago

Yeah, I'm not sure why that would be. I'll look into it when I'm at uni tomorrow.

jangari commented 12 years ago

If you like, I can change the data source url to whatever you like. The published site source is the one I used to import the 10 example collections that are being assessed. But it can be anywhere that's publicly accessible.

silviapfeiffer commented 12 years ago

I know and that's how I was testing it. We should even be able to use the catalog.paradisec.org.au by now because we've set it up. But that's not the problem - I managed to get my feed accepted. Yet, it eventually comes back and breaks. So, I think I need to talk to somebody at ANDS to get the report.

silviapfeiffer commented 12 years ago

Go to http://catalog.paradisec.org.au/oai/collection?verb=ListRecords&metadataPrefix=rif and let me know how you'd like to see the feed changed. :-)

LindaBarwick commented 12 years ago

wow, looks very impressive :)

jangari commented 12 years ago

Feed looks pretty good Silvia, well done! I just had a talk with Xiaobin just now and a chat earlier with Simon Pockley, and I have some comments that come out of those conversations, and also my ideas for how this feed will supply metadata.

1) Relation descriptions

This is what related objects look like:

<relatedObject>
  <key />
  <relation type="hasCollector">
    <description>Collector</description>
  </relation>
</relatedObject>

Description field here really isn't needed. Relation type handles that. If anything, this should be the collector's name, just like the institution link is handled. On closer inspection of the rif-cs specs, description here is only for use with the relation type hasAssociationWith, where it specifies the association. So I think we should ditch altogether descriptions in relatedObjects.

2) Language codes

Subjects of type iso639-3 (ethnologue codes) are inserted in the following format:

<relatedInfo type="website">
  <identifier type="uri">http://www.ethnologue.com/show_language.asp?code=[code]</identifier>
  <title>Ethnologue entry for [language]</title>
</relatedInfo>

This is great, but it should only be inserted if there are subjects of the type iso639-3. Occasionally, as in collection AC1, there are no language codes to insert. In those situations, this whole element should be ditched.

3) User and institution keys

The keys for users and institutions don't have to be urls. I think they should be consistent with the collection keys: paradisec.org.au/user/[number] paradisec.org.au/institution/[number]

If we want, we can insert a url linking to our admin pages (catalog.paradisec.org.au/admin/users...) in the 'identifier' field of those party records, but this is above the level of what is required of us.

4) Keys (generally speaking)

Keys should all be prefixed with paradisec.org.au, rather than catalog.paradisec.org.au. The reason for this is that they aren't urls (identifiers and electronic addresses are) but are just internal unique IDs within the context of ANDS. So they should look like: Collections: paradisec.org.au/collection/[collection_ID] Collectors: paradisec.org.au/user/[number] Institutions: paradisec.org.au/institution/[number]

I've gone back and forth a few times on this myself, and it was only after a conversation this afternoon with Simon Pockley from ANDS that I have come to a decision that this is the clearest and most concise (and most aesthetic) option. Again, these don't at all have to relate to the internal urls (by the same token I'd remove the http:// prefix) in the catalog or repository; that we can handle in the identifier or electronic address fields.

5) Spatial location

This seems to be what happens if a collection has no geo data:

<spatial type="iso19139dcmiBox">
northlimit=0.0; southlimit=0.0; westlimit=0.0; eastLimit=0.0;
</spatial>

Can we fix that? Also, if we have a bounding box, NABU should also support this, presumably. Perhaps this conversation is happening already, I'm not sure.

6) Dates

I see that the dates are generated from the items, and that's great, but as with spatial coordinates and language codes, if there are no dates, the temporal coverage field should probably not be inserted.

7) Party records, activity records

We need to supply party records as well for the users and institutions to link properly to, otherwise they won't validate properly. Each collection contains something like:

<relatedObject>
  <key>http://catalog.paradisec.org.au/admin/users/62</key>
  <relation type="hasCollector">
    <description>Collector</description>
    <url/>
  </relation>
</relatedObject>

But there needs to be a record with key http://catalog.paradisec.org.au/admin/users/62 for this collection to properly validate. I'm unsure how this would work in the feed though, e.g., how would NABU know which party records to supply, which are associated with collections, etc. I can liaise with you about this, and institutions, which are going to be handled in the same way.

We also need each collection to be related to at least one activity to be maximally valid. We have the key paradisec.org.au as an activity in the published site, but not the demo site, so these would work if the feed was harvested by the published site. So I guess we need not do anything about it for now...

8) Maybe we should consider not feeding collections to ANDS until they have the minimum necessary metadata to achieve quality level 2 or higher.

This is getting exciting now.
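
The key scheme from points 3 and 4 can be sketched in a few lines of Ruby. This is only an illustration: the module and method names are hypothetical and are not taken from NABU's code. It shows keys as opaque registry IDs with the paradisec.org.au prefix and no http:// scheme.

```ruby
# Hypothetical helper; RifCsKeys and key_for are illustrative names, not NABU's.
module RifCsKeys
  PREFIX = 'paradisec.org.au'
  KINDS = %w[collection user institution].freeze

  # Keys are internal unique IDs within ANDS, not URLs, so there is
  # no scheme and no "catalog." host in front of them.
  def self.key_for(kind, id)
    unless KINDS.include?(kind.to_s)
      raise ArgumentError, "unknown record kind: #{kind}"
    end
    "#{PREFIX}/#{kind}/#{id}"
  end
end

puts RifCsKeys.key_for(:collection, 'AA2')
puts RifCsKeys.key_for(:user, 45)
puts RifCsKeys.key_for(:institution, 10)
```

Any URL pointing at the catalog would then go in the identifier or electronic address fields rather than in the key.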

jangari commented 12 years ago

Github is changing all my numbers to 1. !?

jangari commented 12 years ago

Fixed.

silviapfeiffer commented 12 years ago

Excellent! I will get onto these. Feel free to give me more feedback if you are missing any data that would get us to a higher quality level.

I agree that we should not feed collections to ANDS until they are good enough. However, I think we can continue to use their demo site to experiment with it, so we can all see what the result of our feed will be. I believe I can remove those that I imported earlier.

jangari commented 12 years ago

Actually, it might be a good opportunity to see what happens when you update the feed. I'm still trying to figure out whether records update or overwrite, so don't delete.

silviapfeiffer commented 12 years ago

1) Relation descriptions
removed all description fields

2) Language codes
only printing languages if they have been given

3) User and institution keys
fixed them to be paradisec.org.au/user/[number] and paradisec.org.au/university/[number]
added proper url in the field

4) Keys (generally speaking)
renamed them to Collections: paradisec.org.au/collection/[collection_ID]

5) Spatial location
John is working on fixing these
will only be printed if at least one of the coordinates is non-zero

6) Dates
will not be printed if empty

7) Party records, activity records
I tried including a record. It didn't work. Any help you can give me - in particular an example file - for how to provide parties and activities would be great. Do we need a separate feed for this?
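
The "only print it if there is data" rule for spatial coverage and language links could look roughly like the sketch below. It is not the actual feed code: the method names and the hash keys (:north, :code, :name and so on) are assumptions made for illustration, and plain string building stands in for whatever XML builder the feed really uses.

```ruby
# Hypothetical sketch; method names and hash keys are assumptions.

# Minimal XML text escaping so names like "A & B" stay well-formed.
def xml_escape(text)
  text.to_s.gsub('&', '&amp;').gsub('<', '&lt;').gsub('>', '&gt;').gsub('"', '&quot;')
end

# Returns nil when there is no geo data, i.e. every limit is missing or 0.0,
# so the caller can leave the <spatial> element out entirely.
def spatial_element(box)
  north, south, west, east = %i[north south west east].map { |k| box[k].to_f }
  return nil if [north, south, west, east].all?(&:zero?)

  limits = "northlimit=#{north}; southlimit=#{south}; westlimit=#{west}; eastlimit=#{east};"
  %(<spatial type="iso19139dcmiBox">#{limits}</spatial>)
end

# Returns one <relatedInfo> per language; an empty list yields no elements.
def language_info_elements(languages)
  languages.map do |lang|
    url = "http://www.ethnologue.com/show_language.asp?code=#{xml_escape(lang[:code])}"
    <<~XML
      <relatedInfo type="website">
        <identifier type="uri">#{url}</identifier>
        <title>Ethnologue entry for #{xml_escape(lang[:name])}</title>
      </relatedInfo>
    XML
  end
end

puts spatial_element(north: -13.0, south: -20.5, west: 166.0, east: 170.5)
puts language_info_elements([{ code: 'xyz', name: 'Example language' }])
```

Returning nil or an empty array, instead of an element full of zeros, keeps the decision about what to print in one place.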

silviapfeiffer commented 12 years ago

A re-import just added the records. That was to be expected, though, because we changed the keys.

I will remove the old records.

jangari commented 12 years ago

Here's an example party record that I uploaded a couple of weeks ago:

<registryObject group="PARADISEC">
    <key>paradisec.org.au/collector/122</key>
    <originatingSource type="authoritative">http://paradisec.org.au</originatingSource>
    <party type="person">
      <name type="primary">
        <namePart type="family">Daniels</namePart>
        <namePart type="given">Don</namePart>
      </name>
      <relatedObject>
        <key>paradisec.org.au/collection/MC2</key>
        <relation type="isCollectorOf"/>
      </relatedObject>
    </party>
  </registryObject>

Plus it needs all the normal metadata header information.

One last thing; the identifier element should be filled with the same url as the electronic address (as we currently have it). We will need to think about this electronic address later when the archive is up, but for now, both should point to:

http://catalog.paradisec.org.au/collections/[collection_ID]

Identifiers are optional anyway, and the field will be used to store NLA identifiers for parties, but it's an extensible field so there's no problem putting something in. Simon advised doing it this way actually.

Thanks for the very quick implementation!

jangari commented 12 years ago

Should we be supplying email addresses for person records? Maybe for privacy we should leave it as is.

Institutions would be pretty much the same:

 <registryObject group="PARADISEC">
    <key>paradisec.org.au/institution/10</key>
    <originatingSource type="authoritative">http://paradisec.org.au</originatingSource>
    <party type="group">
      <identifier type="AU-ANL:PEAU">http://nla.gov.au/nla.party-593909</identifier>
      <name type="primary">
        <namePart>The University of Melbourne</namePart>
      </name>
      <location>
        <address>
          <electronic type="url">
            <value>http://www.unimelb.edu.au/</value>
          </electronic>
        </address>
      </location>
      <relatedObject>
        <key>paradisec.org.au/collection/SUY1</key>
        <relation type="someRelationship"/>
      </relatedObject>
    </party>
  </registryObject>

Where 'someRelationship' is the inverse of 'isOutputOf'. Maybe this relationship can be generated in reverse from the collection by the 'automatically generate backlinks' option.

Note the identifier field being used to link to NLA records. This is for the trove record field in NABU. Also, I got from the convo with Simon that there's another option in the harvester settings called 'push to NLA' which does exactly that, so that NLA update their feed with our records. Pretty cool, no?

jangari commented 12 years ago

As it stands, if party records can be supplied as well, all records will get quality level 2 and will be minimally compliant. To get quality level 3 and be therefore very good, they will need FoR codes, temporal coverage dates, spatial coverage, an identifier (check), and citation information (check; although I would want to have a chat about this; perhaps we can generate it dynamically from NABU's fields).

'Subject' checks out if it has anything, although they really want FoR codes, which I understand we're in the process of implementing. @LindaBarwick might be able to let us know how we're progressing in that regard.

jangari commented 12 years ago

Sorry; I keep seeing more things to comment on.

Perhaps this is not the place to put this, but within NABU, empty titles and descriptions are replaced with:

PLEASE PROVIDE TITLE

and so on. This is filtering through into the feed, so maybe there's a better way of handling it in NABU that doesn't fill the field with a placeholder. Maybe highlighting the field in red and having 'suggestion text' in the field, the sort that disappears when you click into it, and isn't part of the database.

silviapfeiffer commented 12 years ago

I have implemented the feeds for party, both for collectors and institutions, but I don't know how to supply it to them. Can you find out if we can supply this via OAI-PMH, too, or whether you have to go in and enter all of them manually?

As for the NABU empty titles and descriptions: I'd much rather we fix those properly. I've deliberately given them such an outstanding text to encourage collectors and operators to go in and fix them. They're of no use when created automatically. And they can't be found easily when they are empty. I can, however, ignore these values on the feed, if you prefer.

jangari commented 12 years ago

The feed (well, static file) I wrote worked for both collections and parties: http://azoulay.arts.usyd.edu.au/rif-cs/paradisec_rif-cs_harvest.xml

Ignoring those values seems like a bit of a workaround... Maybe we should start to think how we're going to not supply those records that aren't ready yet. Would we have a list of criteria fields, all of which have to be filled before a record is exported? A checkbox?
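
One way the "list of criteria fields" idea could work is sketched below. It is only an illustration under stated assumptions: the required fields (title and description) and the method name are hypothetical, and the placeholder pattern is generalised from the single "PLEASE PROVIDE TITLE" example above.

```ruby
# Hypothetical sketch; REQUIRED_FIELDS and ready_for_export? are illustrative.
PLACEHOLDER_PATTERN = /\APLEASE PROVIDE /
REQUIRED_FIELDS = %i[title description].freeze

# A collection is exported only when every required field is filled in
# with something other than the NABU placeholder text.
def ready_for_export?(collection)
  REQUIRED_FIELDS.all? do |field|
    value = collection[field].to_s.strip
    !value.empty? && !value.match?(PLACEHOLDER_PATTERN)
  end
end

collections = [
  { id: 'AA1', title: 'Field recordings', description: 'Tapes and notes' },
  { id: 'AA2', title: 'PLEASE PROVIDE TITLE', description: 'Tapes' },
  { id: 'AA3', title: 'Photographs', description: '' }
]

exportable = collections.select { |c| ready_for_export?(c) }
puts exportable.map { |c| c[:id] }.inspect
```

A checkbox on the collection could be combined with this by adding one more condition to the same predicate.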

Maybe we should also have a way of producing the rif-cs of a single collection on the fly, just as we can do with imp.xml and id3.xml.

Sorry for adding more and more jobs each time.

LindaBarwick commented 12 years ago

we already have a 'hide metadata' field for items, maybe we need one for collections too.

I am hoping that Nick W and I will have the description and title fields sorted out in the old catalog by the end of this week.

FOR and geography are also on the list. FOR could wait till NABU is functional because it can be done with bulk edit

silviapfeiffer commented 12 years ago

Ah I see. You need to put registryObject around all parties individually. I'll see if I can fix the feed with those.

silviapfeiffer commented 12 years ago

You asked: Maybe we should also have a way of producing the rif-cs of a single collection on the fly.

I believe we would need to support what OAI-PMH calls "sets" for this: http://www.openarchives.org/OAI/2.0/openarchivesprotocol.htm#Set

However, I do not quite understand what that would be useful for, since OAI-PMH is built as a protocol to harvest the latest and updated data.

silviapfeiffer commented 12 years ago

As for the improved records:

FoR codes: we already have a field in the DB for that - where do you want to see that emerge in the feed?
temporal coverage dates: I think they are good enough as aggregates from the items - we may have some items that need fixing up though
spatial coverage: again this should be good enough as aggregate from the items and as soon as the empty ones are filled in
an identifier (check)
citation information (check; although I would want to have a chat about this; perhaps we can generate it dynamically from NABU's fields): it already is created dynamically from NABU's fields.

jangari commented 12 years ago

I'm thinking of situations in which we might want to upload the rif-cs manually and produce it from the catalogue to then be edited manually. But on reflection we probably don't want to do that, and just leave it entirely to the automatic feed.

FoR codes go in the same field as subjects, but with the attribute:

type="anzsrc-for"
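
As a rough sketch of how language codes and FoR codes would sit side by side as subjects, assuming both are simple lists of code strings (the method name and the sample values are placeholders, not data from the catalog):

```ruby
# Hypothetical sketch; subject_elements is an illustrative name.
# Both kinds of code go into <subject>, distinguished only by the type attribute.
def subject_elements(language_codes, for_codes)
  languages = language_codes.map { |code| %(<subject type="iso639-3">#{code}</subject>) }
  fields    = for_codes.map { |code| %(<subject type="anzsrc-for">#{code}</subject>) }
  languages + fields
end

# Sample values are placeholders for illustration only.
puts subject_elements(%w[xyz], %w[2004])
```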

silviapfeiffer commented 12 years ago

OK, so much for today. Check out : http://catalog.paradisec.org.au/oai/collection?verb=ListRecords&metadataPrefix=rif and the records at demo.ands and let me know how you want it changed. :-)

jangari commented 12 years ago

I've noticed that the related objects (for users) link back to the collections, but the code used points to the collector's key, not the collection. That is, in this party record:

<registryObject group="PARADISEC">
  <key>paradisec.org.au/user/45</key>
  <originatingSource type="authoritative">http://catalog.paradisec.org.au</originatingSource>
  <party type="person" dateModified="2012-08-19T11:49:02+10:00">
    <identifier type="local">paradisec.org.au/user/45</identifier>
    <name type="primary">
      <namePart type="given">Amanda</namePart>
      <namePart type="family">Brotchie</namePart>
    </name>
    <location>
      <address>
        <electronic type="url">
          <value>http://catalog.paradisec.org.au/admin/users/45</value>
        </electronic>
        <physical type="postalAddress">
          <addressPart type="text">Amanda Brotchie c/o PARADISEC, Department of Linguistics, The University of Sydney</addressPart>
        </physical>
      </address>
    </location>
    <relatedObject>
      <key>paradisec.org.au/user/45</key>
      <relation type="isCollector"/>
    </relatedObject>
  </party>
</registryObject>

The related object should be paradisec.org.au/collection/AB1.

That said, we may get around this by simply telling ANDS to automatically generate backlinks from party records that are linked to from collections. Also, I'm told that Rif-cs overwrites when new records are supplied in the harvester, rather than values being appended. So if Alex Adelaar's party record is supplied three times, once for AA1, once for AA2 and once for AA3, it may be only the third that remains in the system as they would overwrite each other. But I suppose we can simply test this.

Also, the identifier field should contain:

http://catalog.paradisec.org.au/collections/[collection_ID]

At the moment they're still duplicates of the key field.

jangari commented 12 years ago

Yes, as I thought, records are being overwritten by newer ones, so we'll have to figure out a way to append values to an existing record in the harvest, and have a single record for every person, listing all of their collections as related objects, which may get complicated when someone adds a new collection under their name - NABU will have to look and find all their collections to accurately update their record in the feed.

The other option is not to supply related objects at all in party records, but let ANDS create reverse links. This is presumably more feasible but less optimal.
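
For comparison, the first option (one party record per person, listing every collection) amounts to grouping collections by collector before writing the records. The sketch below assumes each collection is a hash with :id and :collector_id; those names and the method name are hypothetical.

```ruby
# Hypothetical sketch; party_records and the hash keys are assumptions.
# Because a later record with the same key overwrites an earlier one at
# harvest, each collector gets exactly one record carrying all collections.
def party_records(collections)
  collections.group_by { |c| c[:collector_id] }.map do |collector_id, owned|
    {
      key: "paradisec.org.au/user/#{collector_id}",
      related_keys: owned.map { |c| "paradisec.org.au/collection/#{c[:id]}" }.sort
    }
  end
end

collections = [
  { id: 'AA1', collector_id: 7 },
  { id: 'AA2', collector_id: 7 },
  { id: 'AA3', collector_id: 7 },
  { id: 'AB1', collector_id: 45 }
]

party_records(collections).each do |record|
  puts "#{record[:key]} -> #{record[:related_keys].join(', ')}"
end
```

The cost is the one noted above: whenever a collection is added, the collector's whole record has to be regenerated.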

silviapfeiffer commented 12 years ago

Collections are being overwritten correctly. That's intentional, since we supply new information which replaces the old ones.

For parties that's more of a problem, since I'm creating the party records per collection. It would be nice to be able to specify "append" on the party records. Is it possible for ANDS to create the reverse links? I'd much prefer that.

BTW: it is already possible to extract information for just an individual identifier (i.e. collection). For example use: http://catalog.paradisec.org.au/oai/collection?verb=GetRecord&metadataPrefix=rif&identifier=oai:paradisec.org.au:AA2 to just get the data for collection AA2. OAI-PMH takes care of this.
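
The two request shapes used in this thread (ListRecords for the whole feed, GetRecord for one collection) differ only in their query parameters. A minimal sketch of building them, where the helper name is hypothetical and the base URL and parameter values are the ones quoted above:

```ruby
require 'uri'

OAI_BASE = 'http://catalog.paradisec.org.au/oai/collection'

# Hypothetical helper: builds an OAI-PMH request URL from a verb and parameters.
def oai_url(verb, params = {})
  query = URI.encode_www_form({ verb: verb }.merge(params))
  "#{OAI_BASE}?#{query}"
end

# Whole collection feed in RIF-CS.
puts oai_url('ListRecords', metadataPrefix: 'rif')

# A single collection. The colons in the identifier are percent-encoded
# (%3A) by encode_www_form, which is equivalent to the literal form.
puts oai_url('GetRecord', metadataPrefix: 'rif', identifier: 'oai:paradisec.org.au:AA2')
```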

jangari commented 12 years ago

Yes. So let's just remove all related objects from parties, both users and institutions, and leave it to the system to do the backlinking.

Okay, that's good to know about being able to produce single records, just in case. I have no idea about how OAI works.

On another topic, I've had another look at the documentation regarding subject codes, and iso639-3 are not in the list as allowable subject types, but since we can force the type attribute to be this, I think we should go ahead with it. Hopefully one day ANDS will catch up to us and parse the info out of the codes themselves.

jangari commented 12 years ago

Just had an email convo with some people from ANDS, and the situation is that backlinks generated from the collection to the parties are completely fine. The only drawback being that data exported from RDA as OAI-PMH will not contain links from parties to collections; only the links that we supply at harvest.

Provided someone can search for, say, Arthur Capell in Trove and see that he has so many collections with Paradisec, and be able to click through to them, then I'm satisfied.

iso639-3 codes are fine. They have no problem with user-defined vocabularies for subject types.

silviapfeiffer commented 12 years ago

Do keep testing and reopen if you find you want any more changes to the feed.

jangari commented 12 years ago

'Identifier' field for all types of records should be the same as the electronic address, not the xml_key. I think there are 3 places to change this, line 195, line 279 and line 303.

From the looks of things, the string xml_key just needs to change to full_path, and the types made URIs. E.g.:

xml.identifier collector.xml_key, 'type' => 'local'

Should be:

xml.identifier collector.full_path, 'type' => 'uri'

silviapfeiffer commented 12 years ago

Hmm, I think that may not be appropriate. Those identifiers are used in ANDS to "identify" the individual objects. The full_path provides us with the whole URL including "http://catalog. etc etc". We determined earlier that this is not the way in which ANDS should have the objects, but the identifiers should be independent of the URL of the object and just simply unique identifiers.

jangari commented 12 years ago

I remember Simon Pockley advising this. In fact, ideally the electronic address would be a link to the actual data, and the identifier would be a link to more information on the data, i.e., the catalog. But that depends on how the archive is going to work re: access permissions and so on. I'll ask him about this.

I'm wondering about the way in which we supply party records. From the feed it looks as if a collection will produce its own collection record, as well as two party records (collector and university) and a PARADISEC activity record. Does this mean that if we update a user's details, then their party record won't get updated until a collection of theirs is updated? Or is the feed continuous, supplying everything in the database every day?

If so then I suppose it doesn't make a difference whether the party records are supplied as part of the collections or independently.

Universities: Here's what a university party record looks like:

<registryObject group="PARADISEC">
  <key>paradisec.org.au/university/10</key>
  <originatingSource type="authoritative">http://catalog.paradisec.org.au</originatingSource>
  <party type="group" dateModified="2012-08-26T12:49:30Z">
    <identifier type="local">paradisec.org.au/university/10</identifier>
    <name type="primary">
      <namePart type="primary">Melbourne</namePart>
    </name>
    <location>
      <address>
        <electronic type="url">
          <value>http://catalog.paradisec.org.au/admin/universities/10</value>
        </electronic>
        <physical type="streetAddress">
          <addressPart type="locationDescriptor">Melbourne</addressPart>
        </physical>
      </address>
    </location>
  </party>
</registryObject>

I think we should get rid of physical address, use the electronic address for the university website, and (when we have NLA identifiers for all universities, which will be an easy job) NLA identifiers in the identifier field. I don't think we necessarily need to link to our database for them.

Perhaps we should also get rid of postal addresses for collectors. I was thinking we should supply email addresses for collectors where we have them, but that might be a bit of a privacy issue; we'd have to get permission from the collectors to publicise their email address.

However, apart from those discussions, the feed looks great! I still need to go through with a fine-tooth comb, but so far so good.

jangari commented 12 years ago

Having spoken with one such collector, supplying email addresses with their party records is not a good idea.

jangari commented 12 years ago

Hmm, I think that may not be appropriate. Those identifiers are used in ANDS to "identify" the individual objects. The full_path provides us with the whole URL including "http://catalog. etc etc". We determined earlier that this is not the way in which ANDS should have the objects, but the identifiers should be independent of the URL of the object and just simply unique identifiers.

It sounds like you're referring to keys, not identifiers. Yes, keys are exactly as I want them: "paradisec.org.au/collection/[ID]", but the identifier field is an optional field for linking the metadata in the record with other databases, i.e. trove, or the paradisec catalogue. So this is the field to be used for linking party records to trove NLA records, institutions too, and we should use it to link to our database.

The electronic address, ideally speaking, should point directly to the collection, i.e., the archive itself, but since we haven't got it yet, we're linking to the next best thing, the catalogue.

That is, in future, we should have this:

Key: paradisec.org.au/collection/ABW1
Identifier (URI): http://catalog.paradisec.org.au/collections/ABW1
Electronic Address (URL): http://repository.paradisec.org.au/collection/ABW1 (or whatever the path will be)

Until we can sort out what the archive address will look like (@LindaBarwick? @nthieberger?), the electronic address is just the same as the catalogue address.

jangari commented 12 years ago

ANDS is removing date type UTC from their supported types, and leaving it only with W3CDTF, which from all my looking at it, looks identical in every way. So we could just change UTC to W3CDTF in all date type attributes. In fact for dateFrom and dateTo, we can probably leave it as YYYY-MM-DD and get rid of the time and timezone. I can't imagine we'll have a collection that actually has this information. Individual files might, especially these days with born digital files, but not at the collection level.
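
A date-only W3CDTF value is just the YYYY-MM-DD part of the timestamp. The sketch below shows one way to produce it; the method names are hypothetical, and the dateFormat attribute name is an assumption about the RIF-CS date element rather than something confirmed in this thread.

```ruby
require 'date'

# Hypothetical helper: reduces a Date, Time or timestamp string to YYYY-MM-DD.
# Returns nil for missing values so the element can be left out.
def w3cdtf_date(value)
  return nil if value.nil? || value.to_s.strip.empty?

  date = value.respond_to?(:to_date) ? value.to_date : Date.parse(value.to_s)
  date.iso8601
end

# Assumes the format is carried in a dateFormat attribute.
def temporal_date_element(type, value)
  text = w3cdtf_date(value)
  return nil if text.nil?

  %(<date type="#{type}" dateFormat="W3CDTF">#{text}</date>)
end

puts temporal_date_element('dateFrom', '1975-03-01T10:00:00+10:00')
puts temporal_date_element('dateTo', Date.new(1995, 12, 31))
```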

silviapfeiffer commented 12 years ago

Have updated the xml.identifier fields
Have replaced UTC with W3CDTF; I'm leaving the date format since it's the appropriate one for xmlschema
I can't use the electronic address of the universities, since we don't store that in the DB. Would you like me to add a field for this? If so, could you please open a new bug for this? This one is getting crowded. ;-)
What is the electronic address that you are referring to, e.g. http://repository.paradisec.org.au/collection/ABW1 ? Did you want that to point to the data files themselves? And do we really want to expose this? It will make it easy for people to download the files.

jangari commented 12 years ago

Thanks!

Re: electronic address of universities, I don't think it's particularly important, especially since most, if not all, it appears, have NLA records to link to.

Re: electronic address of collections, well yeah, this is something we'd have to discuss when the repository is online. If we point people directly to it (which is what ANDS are pushing toward, or as close as possible), what will they be shown? A download link? Or a page telling them that they don't have permission to view this page? If the latter, then that's far worse than linking only to the catalog. But if there's a display of the contents and of certain metadata, plus instructions on how to gain access, then that would be optimal.

Do we know when we'll have the repository up and running? Who is even taking care of that?

LindaBarwick commented 12 years ago

Is there any reason not to have the electronic address of the collections be the catalog page?

jangari commented 12 years ago

The only reason is that ANDS have designed rif-cs such that the electronic address field is the location of the actual data, rather than a link to other information about the data. The catalog isn't the actual data although it provides a link through. So what we currently have is fine, but as far as ANDS is concerned it's slightly less than perfect. We may want to prevent people going from ANDS directly to the repository without going through the catalog first, in which case we leave it as it is. I'm just throwing up possibilities.

paradisec-archive / nabu

Implement RIF-CS export #164