xiaoyangren / dkpro-core-asl

Automatically exported from code.google.com/p/dkpro-core-asl

FileSetCollectionReaderBase sets collectionId wrong? #52

Closed. GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
The documentBaseUri and collectionId seem to be out of sync in 
FileSetCollectionReaderBase:

            // Set the document metadata
            DocumentMetaData docMetaData = new DocumentMetaData(aCas.getJCas());
            File file = aFile.getFile();
            docMetaData.setDocumentTitle(file.getName());
            docMetaData.setDocumentUri(file.toURI().toString());
            docMetaData.setDocumentId(aFile.getName());
            if (aFile.getBaseDir() != null) {
                docMetaData.setDocumentBaseUri(path.toURI().toString());
                docMetaData.setCollectionId(aFile.getBaseDir().getPath());
            }

I suppose the collectionId should match the documentBaseUri here, since both describe the same base location.
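
To illustrate the mismatch, here is a minimal sketch using only java.io.File. It assumes that `path` in the snippet above is the base directory as a File, which the excerpt does not show. The class name and the directory `data/corpus` are made up for illustration.

```java
import java.io.File;

public class BaseUriVsCollectionId {
    // Mirrors setDocumentBaseUri(path.toURI().toString()),
    // assuming (not confirmed by the excerpt) that path is the base directory.
    static String documentBaseUri(File baseDir) {
        return baseDir.toURI().toString();
    }

    // Mirrors setCollectionId(aFile.getBaseDir().getPath()).
    static String collectionId(File baseDir) {
        return baseDir.getPath();
    }

    public static void main(String[] args) {
        File baseDir = new File("data/corpus"); // hypothetical relative base directory
        // The first value is an absolute file: URI, the second is the path
        // exactly as it was handed to the File constructor.
        System.out.println(documentBaseUri(baseDir));
        System.out.println(collectionId(baseDir));
    }
}
```

With a relative base directory the two values differ in both scheme and absoluteness, so they cannot be compared or used interchangeably.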

Original issue reported on code.google.com by richard.eckart on 12 Apr 2012 at 10:16

GoogleCodeExporter commented 9 years ago
This is how ResourceCollectionReaderBase does it.

        String qualifier = aQualifier != null ? "#"+aQualifier : "";
        // Set the document metadata
        DocumentMetaData docMetaData = new DocumentMetaData(aCas.getJCas());
        docMetaData.setDocumentTitle(new File(aResource.getPath()).getName());
        docMetaData.setDocumentUri(aResource.getResolvedUri().toString()+qualifier);
        docMetaData.setDocumentId(aResource.getPath()+qualifier);
        if (aResource.getBase() != null) {
            docMetaData.setDocumentBaseUri(aResource.getResolvedBase());
            docMetaData.setCollectionId(aResource.getResolvedBase()+qualifier);
        }

It also looks strange here that the qualifier is appended to the collectionId as 
well as to the documentId. I think it should only be appended to the documentId. 

And I suppose FileSetCollectionReaderBase should be changed to use "path" as the 
collectionId.

Changing this could break existing user code though.
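
The behaviour proposed above can be sketched roughly as follows. This is a self-contained illustration, not the DKPro code: `QualifierSketch`, `Ids`, and the parameter names are hypothetical stand-ins for the resource accessors and DocumentMetaData setters quoted earlier.

```java
public class QualifierSketch {
    // Hypothetical holder for the four values; not the DKPro DocumentMetaData type.
    static final class Ids {
        final String documentUri;
        final String documentId;
        final String documentBaseUri;
        final String collectionId;

        Ids(String documentUri, String documentId, String documentBaseUri, String collectionId) {
            this.documentUri = documentUri;
            this.documentId = documentId;
            this.documentBaseUri = documentBaseUri;
            this.collectionId = collectionId;
        }
    }

    // The qualifier is appended only to the document-level values;
    // the collectionId is kept identical to the documentBaseUri.
    static Ids ids(String resolvedBase, String resolvedUri, String path, String qualifier) {
        String suffix = qualifier != null ? "#" + qualifier : "";
        return new Ids(resolvedUri + suffix, path + suffix, resolvedBase, resolvedBase);
    }

    public static void main(String[] args) {
        // Hypothetical example values.
        Ids ids = ids("file:/corpus/", "file:/corpus/a.zip", "a.zip", "entry1");
        System.out.println(ids.documentId);
        System.out.println(ids.collectionId);
    }
}
```

Under this scheme, every document read from the same base shares one collectionId regardless of its qualifier, which is also why callers that relied on the old qualified value could be affected.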

Original comment by richard.eckart on 12 Apr 2012 at 10:20

GoogleCodeExporter commented 9 years ago
Removed qualifier from collectionId.
---
Committed revision 648.

Original comment by richard.eckart on 12 May 2012 at 12:56