iotashan / cfsolrlib

ColdFusion library for advanced Solr integration
MIT License
30 stars, 16 forks

addFile needs the ability to override the metadata #1

Closed seancoyne closed 13 years ago

seancoyne commented 13 years ago

If I want to add a file to the index, but not be limited to the metadata in the file, there is currently no way to override that data. For example, a PDF file might have a "title" metadata field. However, I may want to use a different title for that file rather than the one stored in the metadata. Many users are unaware of the metadata fields in Word documents and PDFs and leave them as the default. If they are uploading a file to a content management system, they may assign a different title, summary, etc., and would expect that data to be indexed.

I suggest that another argument be added to the addFile method that accepts an array of structs that override the metadata values.

iotashan commented 13 years ago

I agree. Want to add it and submit a merge request? :)

seancoyne commented 13 years ago

I'll fork and send you a pull request when it's ready :) We can close this issue until then. Thanks!

seancoyne commented 13 years ago

OK, so this is much more difficult than I first imagined. I have managed to get it to accept custom metadata using the "literal.fieldname=value" parameters as described here: http://wiki.apache.org/solr/ExtractingRequestHandler but if I use a field name that matches a field returned by Tika, for example "Title", Solr throws an error saying that I am trying to provide multiple values for a field that is not multiValued. It's trying to use both the Title returned by Tika and the Title I am providing. I even tried mapping the field from Tika to another field using the fmap parameters, and it still doesn't like it. I will have to rethink this.
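For anyone following along, here is a rough sketch of how override values could be turned into "literal.fieldname=value" request parameters. It is written in Python purely for illustration and is not the cfsolrlib implementation; the function name `build_extract_params` and the `id` and `summary` field names are assumptions that would have to match your own schema.

```python
from urllib.parse import urlencode


def build_extract_params(doc_id, overrides):
    """Build query parameters for a Solr extract request.

    Hypothetical helper: each override becomes a literal.<field>=<value>
    pair. The "id" field is an assumption about the schema.
    """
    params = [("literal.id", doc_id)]
    for field, value in overrides.items():
        params.append(("literal." + field, value))
    return params


# "summary" is an assumed schema field that Tika does not also populate.
params = build_extract_params("doc1", {"summary": "Quarterly report"})
query = urlencode(params)
print(query)
```

Using a field such as `summary` that Tika does not also return avoids the multiple-values error described above; passing "Title" this way is what triggers it.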

iotashan commented 13 years ago

Ouch. I'll take a peek as soon as I can.

iotashan commented 13 years ago

Looks like this is a Solr issue that I can't do anything about in this library...

http://lucene.472066.n3.nabble.com/Controlling-Tika-s-metadata-td2378677.html

seancoyne commented 13 years ago

Hmm, that may cause some issues, but I'm sure we can find a workaround. Thanks for looking into it.

iotashan commented 13 years ago

The workaround is to avoid common names for the fields in your schema, or to prefix them.

"myTitle" instead of "title", etc.
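As a minimal sketch of the prefixing idea (again in Python for illustration only, not part of this library), override names can be renamed before they are sent as literal parameters. The helper name `prefix_overrides` and the default prefix `my` are hypothetical, and the prefixed fields would still need to be defined in your schema.

```python
def prefix_overrides(overrides, prefix="my"):
    """Rename override fields so they cannot collide with Tika's names.

    Hypothetical helper: "title" becomes "myTitle", and so on.
    """
    renamed = {}
    for name, value in overrides.items():
        # Capitalize only the first letter, leaving the rest untouched.
        renamed[prefix + name[:1].upper() + name[1:]] = value
    return renamed


safe = prefix_overrides({"title": "Annual Report", "summary": "FY results"})

# Each renamed field can then be sent as a literal.<field> parameter.
literal_params = {"literal." + name: value for name, value in safe.items()}
print(literal_params)
```

Because `myTitle` is distinct from the `title` value Tika extracts, Solr receives only one value per field.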