marklogic-community / Corona

Community REST API for MarkLogic

Fixed $extratInfo typo and added a brute fix for last modification date on PDFs #72

Closed ScottConroy closed 12 years ago

ScottConroy commented 12 years ago

I'm finding several variations on the creation and last modification dates on different file types, but this fix works for the specific PDFs I have on hand.

ryangrimm commented 12 years ago

I put in place an easy way to normalize last modification metadata into a corona:modDate element. I'm also running anything that looks like a date through the date parser to (hopefully) get out xs:dateTime values. Typos also cleaned up.

Thanks!
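To make the "run anything that looks like a date through the date parser" idea concrete, here is a minimal Python sketch. It is not Corona's code (Corona is written in XQuery), and the list of formats is an assumption for illustration only; the thread does not say which formats the real parser handles. The function name `to_xs_datetime` is hypothetical.

```python
import re
from datetime import datetime

# Hypothetical candidate formats; the formats Corona's parser really
# supports are not listed in this thread.
CANDIDATE_FORMATS = [
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%d %H:%M:%S",
    "%m/%d/%Y %H:%M:%S",
]

# PDF metadata dates look like D:YYYYMMDDHHmmSS followed by an optional
# timezone offset such as -05'00'. This sketch ignores the offset.
PDF_DATE = re.compile(r"^D:(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})")


def to_xs_datetime(raw):
    """Return an xs:dateTime-style string, or None if nothing matches."""
    text = raw.strip()
    match = PDF_DATE.match(text)
    if match:
        try:
            return datetime(*(int(g) for g in match.groups())).isoformat()
        except ValueError:
            # Digits were present but did not form a valid date.
            return None
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).isoformat()
        except ValueError:
            continue
    return None
```

Returning `None` instead of raising keeps the "hopefully" in the comment above honest: a value that cannot be parsed is simply left without a normalized form.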

ScottConroy commented 12 years ago

Your solution is obviously MUCH more graceful than mine! I very much appreciate the fast turnaround. I'll be putting this to use right away.

Any thoughts about making this usable outside of Corona? I think folks that are using other mechanisms to load binary content would benefit greatly from this. The default result of an xdmp:document-filter doesn't really cut it...


ScottConroy commented 12 years ago

I'm getting an invalid cast as dateTime error when I attempt to upload PDFs. I tried with more than one. I didn't check your parsing since I know you can do it faster than I can. Here's an example doc.


ryangrimm commented 12 years ago

I noticed a couple more formats that the date parsing library wasn't handling and added those.

I suspect that the problem is in your range index. Is this an index that you created via Corona or the MarkLogic admin interface?

I'm putting the parsed date into a normalized-date attribute and leaving the original content as a text node. So make sure that the range index is pointing to the attribute and let me know if that gives you some success.

--Ryan
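The layout described above (original value as the text node, parsed value in a `normalized-date` attribute) can be sketched in Python as follows. This only illustrates why an index over the element text fails to cast while one over the attribute succeeds; it is not Corona's XQuery implementation. The namespace URI and the helper names `build_mod_date` and `casts_as_datetime` are placeholders assumed for the example.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Placeholder namespace URI; the real corona namespace is not given here.
CORONA_NS = "http://example.com/corona"


def build_mod_date(original_text, normalized):
    """Keep the raw value as text and the parsed value as an attribute."""
    elem = ET.Element("{%s}modDate" % CORONA_NS)
    elem.text = original_text
    if normalized is not None:
        elem.set("normalized-date", normalized)
    return elem


def casts_as_datetime(value):
    """Rough stand-in for an xs:dateTime cast, using ISO 8601 parsing."""
    try:
        datetime.fromisoformat(value)
        return True
    except ValueError:
        return False


elem = build_mod_date("D:20111215123900-05'00'", "2011-12-15T12:39:00")
print(casts_as_datetime(elem.text))                    # raw PDF date
print(casts_as_datetime(elem.get("normalized-date")))  # normalized value
```

A dateTime range index pointed at the element itself sees the raw text, which is consistent with the invalid cast error reported earlier; pointed at the attribute, it sees only the normalized value.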

On Dec 15, 2011, at 10:19 AM, Scott Conroy wrote:

Forgot to mention that I have an index on modDate. Obviously the upload works if I get rid of the index. But as you can guess I'm trying to facet on modDate (across a variety of content).

On Thu, Dec 15, 2011 at 1:12 PM, Ryan Grimm wrote:

Doesn't look like GitHub allows attachments. Feel free to email me the PDF directly and I'll fix it up.

--Ryan


ScottConroy commented 12 years ago

Sorry, I just figured that out while you were emailing me. Much appreciated.


ryangrimm commented 12 years ago

No worries.

I just created a new issue (#74) to make it easier to create range indexes on binary metadata without knowing all of the details.

--Ryan
