ScottConroy closed this pull request 12 years ago
I put in place an easy way to normalize last modification metadata into a corona:modDate element. I'm also running anything that looks like a date through the date parser to (hopefully) get out xs:dateTime values. Typos also cleaned up.
Thanks!
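For readers outside Corona, the idea of running date-like strings through a parser to get xs:dateTime values can be sketched roughly as follows. This is a minimal Python illustration, not Corona's actual XQuery date parser: the format list and the `to_xs_datetime` name are assumptions, since the thread does not say which formats the library handles.

```python
from datetime import datetime
from typing import Optional

# Hypothetical format list for illustration only; the formats that
# Corona's date parser really supports are not listed in this thread.
CANDIDATE_FORMATS = [
    "%Y-%m-%dT%H:%M:%S",   # already in xs:dateTime lexical form
    "%Y-%m-%d %H:%M:%S",   # space instead of "T"
    "%m/%d/%Y %H:%M:%S",   # US-style date
    "D:%Y%m%d%H%M%S",      # PDF-style date string without a zone suffix
]


def to_xs_datetime(raw: str) -> Optional[str]:
    """Return the value in xs:dateTime lexical form, or None if no format matches."""
    text = raw.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format did not match, try the next one
        return parsed.isoformat(timespec="seconds")
    return None  # leave unparseable values alone rather than guessing


print(to_xs_datetime("D:20111215103900"))
print(to_xs_datetime("not a date"))
```

Returning None for anything unrecognized keeps the original metadata untouched, which fits the "(hopefully)" above: a value that fails to parse is simply not normalized.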
Your solution is obviously MUCH more graceful than mine! I very much appreciate the fast turnaround. I'll be putting this to use right away.
Any thoughts about making this usable outside of Corona? I think folks that are using other mechanisms to load binary content would benefit greatly from this. The default result of an xdmp:document-filter doesn't really cut it...
Reply to this email directly or view it on GitHub: https://github.com/marklogic/Corona/pull/72#issuecomment-3165927
I'm getting an invalid cast as dateTime when I attempt to upload PDFs. Tried with more than one. I didn't check your parsing since I know you can do it faster than I can. Here's an example doc.
I noticed a couple more formats that the date parsing library wasn't handling and added those.
I suspect that the problem is in your range index. Is this an index that you created via Corona or the MarkLogic admin interface?
I'm putting the parsed date into a normalized-date attribute and leaving the original content as a text node. So make sure that the range index is pointing to the attribute and let me know if that gives you some success.
--Ryan
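To illustrate the layout described above (parsed value in a normalized-date attribute, original content kept as the text node), here is a small Python sketch. The namespace URI is a placeholder, the helper names are hypothetical, and the regex is only a simplified stand-in for the xs:dateTime lexical check; none of this is Corona's or MarkLogic's own code.

```python
import re
import xml.etree.ElementTree as ET

# Placeholder namespace; the real corona namespace URI is not given in this thread.
CORONA_NS = "http://example.com/corona-placeholder"

# Simplified approximation of the xs:dateTime lexical form.
XS_DATETIME = re.compile(
    r"-?\d{4,}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})?"
)


def build_mod_date(original: str, normalized: str) -> ET.Element:
    """Build a modDate element: parsed value in the attribute, raw value as text."""
    elem = ET.Element(f"{{{CORONA_NS}}}modDate")
    elem.set("normalized-date", normalized)
    elem.text = original
    return elem


def looks_like_xs_datetime(value: str) -> bool:
    """Rough check of whether a string would cast as xs:dateTime."""
    return XS_DATETIME.fullmatch(value) is not None


elem = build_mod_date("D:20111215103900", "2011-12-15T10:39:00")
print(looks_like_xs_datetime(elem.get("normalized-date")))  # attribute value
print(looks_like_xs_datetime(elem.text))                    # raw text node
```

If a dateTime range index points at the element, the raw text is what would have to cast, which is presumably where the invalid cast comes from; pointing the index at the attribute avoids that.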
On Dec 15, 2011, at 10:19 AM, Scott Conroy wrote:
Forgot to mention that I have an index on modDate. Obviously the upload works if I get rid of the index. But as you can guess I'm trying to facet on modDate (across a variety of content).
On Thu, Dec 15, 2011 at 1:12 PM, Ryan Grimm wrote:
Doesn't look like Git allows attachments. Feel free to email me the PDF directly and I'll fix it up.
--Ryan
Sorry, I just figured that out while you were emailing me. Much appreciated.
No worries.
I just created a new issue (#74) to make it easier to create range indexes on binary metadata without knowing all of the details.
--Ryan
I'm finding several variations on the creation and last modification dates on different file types, but this fix works for the specific PDFs I have on hand.