Fixed couple bugs - Githubissues

mhradek commented 12 years ago

The URLs have a dash deperating the domain and region, not a period. The regex check in batcher will always throw for valid JSON and it is redundant so removed it.

Thanks!

javmorin commented 12 years ago

@mhradek Thanks for your pull request.

As @djensen47 mentioned, the domain structure for CloudSearch in the current code reflects what we currently see in our production CloudSearch deployments. I would be very surprised if Amazon decided to change the domain pattern, as the instance.region.service.amazonaws.com pattern is used throughout the AWS system (ELB being another noted example). If we do see different domain structures appear, then we will need to (obviously) adjust for that, but it will be via a switch to complete url specification rather than changing the existing "fill-in-the-blank" pattern.

Have you encountered such search urls in your CloudSearch deployments?

mhradek commented 12 years ago

Thanks for checking out my request.

Yes, in my case the requests use the - pattern hence the checkin. This is what we're using in our deployments.

Also, in my case, usage was failing because of the JSON which was valid but throwing. For giggles I ran the regex on the JSON and it removed all the slashes and double quotes making it invalid JSON. I blame that on my development environment not matching yours. Even your tests, as in the library, were failing.

I see your point about the regex needed to be there.

Anyway, thanks for your time.

mhradek

On 5/29/2012 5:48 PM, Mike Javorski wrote:

@mhradek Thanks for your pull request.

As @djensen47 mentioned, the domain structure for CloudSearch in the current code reflects what we currently see in our production CloudSearch deployments. I would be very surprised if Amazon decided to change the domain pattern, as the...amazonaws.com pattern is used throughout the AWS system (ELB being another noted example). If we do see different domain structures appear, then we will need to (obviously) adjust for that, but it will be via a switch to complete url specification rather than changing the existing "fill-in-the-blank" pattern.

Have you encountered such search urls in your CloudSearch deployments?

To your second point on the regex check on the JSON data; this was added in direct response to errors we encountered while testing. We had data which included non-XML 1.0 compliant (yet still JSON compliant) text, and the CloudSearch system rejected it with errors. We confirmed directly with AWS staff that the content of even the JSON data needed to comply with the XML 1.0 permitted character list. My personal suspicion is that they are converting everything into XML, post-upload so that they only have to support a single data format on the search servers. We requested at the time that the CloudSearch docs be updated to include this limitation, but that appears to have been lost somewhere. We'll have to ping them again.

I am going to close this pull request now. We welcome your further future contributions.

Reply to this email directly or view it on GitHub: https://github.com/spokesoftware/aws_cloud_search/pull/3#issuecomment-5999774

djensen47 commented 12 years ago

Can you provide a sample of the urls that your environment is using? If AWS changed something, maybe we need to make the change to use the entire URL (not just the domain) sooner.

Also, can you provide the JSON that was throwing an exception.

Finally, are you on Linux, Mac, or Windows?

spokesoftware / aws_cloud_search

Fixed couple bugs #3