groonga / gcs

Groonga CloudSearch is an open source implementation of Amazon CloudSearch.
http://gcs.groonga.org/
MIT License

Research specifications of Amazon CloudSearch and define initial milestones of croonga project #1

Closed piroor closed 12 years ago

piroor commented 12 years ago

Current state

No milestone.

Expected state

There are clear milestones, each small enough that the initial version can be released to people quickly.

How to solve

  1. Research the Amazon CloudSearch documentation to clarify how to upload data and how to select records.
  2. Define minimal milestones.
piroor commented 12 years ago

ACS accepts "Search Data Format (SDF)" as the input. http://docs.amazonwebservices.com/cloudsearch/latest/developerguide/SvcConcepts.SDF.html

SDF data can be sent as JSON or XML.

Users send the data to the web API with the POST method (or through the console).

http://docs.amazonwebservices.com/cloudsearch/latest/developerguide/senddata.html

This can be mapped to Groonga's load command.
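A minimal sketch of that mapping, assuming an SDF batch is a JSON array of operations with `type`, `id`, and `fields` keys, and assuming the SDF `id` becomes the Groonga `_key`. The function name and the table name are hypothetical. Only "add" operations are converted here.

```python
import json


def sdf_adds_to_load(table, batch):
    """Build a Groonga `load` command from the "add" operations of an SDF batch."""
    records = []
    for op in batch:
        if op.get("type") != "add":
            continue  # other operation types are out of scope for this sketch
        # Assumption: the SDF document id is used as the Groonga table key.
        record = {"_key": op["id"]}
        record.update(op.get("fields", {}))
        records.append(record)
    # `load` accepts the JSON values on the lines following the command.
    return f"load --table {table}\n{json.dumps(records, ensure_ascii=False)}"


batch = [
    {"type": "add", "id": "doc1", "version": 1, "lang": "en",
     "fields": {"title": "Hello"}},
]
print(sdf_adds_to_load("Docs", batch))
```

The `version` and `lang` attributes are ignored in this sketch; how they should be stored is an open question.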

piroor commented 12 years ago

Before uploading data, we have to configure the index fields. http://docs.amazonwebservices.com/cloudsearch/latest/developerguide/configureindexfields.API.html This can be mapped to Groonga's column_create command.
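As a rough illustration, assuming each index field is expressed with Groonga's `column_create` command, a field definition could be translated as below. The type mapping (`text`, `literal`, `uint` to Groonga types) is my assumption, not something the ACS documentation defines, and the function name is hypothetical. A full-text index would additionally need a lexicon table and a `COLUMN_INDEX` column, which this sketch leaves out.

```python
# Assumed mapping from ACS index field types to Groonga column types.
ACS_TO_GROONGA_TYPE = {
    "text": "ShortText",
    "literal": "ShortText",
    "uint": "UInt32",
}


def index_field_to_column_create(table, field_name, field_type):
    """Build a `column_create` command for one index field."""
    groonga_type = ACS_TO_GROONGA_TYPE[field_type]  # KeyError on unknown types
    return (
        f"column_create --table {table} --name {field_name} "
        f"--flags COLUMN_SCALAR --type {groonga_type}"
    )


print(index_field_to_column_create("Docs", "title", "text"))
```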

piroor commented 12 years ago

Before creating indexes, we have to create domains. http://docs.amazonwebservices.com/cloudsearch/latest/developerguide/creatingdomains.html#createdomain.API This can be mapped to Groonga's table_create command.
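A sketch of how a domain creation request might be turned into a Groonga table, assuming the request arrives as a query string with `Action=CreateDomain` and a `DomainName` parameter, and assuming a hash table keyed by `ShortText` is a reasonable choice for document ids. The function name is hypothetical.

```python
from urllib.parse import parse_qs


def create_domain_to_table_create(query_string):
    """Translate a CreateDomain query string into a `table_create` command."""
    params = parse_qs(query_string)
    if params.get("Action") != ["CreateDomain"]:
        raise ValueError("not a CreateDomain request")
    domain = params["DomainName"][0]
    # Assumption: one Groonga table per domain, keyed by the document id.
    return f"table_create --name {domain} --flags TABLE_HASH_KEY --key_type ShortText"


print(create_domain_to_table_create("Action=CreateDomain&DomainName=companies"))
```

No validation of the domain name is done here; the name restrictions on both sides would need to be checked separately.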

piroor commented 12 years ago

To search stored data, we have to call the web API with the GET method. http://docs.amazonwebservices.com/cloudsearch/latest/developerguide/searching.html This can be mapped to Groonga's select command.
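A minimal sketch of the search mapping, assuming the search request carries `q`, `size`, and `start` parameters and that Groonga is reached through its HTTP interface (`/d/select`). The defaults (`start=0`, `size=10`), the single match column, and the function name are assumptions for illustration. The query text is passed through unchanged, although the two query syntaxes are not guaranteed to be compatible.

```python
from urllib.parse import parse_qs, urlencode


def search_request_to_select_path(table, match_column, query_string):
    """Translate a search query string into a Groonga HTTP `select` path."""
    params = parse_qs(query_string)
    select_params = {
        "table": table,
        "match_columns": match_column,
        "query": params["q"][0],
        "offset": params.get("start", ["0"])[0],  # assumed default
        "limit": params.get("size", ["10"])[0],   # assumed default
    }
    return "/d/select?" + urlencode(select_params)


print(search_request_to_select_path("Docs", "title", "q=star+wars&size=5&start=10"))
```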

piroor commented 12 years ago

Draft of the initial milestone:

Dropped topics from the initial milestone:

piroor commented 12 years ago

The search API has the following features:

piroor commented 12 years ago

Draft v2 of the initial milestone:

Dropped topics from the initial milestone:


Draft of the 2nd milestone:

piroor commented 12 years ago

Each operation in an SDF batch has one of two types: add or delete. http://docs.amazonwebservices.com/cloudsearch/latest/developerguide/DocumentsBatch.JSON.html#DocumentsBatch.JSON.Requests
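One way to handle the two types is to split a batch before touching the database: "add" operations become records to load, "delete" operations become keys to remove (for example with Groonga's `delete` command). This is only a sketch; the function name is hypothetical, and it assumes each operation is a JSON object with `type`, `id`, and (for add) `fields`.

```python
def split_sdf_batch(batch):
    """Split an SDF batch into (records_to_load, keys_to_delete)."""
    records_to_load = []
    keys_to_delete = []
    for op in batch:
        op_type = op.get("type")
        if op_type == "add":
            # Assumption: the SDF id is used as the Groonga `_key`.
            records_to_load.append({"_key": op["id"], **op.get("fields", {})})
        elif op_type == "delete":
            keys_to_delete.append(op["id"])
        else:
            raise ValueError(f"unknown SDF operation type: {op_type!r}")
    return records_to_load, keys_to_delete


adds, deletes = split_sdf_batch([
    {"type": "add", "id": "doc1", "version": 1, "lang": "en",
     "fields": {"title": "Hello"}},
    {"type": "delete", "id": "doc2", "version": 2},
])
print(adds, deletes)
```

Note that this split discards the relative order of operations within the batch, which may matter if the same id is added and deleted in one batch.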

piroor commented 12 years ago

Draft v3 of the initial release:

Dropped topics from the initial release:


milestone 1:

milestone 2:

milestone 3:

piroor commented 12 years ago

We should support the CreateDomain and ConfigureIndexFields APIs in the initial release.

piroor commented 12 years ago

Draft v4 of the initial release:

Dropped topics from the initial release:


milestone 1:

milestone 2:

milestone 3:

milestone 4:

milestone 5:

milestone 6:

milestone 7:

piroor commented 12 years ago

milestone 7:

  • Create the database automatically via Configuration API, Action=CreateDomain.

This has already been done.

piroor commented 12 years ago

http://docs.amazonwebservices.com/cloudsearch/latest/developerguide/creatingdomains.html

Restrictions about search domains (tables):

piroor commented 12 years ago

http://docs.amazonwebservices.com/cloudsearch/latest/developerguide/configureindexfields.html

Restrictions about column names (index fields):

piroor commented 12 years ago

For the initial release, I think we should define a model case. My plan:

Based on such a model case, we can define a schema for Groonga and develop the two APIs, batch and search, in parallel.

piroor commented 12 years ago

http://docs.amazonwebservices.com/cloudsearch/latest/developerguide/configreq.html

ACS accepts both GET and POST methods for the configuration API. On the other hand, the search API accepts only GET, and the batch API accepts only POST.
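These method rules could be kept in a small lookup table so the server can reject unsupported requests early. A sketch; the API labels and the function name are hypothetical.

```python
# HTTP methods accepted by each API, as described above.
ALLOWED_METHODS = {
    "configuration": {"GET", "POST"},
    "search": {"GET"},
    "batch": {"POST"},
}


def is_method_allowed(api, method):
    """Return True if the given HTTP method is accepted by the given API."""
    return method.upper() in ALLOWED_METHODS[api]


print(is_method_allowed("search", "POST"))
```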

piroor commented 12 years ago

http://docs.amazonwebservices.com/cloudsearch/latest/developerguide/creatingsdf.html

Restrictions of batches:

piroor commented 12 years ago

http://docs.amazonwebservices.com/cloudsearch/latest/developerguide/addsourcefield.html

When you create a new index, you can specify the "source" of the index. "Copy" is the same as Groonga's default behavior. The others ("Trim Title" and "Map") are like stored procedures. For the initial release, I think we should drop support for "Trim Title" and "Map".

piroor commented 12 years ago

http://docs.amazonwebservices.com/cloudsearch/latest/developerguide/indexing.html

ACS can rebuild search indexes with a new configuration, using the documents that were already uploaded. In other words, ACS stores the original documents separately from the indexed database. This is very different from Groonga.

piroor commented 12 years ago

Saved as a wiki page. Roadmap => Initial Release