kapilt / contentmirror

Automatically exported from code.google.com/p/contentmirror

Enhancement: add simplified path resolution #15

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago

What is the expected output? What do you see instead?
It would be useful for many end uses of CM to be able to query the
database for hierarchy information directly, without having to traverse
objects in memory via the parent/child relationships. One popular
implementation is the modified preorder tree traversal algorithm, which
uses additional fields in the content table to record an item's position
in the content tree. Another option is a single string field holding the
path relative to the Plone root (e.g. /path/to/content).

On database targets that support triggers, either mechanism can be
implemented as an external bolt-on if desired by end users. In the
absence of triggers, however, this gets difficult.
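
A minimal sketch of the modified preorder idea mentioned above, using an
in-memory SQLite database. The `content` table, its `name`/`lft`/`rgt`
columns, and the sample tree are hypothetical and are not the Content
Mirror schema. Each node gets a left and right number from a depth-first
walk, and a subtree query becomes a simple range comparison:

```python
import sqlite3

# Hypothetical tree: parent name -> child names (illustration only).
CHILDREN = {
    "root": ["news", "docs"],
    "news": ["item1"],
    "docs": ["guide", "faq"],
}


def number_tree(node, counter, rows):
    """Assign lft/rgt to `node` and its subtree; return the next free number."""
    lft = counter
    counter += 1
    for child in CHILDREN.get(node, []):
        counter = number_tree(child, counter, rows)
    rows.append((node, lft, counter))  # counter is this node's rgt value
    return counter + 1


def build_db():
    """Create the hypothetical content table and fill it from CHILDREN."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE content (name TEXT PRIMARY KEY, lft INTEGER, rgt INTEGER)"
    )
    rows = []
    number_tree("root", 1, rows)
    conn.executemany("INSERT INTO content VALUES (?, ?, ?)", rows)
    return conn


def descendants(conn, name):
    """Return the names of all nodes strictly below `name`, in tree order."""
    sql = (
        "SELECT c.name FROM content AS c, content AS p "
        "WHERE p.name = ? AND c.lft > p.lft AND c.rgt < p.rgt "
        "ORDER BY c.lft"
    )
    return [row[0] for row in conn.execute(sql, (name,))]
```

Reads are cheap with this layout, but inserting or moving a node means
renumbering every row to its right, which is where triggers (or some other
write-side hook) come in.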

Please provide any additional information below.

Original issue reported on code.google.com by bry...@pdq.net on 25 Aug 2008 at 2:49

GoogleCodeExporter commented 9 years ago
SQLAlchemy makes it possible to eagerly fetch the objects in a containment
hierarchy by specifying a join depth on the mapper, which gives good
performance when producing hierarchy displays. To avoid any unnecessary
field fetches and polymorphic resolution for a site map type display, I'd
use a separate/secondary mapper and domain class. This usage should get
documented. Database-level tree support has some variance, and depending
on a database's procedural language for triggers has portability issues.

Original comment by kapilt@gmail.com on 4 Sep 2008 at 2:16

GoogleCodeExporter commented 9 years ago
I have a sample implementation that does such a thing to build the site
map. However, this only addresses half of the issue. The other part is
containment-based searches, such as finding all Documents beneath a
specific container and any number of sub-containers. Pre-computing a
folder hierarchy can allow for post-query filtering of such searches, but
it could get computationally expensive to do the traversals.

Original comment by bry...@pdq.net on 4 Sep 2008 at 3:17

GoogleCodeExporter commented 9 years ago

Original comment by runy...@gmail.com on 4 Sep 2008 at 5:02

GoogleCodeExporter commented 9 years ago

Original comment by runy...@gmail.com on 4 Sep 2008 at 5:06

GoogleCodeExporter commented 9 years ago
Could you elaborate on the use cases for this? If you want all documents
under a container, can't you just use the secondary site node class mapper
with a join depth sufficiently deep to cover querying the subtree?

I think any approach which incorporates modified preorder traversal will
have to wait for an async mirror implementation; the tree modification
costs have concurrency and time constraints that aren't appropriate for a
synchronous serialization.

Realistically, I wonder if this wouldn't be better handled outside of the
database; search-related activities (even hierarchy) seem better suited to
Xapian or Solr integration for full-text search (fts).

Original comment by kapilt@gmail.com on 13 Sep 2008 at 11:19

GoogleCodeExporter commented 9 years ago
The use case I have is finding all content beneath a particular container:
in addition to content directly inside the container in question, I need
content at any point in the containment hierarchy from that point down, to
any possible depth. This is used for listing recent content added in the
containment hierarchy, such as RSS feeds or other recent content displays.
As a result, the search would also need to be sorted on creation_date and
filtered on particular portal types.

Join depth could get difficult or inefficient at depths of over 5 or 6,
and could get really bad at depths of over 20, as the generated SQL could
get quite large and potentially require adjusting default parameters on
some db engines to support really enormous SQL.
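
To illustrate how the statement grows, here is a hand-written sketch that
builds one LEFT JOIN per level against a hypothetical self-referential
`content(id, parent_id)` table. This is not the SQL that SQLAlchemy
actually generates for a given join depth (an eager load also selects
every mapped column per level); it only shows the shape of the problem,
and all names are assumptions:

```python
import sqlite3


def chained_join_sql(depth):
    """Build a single SELECT with one self-join per level (depth >= 1)."""
    cols = ", ".join(f"c{i}.id" for i in range(1, depth + 1))
    joins = " ".join(
        f"LEFT JOIN content AS c{i} ON c{i}.parent_id = c{i - 1}.id"
        for i in range(1, depth + 1)
    )
    return f"SELECT {cols} FROM content AS c0 {joins} WHERE c0.id = ?"


def descendants_by_join(conn, root_id, depth):
    """Collect ids found at most `depth` levels below `root_id`."""
    found = set()
    for row in conn.execute(chained_join_sql(depth), (root_id,)):
        found.update(value for value in row if value is not None)
    return sorted(found)


def make_chain_db():
    """Hypothetical sample tree: 1 -> 2 -> 3 -> 4, plus 1 -> 5."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE content (id INTEGER PRIMARY KEY, parent_id INTEGER)")
    conn.executemany(
        "INSERT INTO content VALUES (?, ?)",
        [(1, None), (2, 1), (3, 2), (4, 3), (5, 1)],
    )
    return conn
```

The join count equals the depth, and anything deeper than the chosen depth
is silently missed, so the depth has to be at least as large as the
deepest subtree being queried.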

Adding an attribute to the content that holds the relative path (e.g.
/path/to/content) gives a single value to match against, regardless of
depth, and would cover the use case I'm dealing with.
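
A minimal sketch of that query against an in-memory SQLite database. Only
the idea of a `path` column comes from this discussion; the table layout,
the sample rows, and the function names are hypothetical. It matches a
path prefix, filters on portal type, and sorts newest first:

```python
import sqlite3


def make_demo_db():
    """Create a hypothetical content table with a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE content (path TEXT, portal_type TEXT, creation_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO content VALUES (?, ?, ?)",
        [
            ("/site/news", "Folder", "2008-09-01"),
            ("/site/news/a", "Document", "2008-09-02"),
            ("/site/news/2008/b", "Document", "2008-09-05"),
            ("/site/news/2008/c", "Image", "2008-09-06"),
            ("/site/newsletter/d", "Document", "2008-09-07"),
            ("/site/docs/e", "Document", "2008-09-08"),
        ],
    )
    return conn


def recent_under(conn, container_path, portal_types, limit=10):
    """Return paths below `container_path` at any depth, newest first."""
    # The trailing slash keeps /site/newsletter from matching /site/news.
    prefix = container_path.rstrip("/") + "/"
    # Escape LIKE wildcards that might appear in real paths.
    escaped = (
        prefix.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    marks = ", ".join("?" for _ in portal_types)
    sql = (
        "SELECT path FROM content "
        "WHERE path LIKE ? ESCAPE '\\' "
        f"AND portal_type IN ({marks}) "
        "ORDER BY creation_date DESC LIMIT ?"
    )
    params = [escaped + "%", *portal_types, limit]
    return [row[0] for row in conn.execute(sql, params)]
```

For example, `recent_under(make_demo_db(), "/site/news", ["Document"])`
returns the two documents under /site/news regardless of how deeply they
are nested. Case sensitivity of LIKE and whether a prefix match can use an
index both vary between database engines.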

I think that one of the benefits of using a relational database is the
ability to perform searches using all of the features of standard SQL. If
we want to move all search functionality to an outside search engine, then
it might not be worth exporting to a database, but rather doing a direct
filesystem dump.

Original comment by bry...@pdq.net on 13 Sep 2008 at 11:39

GoogleCodeExporter commented 9 years ago
Agreed re limitations on join depth. Adding a string attribute seems like
the most expedient way to allow for this functionality in the meantime.

So, a new field for the content table, 'path', consisting of the relative
portal path and structured as a string.Text field. I'm tempted to put a
max length on it, but any limit seems arbitrary... any suggestions?

I'm interested in an fts solution as well, mostly because in my experience
the db solutions for this are a bit lacking (mysql/pg at least), and it
gives additional capabilities for context-free search across the content
store. I'll add a separate ticket for that though.

Original comment by kapilt@gmail.com on 14 Sep 2008 at 12:19

GoogleCodeExporter commented 9 years ago
Committed in revision 56: the full physical path in the ZODB is now stored
in the content table under the path column.

Original comment by kapilt@gmail.com on 14 Sep 2008 at 12:51

GoogleCodeExporter commented 9 years ago

Original comment by kapilt@gmail.com on 14 Sep 2008 at 12:52