philipmeadows / alfresco-webscript-manifold-connector

Alfresco Solr API Repository Connector for Apache ManifoldCF

Managing ACL in MCF connector? #11

Open lalitjangra opened 10 years ago

lalitjangra commented 10 years ago

Hi Maurizio, I was looking into integrating Alfresco with Apache MCF 1.5.1 using the CMIS connector. I was able to run it and get all content metadata into the Solr index, apart from ACL permissions.

Then I realized that ACL indexing and storage is not supported in the CMIS MCF connector, so I moved to the MCF Alfresco connector. But here too I could not see how to manage ACL indexing and storage.

Can you confirm whether the Alfresco connector supports ACL storage and indexing? If not, is there any way to achieve the same?

Regards.

maoo commented 10 years ago

Hi, the latest implementation of the Alfresco Manifold connector indexes read ACLs within the document payload, attaching the list of authorities (users and groups) that have read access to that doc.

It also provides a webscript that - given a username - resolves the list of related Alfresco authorities, which can be used - at query time - to add a query clause that filters out inaccessible doc results.

Hope this helps, please let me know if you need further info.

lalitjangra commented 10 years ago

Thanks a lot Maurizio,

I can see an Alfresco connector in the Apache ManifoldCF source; is it the same as your connector available at https://github.com/maoo/alfresco-webscript-manifold-connector?

Or do I need to get your connector from https://github.com/maoo/alfresco-webscript-manifold-connector and put it into ManifoldCF?

If it is already in the ManifoldCF distribution, which particular class should I look for? I am also interested in the webscript implementation.

Also, as I am using ManifoldCF 1.5.1, will this connector work with version 1.5.1 and Solr 4.6.0?

Regards.

maoo commented 10 years ago

The Alfresco connector code in the ManifoldCF trunk is not the same as the one on GitHub (I should clarify this in the docs, or feel free to change it and send a pull request): the trunk version relies on the Alfresco Lucene engine, whereas this implementation provides custom webscripts that fetch docs and ACLs.

If you're interested in the webscripts code, have a look here:

https://github.com/maoo/alfresco-webscript-manifold-connector/tree/master/alfresco-indexer-webscripts/src/main/java/org/alfresco/consulting/indexer/webscripts

URLs are registered with the desc.xml files:

https://github.com/maoo/alfresco-webscript-manifold-connector/tree/master/alfresco-indexer-webscripts/src/main/amp/config/alfresco/extension/templates/webscripts/org/alfresco/consulting/indexer/webscripts

lalitjangra commented 10 years ago

Thanks Maurizio,

I was just wondering how Alfresco does the same thing, that is, how the Alfresco repository and Solr keep track of the index, and how a query from Alfresco gets executed in Solr with the ACLs from both sides being compared (the missing part of this puzzle).

I have noticed SOLRAPIClient.java in the Alfresco source, which calls the webscripts below to get information about nodes, metadata, ACLs, ACL changesets, etc.

private static final String GET_ACL_CHANGESETS_URL = "api/solr/aclchangesets";
private static final String GET_ACLS = "api/solr/acls";
private static final String GET_ACLS_READERS = "api/solr/aclsReaders";
private static final String GET_TRANSACTIONS_URL = "api/solr/transactions";
private static final String GET_METADATA_URL = "api/solr/metadata";
private static final String GET_NODES_URL = "api/solr/nodes";
private static final String GET_CONTENT = "api/solr/textContent";
private static final String GET_MODEL = "api/solr/model";
private static final String GET_MODELS_DIFF = "api/solr/modelsdiff";

But there are two things I am still struggling to find a clue about.

  1. How and when do these webscripts get called? At a certain frequency, or in a scheduled job? I can also see that these webscripts receive parameters such as the transaction id, the ACL changeset id, etc. Where do these parameters come from, and which component finally uses the results of these webscripts?
  2. How does Alfresco pass ACLs to Solr when it queries Solr for search results, and how are these ACLs compared with the ACLs stored in the Solr indexes?

Thanks a lot for help.

Regards.

maoo commented 10 years ago

I'd advise you to read this blog post; please let me know if you have any other doubts.

http://www.ixxus.com/blog/2012/06/getting-going-solr-alfresco-4

maoo commented 10 years ago

To answer your questions:

  1. How and when do these webscripts get called? At a certain frequency, or in a scheduled job? I can also see that these webscripts receive parameters such as the transaction id, the ACL changeset id, etc. Where do these parameters come from, and which component finally uses the results of these webscripts?

SOLRAPIClient.java is executed within the Solr webapp and is invoked by a cron job scheduled - by default - every 15 seconds; the schedule can be changed by editing solrcore.properties in the Alfresco Solr configuration.

Please note that this logic is completely outside of the Alfresco webapp.

  2. How does Alfresco pass ACLs to Solr when it queries Solr for search results, and how are these ACLs compared with the ACLs stored in the Solr indexes?

Solr pulls ACLs (like any other piece of info) from Alfresco, not the other way around (Alfresco does not push to Solr). ACLs are grouped into ACL changesets that are tracked by a DB table of the same name; when Solr asks the Alfresco webscripts (via SOLRAPIClient.java) for the latest ACL changesets, the webscripts query the DB table and return a payload. The same approach is used for node transactions.
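To make the pull model concrete, here is a minimal sketch of one polling cycle. It is not taken from the Alfresco source: `ChangesetPoller`, its `source` function, and the in-memory "table" are hypothetical stand-ins for the tracker, the webscript call, and the ACL changeset DB table. The only idea it borrows from the thread is that the indexing side remembers the last changeset id it has seen and asks the repository for anything newer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongFunction;
import java.util.stream.Collectors;

/** Hypothetical pull-based tracker: the indexer asks, the repository only answers. */
final class ChangesetPoller {
    // Stand-in for the webscript call: returns changeset ids greater than the given id.
    private final LongFunction<List<Long>> source;
    private final List<Long> indexed = new ArrayList<>();
    private long lastSeenId;

    ChangesetPoller(LongFunction<List<Long>> source, long startId) {
        this.source = source;
        this.lastSeenId = startId;
    }

    /** One polling cycle: fetch everything after lastSeenId, index it, advance the cursor. */
    int pollOnce() {
        List<Long> fresh = source.apply(lastSeenId);
        for (long id : fresh) {
            indexed.add(id);
            lastSeenId = Math.max(lastSeenId, id);
        }
        return fresh.size();
    }

    long lastSeenId() {
        return lastSeenId;
    }

    List<Long> indexed() {
        return indexed;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the ACL changeset table on the repository side.
        List<Long> repositoryTable = new ArrayList<>(List.of(1L, 2L, 3L));
        ChangesetPoller poller = new ChangesetPoller(
                after -> repositoryTable.stream()
                        .filter(id -> id > after)
                        .collect(Collectors.toList()),
                0L);

        System.out.println(poller.pollOnce()); // first cycle picks up the three existing changesets
        repositoryTable.add(4L);               // an ACL changes in the repository
        System.out.println(poller.pollOnce()); // next cycle picks up only the new changeset
    }
}
```

In a real deployment the cycle would be driven by a scheduler rather than called by hand, and the source function would be an HTTP request to the repository.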

lalitjangra commented 10 years ago

Thanks a lot Maurizio,

This helped a lot to clear the air.

Solr pulls ACLs, ACL changesets, etc. from the Alfresco repository by using a number of webscripts. Can I assume that Solr stores these ACLs/ACL changesets with the indexed documents, or are they stored separately somewhere in Solr?

I am also investigating how Alfresco/Solr compares the ACLs stored in the Solr indexes in real time, so that only the search results the user is permitted to see are returned.

Does this comparison happen in Solr at the same time the search is carried out, or does Solr first collect all matches and then check the ACLs to return eligible results, or does the comparison happen inside the Alfresco repository?

Finally, can you help me find the class/module where all this takes place? Does CoreTracker.java under solr facilitate all this processing?

Regards.

maoo commented 10 years ago

Solr pulls ACLs, ACL changesets, etc. from the Alfresco repository by using a number of webscripts. Can I assume that Solr stores these ACLs/ACL changesets with the indexed documents, or are they stored separately somewhere in Solr?

They are stored separately, currently in the same Solr core (Alfresco Engineering is working hard to improve this part).

I am also investigating how Alfresco/Solr compares the ACLs stored in the Solr indexes in real time, so that only the search results the user is permitted to see are returned.

Does this comparison happen in Solr at the same time the search is carried out, or does Solr first collect all matches and then check the ACLs to return eligible results, or does the comparison happen inside the Alfresco repository?

At query time, Alfresco Solr "crosses" document results with ACLs - depending on the user invoking the query - and gets the "filtered" list of docs to return.

The advantage is that - if you change an ACL on app:company_home - Solr has to re-index only one document. The disadvantage is slower performance at query time (compared to other approaches).
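The trade-off can be sketched with plain Java collections. This is an illustration only, not Alfresco's implementation: `AclJoinSketch`, the two maps, and the sample ids are all hypothetical. Documents point at an ACL id, the ACL's readers are stored once, and the "cross" happens per matched document at query time.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical query-time join between matched documents and separately stored ACLs. */
final class AclJoinSketch {

    /** Keeps only the matched documents whose ACL shares at least one reader with the user. */
    static List<String> filter(List<String> matchedDocIds,
                               Map<String, Long> docToAcl,
                               Map<Long, Set<String>> aclReaders,
                               Set<String> userAuthorities) {
        List<String> visible = new ArrayList<>();
        for (String docId : matchedDocIds) {
            Long aclId = docToAcl.get(docId);
            if (aclId == null) {
                continue; // no ACL known for this document: hide it
            }
            Set<String> readers = aclReaders.getOrDefault(aclId, Collections.emptySet());
            if (!Collections.disjoint(readers, userAuthorities)) {
                visible.add(docId);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        Map<String, Long> docToAcl = new HashMap<>();
        docToAcl.put("doc1", 10L);
        docToAcl.put("doc2", 10L); // doc1 and doc2 share the same ACL
        docToAcl.put("doc3", 20L);

        Map<Long, Set<String>> aclReaders = new HashMap<>();
        aclReaders.put(10L, Set.of("GROUP_EVERYONE"));
        aclReaders.put(20L, Set.of("admin"));

        List<String> matches = List.of("doc1", "doc2", "doc3");
        Set<String> user = Set.of("abeecher", "GROUP_EVERYONE");

        System.out.println(filter(matches, docToAcl, aclReaders, user));

        // Changing the shared ACL touches one entry, not every document that uses it.
        aclReaders.put(10L, Set.of("admin"));
        System.out.println(filter(matches, docToAcl, aclReaders, user));
    }
}
```

The second call shows the advantage described above: one ACL entry changes and both documents that reference it become hidden, without touching the documents themselves. The cost is the per-document lookup on every query.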

One of the main reasons that led us to implement the Alfresco Webscript Manifold Connector is that we wanted to explore the possibility of skipping the filtering at query time by embedding ACL data (i.e. readableAuthorities) inside the document payload.

Finally, can you help me find the class/module where all this takes place? Does CoreTracker.java under solr facilitate all this processing?

CoreTracker is the main logic that triggers indexing, but the ACL filtering happens somewhere else - https://svn.alfresco.com/repos/alfresco-open-mirror/alfresco/HEAD/root/projects/solr/source/java/org/alfresco/solr/query/AbstractQParser.java

Keep in mind that this approach can be useful for some specific use cases but not for others, i.e. Alfresco Share cannot search against a search engine populated by this connector.

lalitjangra commented 10 years ago

Thanks a lot Maurizio,

What I want to achieve is that, once I store Alfresco content in Solr, I can store the ACLs along with it, as this is a prerequisite. So I want to know how Alfresco does this. It seems to be a fairly complex but cleaner way.

So far I have understood that content is sent to Solr at creation time. ACLs are also sent with the content and are then maintained using SOLRAPIClient.java/SolrTracker.java. Both content and ACLs are stored in the same core, but separately.

To return search results, Alfresco first finds matching content, then cross-checks it against the ACLs according to the user's permissions. On a match, it returns the search result; otherwise it leaves it aside and moves on to the next ACL to be matched against the content.

This whole operation of matching content to ACLs and returning eligible search results happens after the matching content has been found. But the place where it happens is still unknown to me, and I need your help here (I checked the Alfresco solr and solr-client source but could not figure out where exactly it happens).

I was looking at your connector code and found that ACLs are stored with the content, not separately, in Solr (as you highlighted), and that at search time the user's permissions are matched against the ACLs stored in Solr, document by document. So first all matching documents are found for, say, the desired keyword, and then those documents are checked against the ACLs to match the permissions of the user who is searching.

How is this different from Alfresco's strategy, which first searches all documents for a match and then cross-verifies each matched document against the ACLs?

Finally, in which scenarios is your connector best used?

Please suggest.

Regards.

maoo commented 10 years ago

How is this different from Alfresco's strategy, which first searches all documents for a match and then cross-verifies each matched document against the ACLs?

Alfresco Manifold Connector - by embedding ACLs - does not filter out results; it just queries all documents containing a readableAuthority that matches the id of the user (or any derived authority) passed in the request.
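A minimal sketch of how such a query clause could be built follows. It assumes the indexed field is called `readableAuthorities` (the name mentioned earlier in this thread; the actual schema field may differ) and that the list of authorities has already been resolved for the user. `AuthorityFilter` and its methods are hypothetical helpers, not part of the connector.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper that turns a user's authorities into a Solr filter clause. */
final class AuthorityFilter {
    // Assumed field name, based on the "readableAuthorities" mention in this thread.
    static final String FIELD = "readableAuthorities";

    /** Escapes backslashes and double quotes so an authority can sit inside a quoted phrase. */
    static String quote(String authority) {
        return "\"" + authority.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Builds a clause of the form FIELD:("a" OR "b"); an empty list yields a match-nothing clause. */
    static String buildFilter(List<String> authorities) {
        if (authorities.isEmpty()) {
            // No authorities resolved: match nothing rather than everything.
            return "-*:*";
        }
        return authorities.stream()
                .map(AuthorityFilter::quote)
                .collect(Collectors.joining(" OR ", FIELD + ":(", ")"));
    }

    public static void main(String[] args) {
        // prints: readableAuthorities:("GROUP_EVERYONE" OR "abeecher")
        System.out.println(buildFilter(List.of("GROUP_EVERYONE", "abeecher")));
    }
}
```

The resulting string would typically be sent as a filter query alongside the user's keyword query, so the search engine only ever matches documents the user can read.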

This approach is different from the Alfresco one that I described in my earlier message.

This scenario may fit well if you need to populate a Search Engine in a very custom way (only these properties, include ACL in documents payloads, generate other derived document fields, ...)

Of course this code is still experimental, not deployed on any production environment (AFAIK), unsupported and not guaranteed by Alfresco.

lalitjangra commented 10 years ago

Thanks a lot Maurizio,

Can you also please point out the class/module where Alfresco cross-checks content against ACLs and returns search results according to the user's permissions, as we discussed earlier? I am still trying to find this final piece of the puzzle in the Alfresco source code.

Regards.

maoo commented 10 years ago

I'd start from the solrconfig.xml shipped with Alfresco; the classes responsible should be these (forgive me, I'm not an engineer, I may be missing some details):

https://svn.alfresco.com/repos/alfresco-open-mirror/alfresco/HEAD/root/projects/solr/source/java/org/alfresco/solr/query/AbstractQParser.java

https://svn.alfresco.com/repos/alfresco-open-mirror/alfresco/HEAD/root/projects/solr/source/java/org/alfresco/solr/AlfrescoUpdateHandler.java

Please keep me posted with your investigations.

lalitjangra commented 10 years ago

Thanks Maurizio, I will keep you posted with my findings.

lalitjangra commented 10 years ago

AbstractQParser.java is where the query is formed. It is used by both AlfrescoLuceneQParserPlugin.java and AlfrescoFTSQParserPlugin.java, which are in turn referenced in solrconfig.xml.

But both AlfrescoLuceneQParserPlugin.java and AlfrescoFTSQParserPlugin.java only create a context-aware query and return it; they do not process it.

Example query {"queryConsistency":"DEFAULT","textAttributes":[],"authorities":["GROUP_EVERYONE","GROUP_site_swsdp","GROUP_site_swsdp_SiteCollaborator","ROLE_AUTHENTICATED","abeecher"],"templates":[{"template":"%(cm:name cm:title cm:description ia:whatEvent ia:descriptionEvent lnk:title lnk:description TEXT TAG)","name":"keywords"}],"allAttributes":[],"tenants":[""],"query":"((PATH:\"/app:companyhome/st:sites//_//*\" AND (lalit AND (+TYPE:\"cm:content\" +TYPE:\"cm:folder\"))) AND -TYPE:\"cm:thumbnail\" AND -TYPE:\"cm:failedThumbnail\" AND -TYPE:\"cm:rating\") AND NOT ASPECT:\"sys:hidden\"","locales":["en_US"],"defaultNamespace":"http://www.alfresco.org/model/content/1.0","defaultFTSOperator":"OR","defaultFTSFieldOperator":"OR"}

Finally, what I observed is that these queries are executed by SolrQueryHTTPClient.java, which is where all Solr queries are executed, in the ResultSet executeQuery(SearchParameters searchParameters, String language) method.

Can you confirm this?

Regards.