macressler / alfresco-bulk-filesystem-import

Automatically exported from code.google.com/p/alfresco-bulk-filesystem-import
GNU Lesser General Public License v3.0

In-place import when contentstore is a symlink results in error "The node's content is missing" #104

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
From Eric Harper (eharper@ziaconsulting.com)

What steps will reproduce the problem?
1. In a Linux environment, place source documents and metadata in /opt/alfresco/alf_data/contentstore/bulk_import/Docs
2. Run import with Source Dir=/opt/alfresco/alf_data/contentstore/bulk_import/Docs, Target Space=/Company Home/Sites/Cust0123/documentLibrary/Docs.
3. Import status shows everything running perfectly, In Place, with 2 files imported and 2 metadata files.  See attached file "Linux server Docs status.htm".

What is the expected output? What do you see instead?
Browsing the Document Library for the site in Share looks normal, except no 
thumbnail is shown.  
Previewing results in the error: "The preview could not be loaded from the 
server."  The attached file "Alfresco stack trace.log" is the Alfresco log.
Downloading the content results in the error "The node's content is missing".

What version of the product are you using? On what operating system?
OS/Database/Alfresco versions:
Linux Server (failing):
Linux im-test.ziaconsulting.com 2.6.32-305-ec2 #9-Ubuntu SMP Thu Apr 15 08:05:38 UTC 2010 x86_64 GNU/Linux
MySQL 5.1.41-3ubuntu12.10 (Ubuntu)
Alfresco Enterprise v3.4.6 (518) 

Windows laptop (succeeding):
Windows 7 64 bit
MySQL 5.1.57-community MySQL Community Server (GPL)
Alfresco Enterprise v3.4.5 (r32115) 

Please provide any additional information below.

I queried the Alfresco database alf_content_url table for the documents 
imported:
+------+----------------------------------------------------------------------------------+-------------------+-----------------+--------------+-------------+
| id   | content_url                                                                      | content_url_short | content_url_crc | content_size | orphan_time |
+------+----------------------------------------------------------------------------------+-------------------+-----------------+--------------+-------------+
| 1454 | store:///opt/alfresco/alf_data/contentstore/bulk_import/Docs/MSWordDocument.docx | ocument.docx      |      2097558406 |        13232 |        NULL |
| 1455 | store:///opt/alfresco/alf_data/contentstore/bulk_import/Docs/PDFDocument.pdf     | document.pdf      |       712053059 |       110525 |        NULL |
+------+----------------------------------------------------------------------------------+-------------------+-----------------+--------------+-------------+

Other files uploaded through Share in the Linux environment have the following 
records in alf_content_url:
+------+----------------------------------------------------------------+-------------------+-----------------+--------------+---------------+
| id   | content_url                                                    | content_url_short | content_url_crc | content_size | orphan_time   |
+------+----------------------------------------------------------------+-------------------+-----------------+--------------+---------------+
| 1448 | store://2012/2/6/14/8/2bb6dc76-74d8-445d-901d-56d4dcb01e76.bin | dcb01e76.bin      |      2430162150 |          358 | 1328562483849 |
| 1449 | store://2012/2/6/14/8/3aa77954-95d6-4870-abd6-4cd058e8f92e.bin | 58e8f92e.bin      |      1878600889 |          359 |          NULL |
+------+----------------------------------------------------------------+-------------------+-----------------+--------------+---------------+

I have successfully run the same files in a Windows development environment, 
with the BFSIT source code included in my Eclipse environment as a project 
within my Alfresco workspace.  See attached file "Windows laptop Docs 
status.htm" for the status output of BFSIT on the Windows env.

The same DB query of the alf_content_url in the local Windows environment shows 
the following:
+-----+----------------------------------------------+-------------------+-----------------+--------------+-------------+
| id  | content_url                                  | content_url_short | content_url_crc | content_size | orphan_time |
+-----+----------------------------------------------+-------------------+-----------------+--------------+-------------+
| 688 | store://bulk_import\Docs\MSWordDocument.docx | ocument.docx      |      3700672750 |        13232 |        NULL |
| 689 | store://bulk_import\Docs\PDFDocument.pdf     | document.pdf      |      3043332646 |       110525 |        NULL |
+-----+----------------------------------------------+-------------------+-----------------+--------------+-------------+

Other files uploaded through Share in the windows environment have the 
following records in alf_content_url:
+-----+---------------------------------------------------------------+-------------------+-----------------+--------------+-------------+
| id  | content_url                                                   | content_url_short | content_url_crc | content_size | orphan_time |
+-----+---------------------------------------------------------------+-------------------+-----------------+--------------+-------------+
| 690 | store://2012/2/7/0/7/7dfcb169-a66a-4c31-9b3c-28cb67a6da2e.bin | 67a6da2e.bin      |      1003097903 |          500 |        NULL |
| 692 | store://2012/2/7/0/9/c9a0ad76-b025-419c-8025-cbcbce5a9ebd.bin | ce5a9ebd.bin      |       841119382 |          500 |        NULL |
+-----+---------------------------------------------------------------+-------------------+-----------------+--------------+-------------+

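It seems the content_url is being set incorrectly for the files imported into 
the Linux environment: it contains the absolute filesystem path, whereas the 
Windows import stores a path relative to the contentstore.  
Note that in the Linux env, the /opt/alfresco dir is a symbolic link to 
/vol/opt/alfresco.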

Original issue reported on code.google.com by ehar...@ziaconsulting.com on 7 Feb 2012 at 7:49
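
Editorial illustration: the difference between the Linux and Windows rows above is consistent with a store-root prefix that failed to match the file path. The sketch below is not the BFSIT code; `naiveContentUrl` is a hypothetical helper, and the assumption that the store root is held in its symlink-resolved form (/vol/opt/...) while the source path keeps the form the user typed (/opt/...) is a guess made only to show how such a mismatch could yield the observed URL.

```java
final class ContentUrlSketch {

    static final String STORE_PROTOCOL = "store://";

    /**
     * Hypothetical helper (an assumption, not taken from BFSIT): builds a
     * content URL by removing the store root prefix with a plain string
     * replacement. If the prefix does not occur, the path is left untouched.
     */
    static String naiveContentUrl(String storeRoot, String filePath) {
        return STORE_PROTOCOL + filePath.replace(storeRoot + "/", "");
    }

    public static void main(String[] args) {
        String file = "/opt/alfresco/alf_data/contentstore/bulk_import/Docs/PDFDocument.pdf";

        // Store root and file path use the same form: a store-relative URL results.
        System.out.println(naiveContentUrl("/opt/alfresco/alf_data/contentstore", file));

        // Store root in resolved form, file path in symlinked form: the prefix
        // is not found, so the absolute path ends up in the URL.
        System.out.println(naiveContentUrl("/vol/opt/alfresco/alf_data/contentstore", file));
    }
}
```

The second call prints a URL of the same shape as rows 1454 and 1455, which would explain why the content cannot be located even though the import reports success.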

GoogleCodeExporter commented 9 years ago
One small detail omitted:
Linux environment has BFSIT amp installed:
alfresco-bulk-filesystem-import-1.1.amp 

Windows environment has latest code from Mercurial.

Original comment by ehar...@ziaconsulting.com on 7 Feb 2012 at 7:59

GoogleCodeExporter commented 9 years ago
Can you retry using /vol/opt/alfresco everywhere you used /opt/alfresco?  Java 
doesn't handle symlinks very well and I suspect that might be what's causing 
the issue.

Original comment by pmo...@gmail.com on 14 Feb 2012 at 6:57
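
Editorial illustration: a minimal sketch of how the same file can have two different path forms in Java when a symlink is involved. It uses the standard java.nio.file API (Java 7 and later); the class name, helper names, and the temporary vol/opt layout are made up for this example and only mimic the /opt/alfresco and /vol/opt/alfresco setup described in the report. It needs a filesystem and user account that allow creating symlinks.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class SymlinkPathsSketch {

    /** Resolves every symlink in the path; wraps the checked exception for brevity. */
    static Path realPath(Path path) {
        try {
            return path.toRealPath();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** True when the path as written differs from its fully resolved form. */
    static boolean traversesSymlink(Path path) {
        return !path.toAbsolutePath().normalize().equals(realPath(path));
    }

    /** Hypothetical fixture: base/vol is a real directory, base/opt is a symlink to it. */
    static Path createFixture() {
        try {
            // Resolve the temp dir first so only our own symlink is in play.
            Path base = Files.createTempDirectory("symlink-sketch").toRealPath();
            Path real = Files.createDirectory(base.resolve("vol"));
            Path link = Files.createSymbolicLink(base.resolve("opt"), real);
            // The file is created through the symlink, like the imported documents.
            return Files.createFile(link.resolve("doc.txt"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path viaLink = createFixture();
        System.out.println("as written: " + viaLink);
        System.out.println("resolved:   " + realPath(viaLink));
        System.out.println("traverses a symlink: " + traversesSymlink(viaLink));
    }
}
```

Code that compares a resolved path against an unresolved one (or the reverse) will see two unrelated locations, which is why using /vol/opt/alfresco everywhere is a reasonable first test.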

GoogleCodeExporter commented 9 years ago
Peter, I retried without the symlink (updated dir.root in 
alfresco-global.properties as well) and everything works fine on the Linux 
server.  Interestingly, the Windows environment works equally well with or 
without the symlink.  

Maybe the documentation should be updated to warn users of this?

Original comment by ehar...@ziaconsulting.com on 16 Feb 2012 at 8:42

GoogleCodeExporter commented 9 years ago
It's a new issue, so I wasn't aware of it until now.  That said, I've linked 
this ticket into the troubleshooting page 
(http://code.google.com/p/alfresco-bulk-filesystem-import/wiki/Troubleshooting).

I'll also leave this issue open, as I still consider this behaviour a bug, 
despite the fact that there's a workaround.

Original comment by pmo...@gmail.com on 16 Feb 2012 at 8:56

GoogleCodeExporter commented 9 years ago
PS. It's no surprise it works on Windows, as NTFS doesn't support symlinks, 
only hardlinks (junctions), which are fundamentally different.  I suspect 
hardlinks would also work on Linux, although they're not normally used as they 
have other challenges.

Original comment by pmo...@gmail.com on 16 Feb 2012 at 8:59

GoogleCodeExporter commented 9 years ago

Original comment by pmo...@gmail.com on 16 Feb 2012 at 10:05

GoogleCodeExporter commented 9 years ago
On Mac OSX I was unable to reproduce this issue.  Creating a symlink to the 
source folder in the contentstore results in a streaming import.  The symlink 
is dereferenced to get the "true" path, which is then detected as being outside 
the contentstore, resulting in a streaming import.

I'll need to test this on Linux to see if it's filesystem specific.

Original comment by pmo...@gmail.com on 17 Jan 2013 at 7:44

GoogleCodeExporter commented 9 years ago
As part of fixing issue #114 the dereferencing of the symlink has been removed.  
This means that symlinks in the content store are detected as being within the 
content store and result in an in-place import.

Original comment by pmo...@gmail.com on 18 Jan 2013 at 6:10
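
Editorial illustration: the two behaviours described in the last two comments (dereferencing leads to a streaming import, not dereferencing leads to an in-place import) can be sketched as a single containment check with a switch. This is an assumption about the general shape of such a check, not the actual BFSIT implementation; `chooseMode`, `ImportMode`, and the fixture layout are hypothetical names introduced for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class InPlaceDetectionSketch {

    enum ImportMode { IN_PLACE, STREAMING }

    /**
     * Hypothetical check: the source is imported in place only when its path
     * lies under the content store. With dereference=true both paths are
     * resolved through symlinks first; with false they are compared as written.
     */
    static ImportMode chooseMode(Path contentStore, Path source, boolean dereference) {
        try {
            Path store = dereference ? contentStore.toRealPath()
                                     : contentStore.toAbsolutePath().normalize();
            Path src = dereference ? source.toRealPath()
                                   : source.toAbsolutePath().normalize();
            return src.startsWith(store) ? ImportMode.IN_PLACE : ImportMode.STREAMING;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Fixture: contentstore/bulk_import is a symlink to a directory outside the store. */
    static Path[] createFixture() {
        try {
            Path base = Files.createTempDirectory("inplace-sketch").toRealPath();
            Path store = Files.createDirectory(base.resolve("contentstore"));
            Path external = Files.createDirectory(base.resolve("external"));
            Path link = Files.createSymbolicLink(store.resolve("bulk_import"), external);
            return new Path[] { store, link };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path[] fixture = createFixture();
        System.out.println("dereferenced: " + chooseMode(fixture[0], fixture[1], true));
        System.out.println("as written:   " + chooseMode(fixture[0], fixture[1], false));
    }
}
```

With dereferencing the symlinked folder resolves to a location outside the store and the sketch picks STREAMING; comparing the paths as written keeps the folder under the store and picks IN_PLACE.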

GoogleCodeExporter commented 9 years ago

Original comment by pmo...@gmail.com on 7 Feb 2013 at 7:33