Closed jtweed closed 5 years ago
This is appearing in the MCP client logs
Found the issue.
This is from the docker-compose file I based our ECS task definition on:
NAILGUN_SERVER: "fits"
NAILGUN_PORT: "2113"
And this is our ECS task definition:
MAILGUN_SERVER = "${local.fits_service_hostname}"
MAILGUN_PORT = "2113"
🤦‍♀️
This is still failing, but it is getting further than before:
recv: Connection reset by peer
Command FITS failed with exit status 227; stderr:
Just before that is this line:
edu.harvard.hul.ois.fits.exceptions.FitsConfigurationException: /var/archivematica/sharedDirectory/watchedDirectories/workFlowDecisions/extractPackagesChoice/OfficeDocs-48d49662-7f44-4393-b271-24e7cf0f848e/objects/_datavibe-l__FW__job_vacancy.rtf does not exist or is not readable
so I wonder if I’m not mounting the shared volume in the fits container?
Hmm, no, that is loaded in the ECS task definition. I’ll dig around some more.
It looks as if the environment variables are set up correctly in the MCP client container.
NAILGUN_SERVER=am-fits.archivematica
NAILGUN_PORT=2113
Inside the fits docker container:
/var/archivematica/sharedDirectory
is mounted.Running the ng-nailgun
command manually on the mcp-client container and monitoring the connection attempt with netstat on the other end, it looks like it's able to locate and connect to fits container but the connection gets dropped almost straight away. If I simulate the connection with nc -v
it keeps the connection open. So quite possibly something related to the higher level protocol...? I'm going to briefly step away from this one and see if any bright ideas come up :smile:
@helenst Running your command without the output is possibly a little more informative:
$ ng-nailgun edu.harvard.hul.ois.fits.Fits -i /var/archivematica/sharedDirectory/failed/SAACT_2433-7db2fb67-793e-4bb1-9748-fd4458bbe57d/objects/AM_transcript__13_March_2018.docx
A fair chunk of XML pops out before that error occurs:
<?xml version="1.0" encoding="UTF-8"?>
<fits xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://hul.harvard.edu/ois/xml/ns/fits/fits_output http://hul.harvard.edu/ois/xml/xsd/fits/fits_output.xsd" version="0.8.4" timestamp="3/7/19 5:51 PM">
<identification status="CONFLICT">
<identity format="Microsoft Word 2007+" mimetype="application/vnd.openxmlformats-officedocument.wordprocessingml.document" toolname="FITS" toolversion="0.8.4">
<tool toolname="file utility" toolversion="5.14" />
</identity>
<identity format="OpenDocument Text" mimetype="application/vnd.oasis.opendocument.text" toolname="FITS" toolversion="0.8.4">
<tool toolname="Exiftool" toolversion="9.13" />
<tool toolname="NLNZ Metadata Extractor" toolversion="3.4GA" />
</identity>
<identity format="Office Open XML Document" mimetype="application/vnd.openxmlformats-officedocument.wordprocessingml.document" toolname="FITS" toolversion="0.8.4">
<tool toolname="Tika" toolversion="1.3" />
</identity>
</identification>
<fileinfo>
<lastmodified toolname="Exiftool" toolversion="9.13" status="SINGLE_RESULT">2019:02:01 12:06:48+00:00</lastmodified>
<filepath toolname="OIS File Information" toolversion="0.2" status="SINGLE_RESULT">/var/archivematica/sharedDirectory/failed/SAACT_2433-7db2fb67-793e-4bb1-9748-fd4458bbe57d/objects/AM_transcript__13_March_2018.docx</filepath>
<filename toolname="OIS File Information" toolversion="0.2" status="SINGLE_RESULT">AM_transcript__13_March_2018.docx</filename>
<size toolname="OIS File Information" toolversion="0.2" status="SINGLE_RESULT">166280</size>
<fslastmodified toolname="OIS File Information" toolversion="0.2" status="SINGLE_RESULT">1549022808000</fslastmodified>
</fileinfo>
<filestatus />
<metadata />
<toolOutput>
<tool name="file utility" version="5.14">
<fileUtilityOutput xmlns="">
<rawOutput>Microsoft Word 2007+
application/vnd.openxmlformats-officedocument.wordprocessingml.document; charset=binary</rawOutput>
<mimetype>application/vnd.openxmlformats-officedocument.wordprocessingml.document</mimetype>
<format>Microsoft Word 2007+</format>
</fileUtilityOutput>
</tool>
<tool name="Exiftool" version="9.13">
<exiftool xmlns="">
<rawOutput>ExifToolVersion 9.13
FileName AM_transcript__13_March_2018.docx
Directory /var/archivematica/sharedDirectory/failed/SAACT_2433-7db2fb67-793e-4bb1-9748-fd4458bbe57d/objects
FileSize 162 kB
FileModifyDate 2019:02:01 12:06:48+00:00
FileAccessDate 2019:03:06 14:39:37+00:00
FileInodeChangeDate 2019:03:06 14:39:37+00:00
FilePermissions rw-r--r--
Warning Install Archive::Zip to decode compressed ZIP information
FileType ZIP
MIMEType application/zip
ZipRequiredVersion 20
ZipBitFlag 0x0006
ZipCompression Deflated
ZipModifyDate 1980:01:01 00:00:00
ZipCRC 0x4b278a6d
ZipCompressedSize 358
ZipUncompressedSize 1364
ZipFileName [Content_Types].xml</rawOutput>
<ExifToolVersion>9.13</ExifToolVersion>
<FileName>AM_transcript__13_March_2018.docx</FileName>
<Directory>/var/archivematica/sharedDirectory/failed/SAACT_2433-7db2fb67-793e-4bb1-9748-fd4458bbe57d/objects</Directory>
<FileSize>162 kB</FileSize>
<FileModifyDate>2019:02:01 12:06:48+00:00</FileModifyDate>
<FileAccessDate>2019:03:06 14:39:37+00:00</FileAccessDate>
<FileInodeChangeDate>2019:03:06 14:39:37+00:00</FileInodeChangeDate>
<FilePermissions>rw-r--r--</FilePermissions>
<Warning>Install Archive::Zip to decode compressed ZIP information</Warning>
<FileType>ZIP</FileType>
<MIMEType>application/zip</MIMEType>
<ZipRequiredVersion>20</ZipRequiredVersion>
<ZipBitFlag>0x0006</ZipBitFlag>
<ZipCompression>Deflated</ZipCompression>
<ZipModifyDate>1980:01:01 00:00:00</ZipModifyDate>
<ZipCRC>0x4b278a6d</ZipCRC>
<ZipCompressedSize>358</ZipCompressedSize>
<ZipUncompressedSize>1364</ZipUncompressedSize>
<ZipFileName>[Content_Types].xml</ZipFileName>
</exiftool>
</tool>
<tool name="NLNZ Metadata Extractor" version="3.4GA">
<OPENOFFICE xmlns="">
<METADATA>
<FILENAME>AM_transcript__13_March_2018.docx</FILENAME>
<SEPARATOR>/</SEPARATOR>
<PARENT>/var/archivematica/sharedDirectory/failed/SAACT_2433-7db2fb67-793e-4bb1-9748-fd4458bbe57d/objects</PARENT>
<CANONICALPATH>/var/archivematica/sharedDirectory/failed/SAACT_2433-7db2fb67-793e-4bb1-9748-fd4458bbe57d/objects/AM_transcript__13_March_2018.docx</CANONICALPATH>
<ABSOLUTEPATH>/var/archivematica/sharedDirectory/failed/SAACT_2433-7db2fb67-793e-4bb1-9748-fd4458bbe57d/objects/AM_transcript__13_March_2018.docx</ABSOLUTEPATH>
<PATH>/var/archivematica/sharedDirectory/failed/SAACT_2433-7db2fb67-793e-4bb1-9748-fd4458bbe57d/objects/AM_transcript__13_March_2018.docx</PATH>
<FILE>true</FILE>
<DIRECTORY>false</DIRECTORY>
<FILELENGTH>166280</FILELENGTH>
<HIDDEN>false</HIDDEN>
<ABSOLUTE>true</ABSOLUTE>
<URL>file:/var/archivematica/sharedDirectory/failed/SAACT_2433-7db2fb67-793e-4bb1-9748-fd4458bbe57d/objects/AM_transcript__13_March_2018.docx</URL>
<URI>file:/var/archivematica/sharedDirectory/failed/SAACT_2433-7db2fb67-793e-4bb1-9748-fd4458bbe57d/objects/AM_transcript__13_March_2018.docx</URI>
<READ>true</READ>
<WRITE>true</WRITE>
<EXTENSION>docx</EXTENSION>
<MODIFIED>2019-02-01 12:06:48</MODIFIED>
<DATE>20190201</DATE>
<DATEPATTERN>yyyyMMdd</DATEPATTERN>
<TIME>120648000</TIME>
<TIMEPATTERN>HHmmssSSS</TIMEPATTERN>
<TYPE>application/open-office-1.x</TYPE>
<PID>null</PID>
<OID>null</OID>
<FID>null</FID>
<PROCESSOR>unknown</PROCESSOR>
</METADATA>
</OPENOFFICE>
</tool>
<tool name="OIS File Information" version="0.2">
<fits xsi:schemaLocation="http://hul.harvard.edu/ois/xml/ns/fits/fits_output http://hul.harvard.edu/ois/xml/xsd/fits/fits_output.xsd">
<fileinfo>
<filepath>/var/archivematica/sharedDirectory/failed/SAACT_2433-7db2fb67-793e-4bb1-9748-fd4458bbe57d/objects/AM_transcript__13_March_2018.docx</filepath>
<filename>AM_transcript__13_March_2018.docx</filename>
<size>166280</size>
<fslastmodified>1549022808000</fslastmodified>
</fileinfo>
</fits>
</tool>
<tool name="ffident" version="0.2">
<ffidentOutput xmlns="">
<shortName>ZIP</shortName>
<longName>ZIP Format</longName>
<group>archive</group>
<mimetypes>
<mimetype>application/zip</mimetype>
</mimetypes>
<fileExtensions>
<extension>zip</extension>
</fileExtensions>
</ffidentOutput>
</tool>
<tool name="Tika" version="1.3">
<metadata xmlns="">
<field name="Content-Type">
recv: Connection reset by peer
Possibly working out what’s meant to go in that tag would tell us something?
It seems to go through a bunch of tools – file utility, Exiftool, NLNZ Metadata Extractor, OIS File Information, ffident, Tika, and then it falls over. What’s Tika? 🤔
Thanks for continuing the investigation. That's a good question...
Trying with a different file, it fails at a different point (during OIS File Info) . So I don't think tika related. Running on the fits machine with fits.sh
is successful. Running same file from mcp-client via nailgun produces the failure. Maybe nailgun related.
With a really small file (echo 'test' > test.txt
), sometimes the command succeeds. In the fits container logs, fits seems to always exit with code zero. So possibly nothing fits-related actually failing. At the moment my suspicions lie with nailgun doing something weird with the connection.
I did find this one a while back which looked related: https://github.com/facebook/nailgun/issues/23
It sounds like fits is closing stdout somewhere, and that causes ng-nailgun to close the connection. Possibly upgrading to a new version of nailgun would help?
I’m going to get this running in a couple of local containers, then play with different versions to see if that makes a difference.
Unfortunately there are no instructions in the facebook/nailgun repo for updating your version of nailgun. I tried downloading the JAR from the releases page, but couldn't get it to start, and there aren't any Debian packages for it.
The problem does seem to happen more often in AWS than locally (I haven't been able to reproduce it locally yet, even with the same containers and file) – probably the increased latency between ECS containers?
Looks like it was indeed an outdated version of nailgun.
There’s a PR here which creates a Docker image with the newest version of the nailgun server, automatically publishes it to ECR, and uses it in Terraform: https://github.com/wellcometrust/archivematica-infra/pull/29
In limited testing, this seems to work correctly, and no longer exhibits the issue!
I think this is fixed, but it was intermittent when I was trying to test it. I’ll leave it open and close if we haven’t seen it after a few days.
I've not seen this happen in any of the newer transfers. OK to close I think.
The characterize and extract metadata step in Archivematica is failing every time it is run.
I think it might be a service discovery issue, the logs say: