rbeckman-nextgen / test-mc4


File reader used in conjunction with SFTP server can cause problems #2241

Open rbeckman-nextgen opened 4 years ago

rbeckman-nextgen commented 4 years ago

In this setup, a File Reader connector is used in conjunction with an SFTP server to simulate an SFTP Reader connector. The 'Check File Age' option is selected and the 'File Filter' is set to *.xml. (Note that the XML is not HL7 XML but a proprietary XML that is transformed via JavaScript. I only mention this because this XML type has no size limit, since it contains batch data. The transformation isn't even reached when this error occurs, so it is irrelevant to the problem.)

The file age was set at 1000ms and then it was increased to 5000ms. This works most of the time.

If the incoming XML is large enough, the file age isn't sufficient and the dreaded 'String out of Index: -2' error occurs, indicating that the incoming message was blank (of course, the real XML was not actually empty). This happens whenever a file is still uploading when the File Reader connector starts reading data from it.

Recreating this error is fairly simple:

1) Set up a File Reader connector with the 'Check File Age' option marked.
2) Set up an SFTP/FTP server on the machine/virtual machine with the Mirth Connect server.
3) Send a large file via an SFTP/FTP client (increase the file size if the error doesn't happen).

Imported Issue. Original Details: Jira Issue Key: MIRTH-2326 Reporter: mbs Created: 2013-01-07T10:38:10.000-0800

rbeckman-nextgen commented 4 years ago

I could be wrong, but it seems to me that the 'Check File Age' check is looking at the create date rather than the last modified date.

Imported Comment. Original Details: Author: mbs Created: 2013-03-26T12:39:51.000-0700

rbeckman-nextgen commented 4 years ago

Verified that this is still happening in 3.0. This is a limitation of JSch: SftpATTRS.getMTime() returns the last modified date/time with only second precision. So if you have the File Reader file age set to 1000 ms, it will work most of the time, but occasionally fail because truncation plus processing overhead pushes the "now - lastMod" difference to 1000 or just above, even though the file is actually younger than that.

For example, say the last modified time was xxxxxxxxx0999. When JSch reads this in, it will report a last modified time of xxxxxxxxx0000 because it only has second precision. By the time the processing thread calls System.currentTimeMillis() to get the current time, 1 ms may have already passed, so the current time is xxxxxxxxx1000. The difference between the two is now 1000 ms, so the file passes the file age check and gets read in.

Setting the file age to the intended value plus 999 ms should resolve the issue, because that offsets the worst-case error introduced by the second-precision truncation.
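
To make the arithmetic concrete, here is a minimal sketch of the truncation effect and the 999 ms padding. It does not use JSch or Mirth Connect code: the class and method names are hypothetical, the timestamps are made up, and the `>=` comparison is an assumption based on the description above (a difference of exactly 1000 ms passes a 1000 ms check).

```java
// Sketch only: simulates second-precision truncation without a real SFTP connection.
final class FileAgeCheckSketch {

    // Mimics a timestamp reported in whole seconds, converted back to milliseconds.
    static long truncateToSeconds(long lastModifiedMillis) {
        return (lastModifiedMillis / 1000L) * 1000L;
    }

    // Assumed form of the age check: eligible once the file appears at least fileAgeMillis old.
    static boolean passesAgeCheck(long reportedLastModifiedMillis, long nowMillis, long fileAgeMillis) {
        return nowMillis - reportedLastModifiedMillis >= fileAgeMillis;
    }

    // Pads the intended age by 999 ms to offset the worst-case truncation error.
    static long compensatedFileAge(long intendedAgeMillis) {
        return intendedAgeMillis + 999L;
    }

    public static void main(String[] args) {
        long actualLastModified = 1_700_000_000_999L;            // file written at ...0999
        long now = 1_700_000_001_000L;                           // check runs 1 ms later
        long reported = truncateToSeconds(actualLastModified);   // seen as ...0000

        // The file is really 1 ms old, but it looks 1000 ms old.
        System.out.println("apparent age: " + (now - reported) + " ms");
        System.out.println("passes with 1000 ms: " + passesAgeCheck(reported, now, 1000L));
        System.out.println("passes with padded age: "
                + passesAgeCheck(reported, now, compensatedFileAge(1000L)));
    }
}
```

With these example values the unpadded check accepts a file that is only 1 ms old, while the padded check rejects it.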

Imported Comment. Original Details: Author: narupley Created: 2013-10-18T16:17:16.000-0700

rbeckman-nextgen commented 4 years ago

Perhaps we could just modify the file age field to use second precision instead of ms precision.

Imported Comment. Original Details: Author: jacobb Created: 2013-10-18T16:23:22.000-0700

rbeckman-nextgen commented 4 years ago

Another note on this:

There will always be some processing time between when JSch reads in the file information (including the last modified date/time) and when the File Reader does the file age check in FileReceiver.processFile(). If you're reading in two files rather than one, the second file has to wait for the first to process completely through the channel before its own processing even starts. So in this case there's definitely room for latency, and room for this issue to happen, even if you increase the file age.

To help mitigate this, instead of checking the file age in the processFile() method, we can first iterate through all files in the list and check them beforehand (storing some boolean on the FileInfo object). That way channel processing time won't have an effect on the file age check.
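
A rough sketch of that idea, assuming nothing about the real implementation: `FileEntry` is a hypothetical stand-in for the FileInfo object, and the method names are invented for illustration. The point is only that every file is checked against one clock reading before any file is processed.

```java
import java.util.ArrayList;
import java.util.List;

final class FileAgePreCheckSketch {

    // Hypothetical stand-in for the connector's FileInfo object (not the real Mirth Connect class).
    static final class FileEntry {
        final String name;
        final long lastModifiedMillis;
        boolean oldEnough; // result of the up-front age check

        FileEntry(String name, long lastModifiedMillis) {
            this.name = name;
            this.lastModifiedMillis = lastModifiedMillis;
        }
    }

    // Checks every listed file against one clock reading, before any file is processed.
    static void markOldEnough(List<FileEntry> files, long nowMillis, long fileAgeMillis) {
        for (FileEntry file : files) {
            file.oldEnough = nowMillis - file.lastModifiedMillis >= fileAgeMillis;
        }
    }

    // Returns the flagged files; time spent processing earlier files no longer affects the check.
    static List<FileEntry> selectOldEnough(List<FileEntry> files) {
        List<FileEntry> selected = new ArrayList<>();
        for (FileEntry file : files) {
            if (file.oldEnough) {
                selected.add(file);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        List<FileEntry> files = new ArrayList<>();
        files.add(new FileEntry("old.xml", now - 10_000L));
        files.add(new FileEntry("uploading.xml", now - 200L));

        markOldEnough(files, now, 5000L);
        for (FileEntry file : selectOldEnough(files)) {
            System.out.println("would process: " + file.name);
        }
    }
}
```

This only removes channel processing time from the equation; the listing latency and the second-precision truncation still have to be covered by the configured file age.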

Of course, there still will be overhead. If there's latency between the client and server, if you're listing from a directory with a lot of files (e.g. hundreds of thousands), etc., then there will be some unavoidable latency. It's best to do some testing with your specific systems first to see what overhead there is, and factor that into the file age that you choose on the File Reader settings.

Imported Comment. Original Details: Author: narupley Created: 2013-10-18T16:28:34.000-0700