nickmza / BlogComments

Store Comments from the Blog using GitTalk
0 stars 0 forks source link

Integrating Azure Storage and SFTP with Apache Camel | nick dot blog #22

Open nickmza opened 1 year ago

nickmza commented 1 year ago

https://nickmck.net/2023/09/02/camel-sftp-azure/

danhaywood commented 1 week ago

Hmm, we've found this to be unreliable

2024-08-29T16:33:08.422486723Z 2024-08-29 16:33:08.422  WARN 16 --- [.net:22/scanned] o.a.c.c.f.r.SftpConsumer                 : sftp://xxx@exxx.blob.core.windows.net:22/scanned?delay=300000&initialDelay=300000&knownHostsFile=%2Froot%2F.ssh%2Fknown_hosts&move=..%2FuploadOk&moveFailed=..%2FuploadFailed&password=xxxxxx&preMove=uploadInProgress&strictHostKeyChecking=yes cannot begin processing file: RemoteFile[1010113855.pdf] due to: Cannot rename file from: scanned/1010113855.pdf to: scanned/uploadInProgress/1010113855.pdf. Exchange[]. Caused by: [org.apache.camel.component.file.GenericFileOperationFailedException - Cannot rename file from: scanned/1010113855.pdf to: scanned/uploadInProgress/1010113855.pdf]
2024-08-29T16:33:08.422516449Z 
2024-08-29T16:33:08.422520276Z org.apache.camel.component.file.GenericFileOperationFailedException: Cannot rename file from: scanned/1010113855.pdf to: scanned/uploadInProgress/1010113855.pdf
2024-08-29T16:33:08.422523301Z  at org.apache.camel.component.file.remote.SftpOperations.renameFile(SftpOperations.java:553) ~[camel-ftp-4.4.1.jar:4.4.1]
2024-08-29T16:33:08.422525706Z  at org.apache.camel.component.file.strategy.GenericFileProcessStrategySupport.renameFile(GenericFileProcessStrategySupport.java:146) ~[camel-file-4.4.1.jar:4.4.1]
2024-08-29T16:33:08.422528261Z  at org.apache.camel.component.file.strategy.GenericFileRenameProcessStrategy.begin(GenericFileRenameProcessStrategy.java:47) ~[camel-file-4.4.1.jar:4.4.1]
2024-08-29T16:33:08.422530465Z  at org.apache.camel.component.file.GenericFileConsumer.processExchange(GenericFileConsumer.java:387) [camel-file-4.4.1.jar:4.4.1]
2024-08-29T16:33:08.422532779Z  at org.apache.camel.component.file.remote.RemoteFileConsumer.processExchange(RemoteFileConsumer.java:153) [camel-ftp-4.4.1.jar:4.4.1]
2024-08-29T16:33:08.422535023Z  at org.apache.camel.component.file.GenericFileConsumer.processBatch(GenericFileConsumer.java:244) [camel-file-4.4.1.jar:4.4.1]
2024-08-29T16:33:08.422537298Z  at org.apache.camel.component.file.GenericFileConsumer.poll(GenericFileConsumer.java:205) [camel-file-4.4.1.jar:4.4.1]
2024-08-29T16:33:08.422539502Z  at org.apache.camel.support.ScheduledPollConsumer.doRun(ScheduledPollConsumer.java:204) [camel-support-4.4.1.jar:4.4.1]
2024-08-29T16:33:08.422549570Z  at org.apache.camel.support.ScheduledPollConsumer.run(ScheduledPollConsumer.java:118) [camel-support-4.4.1.jar:4.4.1]
2024-08-29T16:33:08.422552245Z  at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
2024-08-29T16:33:08.422554590Z  at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) [?:?]
2024-08-29T16:33:08.422556854Z  at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) [?:?]
2024-08-29T16:33:08.422559118Z  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
2024-08-29T16:33:08.422561472Z  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
2024-08-29T16:33:08.422563857Z  at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
2024-08-29T16:33:08.422566372Z Caused by: com.jcraft.jsch.SftpException: SourcePathNotFound: The source path for a rename operation does not exist. - RequestId:19eb03a6-d01e-0091-0031-fa1172000000 Time:2024-08-29T16:33:08.4124334Z
2024-08-29T16:33:08.422568926Z  at com.jcraft.jsch.ChannelSftp.throwStatusError(ChannelSftp.java:2829) ~[jsch-0.2.16.jar:0.2.16]
2024-08-29T16:33:08.422571110Z  at com.jcraft.jsch.ChannelSftp.rename(ChannelSftp.java:1898) ~[jsch-0.2.16.jar:0.2.16]
2024-08-29T16:33:08.422573314Z  at org.apache.camel.component.file.remote.SftpOperations.renameFile(SftpOperations.java:549) ~[camel-ftp-4.4.1.jar:4.4.1]
2024-08-29T16:33:08.422575729Z  ... 14 more
nickmza commented 1 week ago

@danhaywood Hmm, we've found this to be unreliable

2024-08-29T16:33:08.422486723Z 2024-08-29 16:33:08.422  WARN 16 --- [.net:22/scanned] o.a.c.c.f.r.SftpConsumer                 : sftp://xxx@exxx.blob.core.windows.net:22/scanned?delay=300000&initialDelay=300000&knownHostsFile=%2Froot%2F.ssh%2Fknown_hosts&move=..%2FuploadOk&moveFailed=..%2FuploadFailed&password=xxxxxx&preMove=uploadInProgress&strictHostKeyChecking=yes cannot begin processing file: RemoteFile[1010113855.pdf] due to: Cannot rename file from: scanned/1010113855.pdf to: scanned/uploadInProgress/1010113855.pdf. Exchange[]. Caused by: [org.apache.camel.component.file.GenericFileOperationFailedException - Cannot rename file from: scanned/1010113855.pdf to: scanned/uploadInProgress/1010113855.pdf]
2024-08-29T16:33:08.422516449Z 
2024-08-29T16:33:08.422520276Z org.apache.camel.component.file.GenericFileOperationFailedException: Cannot rename file from: scanned/1010113855.pdf to: scanned/uploadInProgress/1010113855.pdf
2024-08-29T16:33:08.422523301Z    at org.apache.camel.component.file.remote.SftpOperations.renameFile(SftpOperations.java:553) ~[camel-ftp-4.4.1.jar:4.4.1]
2024-08-29T16:33:08.422525706Z    at org.apache.camel.component.file.strategy.GenericFileProcessStrategySupport.renameFile(GenericFileProcessStrategySupport.java:146) ~[camel-file-4.4.1.jar:4.4.1]
2024-08-29T16:33:08.422528261Z    at org.apache.camel.component.file.strategy.GenericFileRenameProcessStrategy.begin(GenericFileRenameProcessStrategy.java:47) ~[camel-file-4.4.1.jar:4.4.1]
2024-08-29T16:33:08.422530465Z    at org.apache.camel.component.file.GenericFileConsumer.processExchange(GenericFileConsumer.java:387) [camel-file-4.4.1.jar:4.4.1]
2024-08-29T16:33:08.422532779Z    at org.apache.camel.component.file.remote.RemoteFileConsumer.processExchange(RemoteFileConsumer.java:153) [camel-ftp-4.4.1.jar:4.4.1]
2024-08-29T16:33:08.422535023Z    at org.apache.camel.component.file.GenericFileConsumer.processBatch(GenericFileConsumer.java:244) [camel-file-4.4.1.jar:4.4.1]
2024-08-29T16:33:08.422537298Z    at org.apache.camel.component.file.GenericFileConsumer.poll(GenericFileConsumer.java:205) [camel-file-4.4.1.jar:4.4.1]
2024-08-29T16:33:08.422539502Z    at org.apache.camel.support.ScheduledPollConsumer.doRun(ScheduledPollConsumer.java:204) [camel-support-4.4.1.jar:4.4.1]
2024-08-29T16:33:08.422549570Z    at org.apache.camel.support.ScheduledPollConsumer.run(ScheduledPollConsumer.java:118) [camel-support-4.4.1.jar:4.4.1]
2024-08-29T16:33:08.422552245Z    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
2024-08-29T16:33:08.422554590Z    at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) [?:?]
2024-08-29T16:33:08.422556854Z    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) [?:?]
2024-08-29T16:33:08.422559118Z    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
2024-08-29T16:33:08.422561472Z    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
2024-08-29T16:33:08.422563857Z    at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
2024-08-29T16:33:08.422566372Z Caused by: com.jcraft.jsch.SftpException: SourcePathNotFound: The source path for a rename operation does not exist. - RequestId:19eb03a6-d01e-0091-0031-fa1172000000 Time:2024-08-29T16:33:08.4124334Z
2024-08-29T16:33:08.422568926Z    at com.jcraft.jsch.ChannelSftp.throwStatusError(ChannelSftp.java:2829) ~[jsch-0.2.16.jar:0.2.16]
2024-08-29T16:33:08.422571110Z    at com.jcraft.jsch.ChannelSftp.rename(ChannelSftp.java:1898) ~[jsch-0.2.16.jar:0.2.16]
2024-08-29T16:33:08.422573314Z    at org.apache.camel.component.file.remote.SftpOperations.renameFile(SftpOperations.java:549) ~[camel-ftp-4.4.1.jar:4.4.1]
2024-08-29T16:33:08.422575729Z    ... 14 more

We've had some challenges historically due to filesize (processing started before file was completely uploaded), threading and anti-virus locking the file. This was not with Camel though. What happens if you configure a retry in your error handler?

danhaywood commented 1 week ago

I don't think it'd work - we've had cases where the file has actually disappeared from blob storage completely! Our ops guy tells me that he can understand how that might occur with respect to how Azure emulates sftp on top of blob stores. My guess is that there's some eventual consistency going on, and if one sftp rename command follows on before the previous one has fully completed, then it messes something up, a race condition probably.

Luckily for us we only get a new file to process quite slowly; for now, our workaround when polling for new files is to have a healthy delay in between.