lavv17 / lftp

sophisticated command line file transfer program (ftp, http, sftp, fish, torrent)
http://lftp.yar.ru
GNU General Public License v3.0

LFTP `find` takes over 58 minutes to complete while `ls -R` completes in ~37.5 seconds #559

Open ddelabru opened 4 years ago

ddelabru commented 4 years ago

I'm trying to use LFTP (v4.8.4, on Fedora 31) in a script where I need to obtain a listing of all the full file paths on a remote FTP server. The obvious choice for this task is the find LFTP command, and the output of this command does have everything I need in an easy-to-use format, but it takes 58 minutes to complete! In the same environment, the ls -R LFTP command completes in ~37.5 seconds, but it seems the FTP server cannot be coerced into displaying a "short" file listing or otherwise formatting the output in a way that is easier to use. The cls LFTP command does not appear to have a recursive mode.

I'm not sure whether it's relevant, but the FTP server I'm crawling does not seem to support the MLSD command.

Is there a way I can obtain output in the format of find at the kind of speed provided by ls, without parsing the long-form directory listings myself?

lavv17 commented 4 years ago

Lftp cannot parse recursive listings yet. To speed things up, you can try these settings:

set ftp:sync-mode off
set ftp:use-stat-for-list on

ddelabru commented 4 years ago

Thanks! I ended up parsing the recursive listing myself after all, since someone shared a helpful regex with me...

After looking at the code for the find command, I understand now why this is tricky: if the FTP server doesn't support recursive listings, you have to send repeated CWD commands, and that takes a lot of time. On top of that, recursive listings are not guaranteed to follow the same format across different FTP servers. I'd be willing to try to contribute code, but I don't think I know a better approach to implement. The best I can think of is trying the kind of parsing approach I'm using, then falling back to the current behavior if the format is not as expected or if the server does not support recursive listings, but that might be more complexity than is desirable.
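For anyone who wants to try the parse-then-fall-back idea outside lftp, here is a minimal Python sketch. It is not the regex that was shared with me and not anything lftp does internally. The function name, the regex, and the sample listing are all my own assumptions, and it only understands Unix-style long listings. Any line it does not recognize raises `ValueError`, which is the caller's cue to fall back to `find`.

```python
import posixpath
import re

# Assumed Unix-style long listing line, e.g.
# "-rw-r--r-- 1 ftp ftp 123 Jan  1 12:00 README"
ENTRY_RE = re.compile(
    r"^(?P<type>[-dlbcps])[-rwxsStT]{9}[.+@]?\s+"          # type and permission bits
    r"\d+\s+\S+\s+\S+\s+"                                   # link count, owner, group
    r"\d+\s+"                                               # size in bytes
    r"[A-Za-z]{3}\s+\d{1,2}\s+(?:\d{4}|\d{1,2}:\d{2})\s"    # month, day, year or time
    r"(?P<name>.+)$"                                        # file name (may contain spaces)
)
TOTAL_RE = re.compile(r"^total \d+$")

# Hypothetical sample of what `ls -R` might return; real servers differ.
SAMPLE = """.:
total 8
drwxr-xr-x 2 ftp ftp 4096 Jan  1  2020 pub
-rw-r--r-- 1 ftp ftp 123 Jan  1 12:00 README

./pub:
total 4
-rw-r--r-- 1 ftp ftp 10 Jan  1 12:00 a b.txt
lrwxrwxrwx 1 ftp ftp 5 Jan  1 12:00 link -> a b.txt
"""


def parse_recursive_listing(text):
    """Turn `ls -R` long-listing output into a flat list of paths.

    Directories get a trailing slash. Raises ValueError on any line that
    does not look like a known format, so the caller can fall back.
    """
    paths = []
    current = "."  # directory the following entries belong to
    for line in text.splitlines():
        if not line.strip() or TOTAL_RE.match(line):
            continue  # blank separator or "total N" summary line
        match = ENTRY_RE.match(line)
        if match:
            name = match.group("name")
            if match.group("type") == "l":
                name = name.split(" -> ", 1)[0]  # drop the symlink target
            if name in (".", ".."):
                continue
            path = posixpath.join(current, name)
            if match.group("type") == "d":
                path += "/"
            paths.append(path)
        elif line.endswith(":"):
            current = line[:-1]  # "./pub:" starts a new directory block
        else:
            raise ValueError(f"unrecognized listing line: {line!r}")
    return paths


if __name__ == "__main__":
    for p in parse_recursive_listing(SAMPLE):
        print(p)
```

The strictness is deliberate: refusing unknown lines is what makes the fallback safe, at the cost of rejecting listings (device files, non-Unix servers) that a looser parser might partly handle.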

lavv17 commented 4 years ago

Have you tried these settings?

set ftp:sync-mode off
set ftp:use-stat-for-list on

ddelabru commented 4 years ago

I just tried this list of commands in an lftp session:

set ftp:sync-mode off
set ftp:use-stat-for-list on
find

(Actually, the find command is still running in the background.) I can tell it's still quite a bit slower than ls -R, but I've only let it run for a few minutes so far so I can't tell you exactly how long it takes to complete.

ddelabru commented 4 years ago

Alright, after timing a full run of that set of commands (with time lftp -e "set ftp:sync-mode off ; set ftp:use-stat-for-list ; find" ftp.redhat.com) I have a figure of ~50 minutes, for the same server described previously.

lavv17 commented 4 years ago

Probably you missed the value for use-stat-for-list

ddelabru commented 4 years ago

> Probably you missed the value for use-stat-for-list

Ah, yes, I included it on my first try, when I was using lftp in interactive mode, but forgot it when I ran time. I am running time lftp -e "set ftp:sync-mode off ; set ftp:use-stat-for-list on ; find" ftp.redhat.com now and will post the results when it completes.
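Since I am driving lftp from a script anyway, a small guard would have caught this mistake. Below is a rough Python sketch; the helper names are hypothetical and not part of lftp. It builds the same `-e` command string as above and refuses any setting that has no value. As far as I understand, a `set` with no value resets the variable to its default rather than failing, which would explain why the earlier run was silently slow, but treat that as an assumption. The trailing `bye` is my addition so that lftp exits when the commands finish.

```python
def build_lftp_script(settings, command):
    """Join `set NAME VALUE` commands and a final command with ' ; '.

    Raises ValueError if any setting lacks a value.
    """
    parts = []
    for name, value in settings.items():
        if value is None or not str(value).strip():
            raise ValueError(f"setting {name!r} has no value")
        parts.append(f"set {name} {value}")
    parts.append(command)
    return " ; ".join(parts)


def build_lftp_argv(host, settings, command="find"):
    """Return an argv list suitable for subprocess.run (no shell quoting needed)."""
    script = build_lftp_script(settings, command) + " ; bye"
    return ["lftp", "-e", script, host]


if __name__ == "__main__":
    argv = build_lftp_argv(
        "ftp.redhat.com",
        {"ftp:sync-mode": "off", "ftp:use-stat-for-list": "on"},
    )
    print(argv)
```

Passing the result to `subprocess.run(argv)` avoids a second layer of shell quoting around the `-e` string.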

ddelabru commented 4 years ago

> I am running time lftp -e "set ftp:sync-mode off ; set ftp:use-stat-for-list on ; find" ftp.redhat.com now and will post the results when it completes.

That did cut the time down significantly from the default behavior of find:

real    19m25.566s
user    2m34.084s
sys     0m6.495s
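To put the numbers from this thread side by side, here is a quick Python calculation. It uses only the wall-clock figures reported above; the labels are mine, and the 58-minute, 50-minute, and 37.5-second values are the approximate ones I quoted earlier, so the ratios are rough.

```python
def to_seconds(minutes=0, seconds=0.0):
    """Convert a minutes/seconds pair to seconds."""
    return minutes * 60 + seconds


# Wall-clock times reported in this thread (approximate except the last find run).
timings = {
    "find, default settings": to_seconds(58),
    "find, use-stat-for-list value omitted": to_seconds(50),
    "find, both settings": to_seconds(19, 25.566),
    "ls -R": to_seconds(0, 37.5),
}

# How much faster the tuned find is than the default find.
speedup_vs_default = timings["find, default settings"] / timings["find, both settings"]
# How much slower the tuned find still is than ls -R.
slowdown_vs_ls = timings["find, both settings"] / timings["ls -R"]

if __name__ == "__main__":
    for label, secs in timings.items():
        print(f"{label:40s} {secs:9.1f} s")
    print(f"tuned find vs default find: {speedup_vs_default:.1f}x faster")
    print(f"tuned find vs ls -R:        {slowdown_vs_ls:.1f}x slower")
```

So the two settings make `find` roughly three times faster, but it still takes about thirty times as long as `ls -R` on this server.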