lavv17 / lftp

sophisticated command line file transfer program (ftp, http, sftp, fish, torrent)
http://lftp.yar.ru
GNU General Public License v3.0

MLSD ParseLongList returning an empty list #652

Open mruprich opened 2 years ago

mruprich commented 2 years ago

Hi, when using lftp to mirror files from an FTP server with ftp:use-mlsd set to yes, the FileSet returned by FileSet *Ftp::ParseLongList in src/FtpListInfo.cc is empty, even though the MLSD parser produces the complete list of files. The server is a Cerberus FTP Server, and the problem is only reproducible with this server and MLSD.

The command I use is this:

$ lftp -d -c 'set ftp:ssl-allow no; open -u xxxx,xxxx ftp://xx.xx.xx.xx/EHI_KBB_Incoming/; set ftp:use-mlsd yes; mirror --verbose=9 -r .'

No files are mirrored from the server. If, on the other hand, I set 'ftp:use-mlsd' to no, all the files are mirrored successfully. I first checked what data the server sends using strace; the read() syscalls show that the list of files is complete, so the problem must be somewhere in lftp's parsing.

I was not sure how to debug this, so I added a couple of prints to the code, and it turns out that the mistake has to be in the function mentioned above, specifically the part that returns 'the_set' of files. When I use MLSD, the length of that set is always 0, even though the MLSD parser actually parsed 76 files. I added a printf to this part of the code:

leave:
   for(i=0; i<number_of_parsers; i++) {
      printf("-------- Ftp::ParseLongList - cleaning up parsers, set[%d] has length %d\n", i, set[i]->count());
      if(&set[i]!=the_set)
         delete set[i];
   }
   if(err_ret && the_err)
      *err_ret=*the_err;

   if(the_set)
        printf("-------- Ftp::ParseLongList - the_set length: %d\n", (*the_set)->count());

   return the_set?*the_set:0;

And the result is this:

-------- Ftp::ParseLongList - cleaning up parsers, set[0] has length 0
-------- Ftp::ParseLongList - cleaning up parsers, set[1] has length 0
-------- Ftp::ParseLongList - cleaning up parsers, set[2] has length 0
-------- Ftp::ParseLongList - cleaning up parsers, set[3] has length 76
-------- Ftp::ParseLongList - cleaning up parsers, set[4] has length 0
-------- Ftp::ParseLongList - cleaning up parsers, set[5] has length 0
-------- Ftp::ParseLongList - cleaning up parsers, set[6] has length 0
-------- Ftp::ParseLongList - the_set length: 0

So basically set[3], which is the set of files from the MLSD parser, is the one that should be selected, but it isn't. I blame this piece of code:

            if(*best_err1>err[i])
               best_err1=&err[i];
            if(*best_err2>err[i] && best_err1!=&err[i])
               best_err2=&err[i];
            if(*best_err1>16)
               goto leave; // too many errors with best parser.
         }
         if(*best_err2 > (*best_err1+1)*16)
         {
            i=best_err1-err;
            guessed_parser=line_parsers[i];
            the_set=&set[i];
            the_err=&err[i];
         }

I cannot quite figure out what this logic with best_err1 and best_err2 is supposed to do. MLSD, the best parser (and the one requested by the command), is never selected by this process. The guessed parser is always 0 in the end.

I would appreciate any idea why this is happening. I can reproduce it now locally so I can test any suggestion you might have. Thanks and regards, Michal Ruprich

pbtura commented 2 years ago

Did you ever find a solution to this? I think I am having the same issue (see: https://github.com/lavv17/lftp/issues/659). I found that the root of the problem seems to be in the ParseFtpLongList_UNIX method. When Ftp::ParseLongList passes it a line from an MLSD listing, it expects ParseFtpLongList_UNIX to register an error. However, a line returned by an MLSD server can start with "size=". If that happens, the strchr call on line 255 matches the 's' as the first character, and the method returns 0 rather than registering an error.
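To illustrate the first-character check described above, here is a minimal sketch. It is not the lftp source: the function name `unix_first_char_check`, the `Outcome` enum, and the exact character set "bcpsD" are assumptions made for illustration, and the sample lines in the usage note are made up.

```cpp
#include <cstring>

// Hypothetical outcome of the early check (not an lftp type).
enum class Outcome {
    SkippedSilently,  // returns "no entry" without touching the error counter
    ParseAttempted    // falls through to real parsing, which may count an error
};

// Hypothetical stand-in for the early return in the UNIX long-list parser.
// ASSUMPTION: the check treats a leading 'b', 'c', 'p', 's' or 'D' as a
// special-file type character and skips the line without counting an error.
Outcome unix_first_char_check(const char* line) {
    // strchr also matches the terminating NUL, so guard against empty lines.
    if (line[0] != '\0' && std::strchr("bcpsD", line[0]) != nullptr)
        return Outcome::SkippedSilently;
    return Outcome::ParseAttempted;
}
```

Under that assumption, an MLSD line such as "size=1024;type=file; a.txt" is skipped silently because of its leading 's', while a line that starts with "type=" or "modify=" falls through to normal parsing. Only the fact order in the server's response decides which case applies.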

The result is that best_err1 never gets incremented, so the code that should be assigning a value to the_set and guessed_parser (lines 123 - 129) never gets called.
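The effect can be reproduced with a small stand-alone model of the selection loop quoted in the first comment. This is only a sketch: `simulate`, `Selection`, and the per-line error rates are hypothetical, and the initial values of best_err1 and best_err2 (pointing at the first two counters) are an assumption, since the quoted snippet does not show them.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kParsers = 7;  // seven sets appear in the debug output

struct Selection {
    int guessed;   // parser index picked by the 16x margin rule, or -1
    int best;      // index best_err1 points at when the loop ends
    bool aborted;  // true if even the best parser exceeded 16 errors
};

// err_per_line[i]: errors hypothetical parser i registers on every line.
Selection simulate(const std::array<int, kParsers>& err_per_line, int lines) {
    std::array<int, kParsers> err{};
    int* best_err1 = &err[0];  // ASSUMPTION: initial value not shown in the issue
    int* best_err2 = &err[1];  // ASSUMPTION: initial value not shown in the issue
    for (int n = 0; n < lines; ++n) {
        for (std::size_t i = 0; i < kParsers; ++i) {
            err[i] += err_per_line[i];  // stands in for calling parser i
            if (*best_err1 > err[i])
                best_err1 = &err[i];
            if (*best_err2 > err[i] && best_err1 != &err[i])
                best_err2 = &err[i];
            if (*best_err1 > 16)  // too many errors with best parser
                return {-1, static_cast<int>(best_err1 - err.data()), true};
        }
        // A parser is only "guessed" once the runner-up is far worse.
        if (*best_err2 > (*best_err1 + 1) * 16) {
            int i = static_cast<int>(best_err1 - err.data());
            return {i, i, false};
        }
    }
    return {-1, static_cast<int>(best_err1 - err.data()), false};
}
```

With rates {0,1,1,0,1,1,1} over 76 lines (parser 0 silently skipping every line, parser 3 parsing every line), the two best counters both stay at 0, the margin test never passes, and best_err1 remains on index 0. With rates {1,1,1,0,1,1,1}, where parser 0 does register errors, the model guesses parser 3.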

mruprich commented 2 years ago

@pbtura Hi, unfortunately I have not gotten back to this yet. I was hoping that @lavv17 might take a look and see some simple fix here. I might get back to this when I have some time.