Closed kelson42 closed 6 years ago
I might have a simple basic idea is to create a docker image/container which simply handles the uploads (on the server side).
The technology of file transfer might be discussed, but basically it's a server which allows to upload a file which has to be able to do two things:
This could be quite easy to do that with the FTP protocoll with for example:
The problem with FTP (same with rsync) is the first time worker uploads, a fingerprint verification prompt will appear
@automactic FTP is user/pass based, nothing to do with any crypto proof or fingerprint. You just need the right user/pass.
Mmm... yes you don't need to verify key with FTP.
I don't know if WebDAV could be a good option.
@automactic I don't really care about the protocoll as long as it work. WebDAV might work but it's more risky than FTP and I'm not sure you will get such an easy solution like you might have for FTP with twisted or pyftpdlib. In addition they are for more less WebDAV clients than FPT clients.
Yes I will try to do this with pyftpdlib, looks promising. Since FTP does not encrypt password and everyone can see it. But if we use FTPS I believe the credentials should also be encrypted.
@automactic good... I would recommend to somehow use password with a short life time (maybe only one usage?). We can do that as the dispatcher distribute them and do the check.
It could also be a good idea to have the worker request a file transfer token from dispatcher and send this token to ftp server when uploading the file. Upon receiving the token, the ftp server verify it with dispatcher.
This way, the worker only need to transfer the password once (with dispatcher) during its lifetime. But user also cannot login to the ftp server with other tools (since users cannot see the token)
Due to some reasons I cannot connect to the ftp server once they are in the container, but it works fine outside the container. The error log is below
INFO:pyftpdlib:196.52.2.24:59014-[] FTP session opened (connect)
DEBUG:pyftpdlib:196.52.2.24:59014-[] -> 220 Welcome to Zimfarm Warehouse.
DEBUG:pyftpdlib:196.52.2.24:59014-[] <- USER admin
DEBUG:pyftpdlib:196.52.2.24:59014-[] -> 331 Username ok, send password.
DEBUG:pyftpdlib:196.52.2.24:59014-[admin] <- PASS ******
DEBUG:pyftpdlib:196.52.2.24:59014-[admin] -> 230 Hi, there!
INFO:pyftpdlib:196.52.2.24:59014-[admin] USER 'admin' logged in.
DEBUG:pyftpdlib:196.52.2.24:59014-[admin] <- FEAT
DEBUG:pyftpdlib:196.52.2.24:59014-[admin] -> 211 End FEAT.
DEBUG:pyftpdlib:196.52.2.24:59014-[admin] <- OPTS UTF8 ON
DEBUG:pyftpdlib:196.52.2.24:59014-[admin] -> 501 Invalid argument.
DEBUG:pyftpdlib:196.52.2.24:59014-[admin] <- SYST
DEBUG:pyftpdlib:196.52.2.24:59014-[admin] -> 215 UNIX Type: L8
DEBUG:pyftpdlib:196.52.2.24:59014-[admin] <- PWD
DEBUG:pyftpdlib:196.52.2.24:59014-[admin] -> 257 "/" is the current directory.
DEBUG:pyftpdlib:196.52.2.24:59014-[admin] <- CWD /
DEBUG:pyftpdlib:196.52.2.24:59014-[admin] -> 250 "/" is the current directory.
DEBUG:pyftpdlib:196.52.2.24:59014-[admin] <- TYPE A
DEBUG:pyftpdlib:196.52.2.24:59014-[admin] -> 200 Type set to: ASCII.
DEBUG:pyftpdlib:196.52.2.24:59014-[admin] <- PASV
DEBUG:pyftpdlib:196.52.2.24:59014-[admin] -> 227 Entering passive mode (172,17,0,3,200,212).
DEBUG:pyftpdlib:[debug] call: close() (<FTPHandler(id=140131051023048, addr='196.52.2.24:59014', user='admin')>)
DEBUG:pyftpdlib:[debug] call: close() (<pyftpdlib.handlers.PassiveDTP listening 172.17.0.3:0 at 0x7f72cd171710>)
INFO:pyftpdlib:196.52.2.24:59014-[admin] FTP session closed (disconnect).
Any ideas?
@automactic I fully support https://github.com/openzim/zimfarm/issues/38#issuecomment-348602315. Regarding the transfer problem, you probably do not support the FTP passiv mode properly. Look a bit how FTP passiv works (you need basically to open other ports).
Here is for example an example fix https://github.com/stilliard/docker-pure-ftpd/pull/9/commits/da4dee5a1412bea0f5a40a4768561297145ecdc5
I took a further look into the issue and read more about passive FTP. It looks like after receiving PASV
, the server is returning 227 Entering passive mode (172,17,0,3,109,116).
The problem is when worker is uploading files, the data transfer happens on 172.17.0.0.3:28020
, which is internal to docker network. We need a way to have the FTP server return the correct address.
I need to set masquerade_address
on the FTP server
@automactic OK, so does it work ?
Yes it does, but when you need to run ftp server you need to set external ip in environment
@ dispatcher/warehouse should be moved to /. Considering that the warehouse does not probably run on the same serveur and that in any case is not depending from the dispatcher, it should not be part of it IMO.
@automactic Could you also merge your code to master so we can close that ticket?
The plain FTP server approach discussed in this issue is already implemented.
As the workers are thought to not run on the same machine as the dispatcher, all generated ZIM files need to be gathered somewhere. That is why we need to be able to upload the files from a worker machine to a dedicated place.
This upload process should match the following requirements: