hjmangalam / parsyncfp

follow-on to parsync (parallel rsync) with better startup perf

fpart hanging? #23

Open fs2307 opened 5 years ago

fs2307 commented 5 years ago

Greetings. I stumbled upon a particular filesystem tree that seems to stop fpart almost immediately. So far I haven't been able to get much out of it by increasing verbosity. Do you have any recommendations on what to look for? What could stop it from making any visible progress?

hjmangalam commented 5 years ago

Just about to release a new version that adds a bunch of options and fixes some bugs, so it's a good time to catch other bugs.

When parsyncfp launches with the default verbosity, it should emit a blue (INFO) line that says something like:

INFO: Forking fpart. Check [/home/hjm/.parsyncfp/fpcache/fpart.log.] for errors if it hangs.

What does that fpart log say?

Also, what do the most recent few rsync logs say (usually in ~/.parsyncfp/)? They should be named:

rsync-logfile-<chunk#>
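As a rough illustration of "the most recent few rsync logs", the sketch below picks the newest files matching the rsync-logfile-* pattern by modification time and returns their last lines. The function names are hypothetical, and the directory is whatever location parsyncfp reports on your system (~/.parsyncfp/ is only the usual default mentioned above).

```python
from pathlib import Path


def latest_rsync_logs(log_dir, count=3):
    """Return the `count` most recently modified rsync-logfile-* paths, newest first."""
    logs = sorted(
        Path(log_dir).expanduser().glob("rsync-logfile-*"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    return logs[:count]


def tail(path, lines=5):
    """Return the last `lines` lines of a text file (reads the whole file)."""
    text = Path(path).read_text(errors="replace")
    return text.splitlines()[-lines:]
```

Calling `latest_rsync_logs("~/.parsyncfp")` and passing each result to `tail` should give enough context to paste into a reply.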

Also, of course, what command did you use? What version were you using?

hjm

Harry Mangalam, Info[1]


[1] http://moo.nac.uci.edu/~hjm/hjm.sig.html

fs2307 commented 5 years ago

The log usually just says Examining filesystem...

At some point I gave up and started launching fpart directly on different sections of the tree to see if it would move at all.

In some cases it works fine, but there are parts of the tree where it either gets stuck at Examining filesystem...

or prints something like Examining filesystem... Filled part #0: size = 764814974, 76 file(s) and doesn't move from there (I usually gave up 30 minutes later). I could see that no more information was being written to the f. file: just some paths, and then it stopped and no progress seemed to be made.

It doesn't seem to be completely dead, but it's very hard to say what exactly it is doing.

fpart v0.9.2, parsyncfp version 1.57

hjmangalam commented 5 years ago

Is there something unusual about your filesystem or layout? I haven't heard of a problem with fpart misbehaving; just parsyncfp.. ;)

That is, is there a directory with a bazillion files in it just below the directory fpart was launched from? Or anywhere else in the tree?

Also, can you send me the output of this command, which generates a listing of ONLY the number and sizes of the files you're targeting?

ls -lR | scut -f=4 > target.files

(This assumes you've installed scut. Then gzip the output for emailing.)

Check the output to make sure you're not leaking proprietary info.
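If scut is not installed, a similar names-free listing can be produced another way. The following is a minimal sketch, assuming the goal is simply one size per regular file with no paths in the output. The function names are hypothetical, and this is not the pipeline from the comment above, just an approximation of its intent.

```python
import os
import stat


def file_sizes(root):
    """Yield the size in bytes of every regular file under `root`, without names."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode):
                yield st.st_size


def write_sizes(root, out_path):
    """Write one size per line to `out_path`; return (file count, total bytes)."""
    count = total = 0
    with open(out_path, "w") as out:
        for size in file_sizes(root):
            out.write(f"{size}\n")
            count += 1
            total += size
    return count, total
```

Because only sizes are written, the result carries no file or directory names, though it is still worth checking before sending.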

Harry

fs2307 commented 5 years ago

Sadly, yes: 12 TB, but about 11 million files and folders. I am surprised that it hangs at the top of the tree and not in a deeper part of it, but I guess that depends on how the tree is walked.

hjmangalam commented 5 years ago

Hmm, that really shouldn't block it; I regularly back up a 47TB / 8M file tree. It may be the way that it's recursed. What chunk size are you using?

The default of 10G would be too small for a tree like that. You want something that will result in <1000 chunk files, so for your case, maybe --chunksize=100G or 200G.
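The arithmetic behind that advice can be sketched as follows. This assumes binary units (1G = 1024**3 bytes) and that the number of chunk files is roughly the total size divided by the chunk size; both are simplifications for illustration, and the helper names are hypothetical.

```python
import math

# Binary multipliers; whether parsyncfp uses binary or decimal units is an assumption here.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert a size such as '10G' or '12T' to bytes."""
    text = text.strip().upper()
    if text[-1] in UNITS:
        return int(float(text[:-1]) * UNITS[text[-1]])
    return int(text)


def chunk_count(total, chunk):
    """Estimate how many chunk files a tree of size `total` yields at `chunk` size."""
    return math.ceil(parse_size(total) / parse_size(chunk))
```

For the 12 TB tree in this thread, `chunk_count("12T", "10G")` gives 1229, above the suggested limit, while 100G gives 123 and 200G gives 62.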

hjm

martymac commented 5 years ago

Hi @fs2307,

Maybe fpart is blocked waiting for some I/O?

Could you try stracing/trussing the process to see what it's doing?

Also, you can try using the -vv switch with fpart to print filenames as they are added to the current partition.
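Before attaching a tracer, a quick first check on Linux is the process state: a process that sits in state D (uninterruptible sleep) is usually waiting on I/O. The sketch below is Linux-only and reads /proc/<pid>/status; the function names are hypothetical, and it complements rather than replaces strace or truss.

```python
def parse_state(status_text):
    """Extract the state letter and description from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            fields = line.split(None, 2)
            letter = fields[1]
            description = fields[2].strip("()") if len(fields) > 2 else ""
            return letter, description
    raise ValueError("no State: line found")


def process_state(pid):
    """Return (letter, description) for a running process on Linux."""
    with open(f"/proc/{pid}/status") as fh:
        return parse_state(fh.read())
```

Sampling `process_state(pid)` for the fpart process a few times in a row and seeing `D` each time would support the blocked-on-I/O theory.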

Hope this helps,

Best regards,

Ganael.

fs2307 commented 5 years ago

Sadly, I think I found the issue. It seems to be a folder of about 3 TB with 14951317 files in it: one of those wonderful places where you can't even run ls without it either dying on you or taking hours (if not days) to finish. A black hole of sorts. Running find just to list its contents takes more than a day. I'm still looking for a good way to deal with that sort of monster.
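One reason a plain ls is so painful on a directory like this is that it reads and sorts the whole listing before printing anything. A minimal sketch of the alternative, assuming streaming the entries is acceptable: os.scandir yields names as the filesystem returns them, unsorted and without a stat() per entry, and the names can be consumed in fixed-size batches. The function names and the batching scheme are hypothetical, and nothing in this thread establishes how fast this would be on the filesystem in question.

```python
import os
from itertools import islice


def iter_names(directory):
    """Yield entry names one at a time, unsorted, without calling stat()."""
    with os.scandir(directory) as entries:
        for entry in entries:
            yield entry.name


def batches(directory, batch_size):
    """Yield lists of at most `batch_size` names from `directory`."""
    names = iter_names(directory)
    while True:
        batch = list(islice(names, batch_size))
        if not batch:
            return
        yield batch
```

Each batch could then be written to its own file list and processed separately, so that no single step has to hold all 15 million names at once.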
