ddmbr opened this issue 6 years ago
It seems hdfs-mount does not handle the git path correctly. The error message is:
Error: 2017/12/05 09:40:53 RetryPolicy.go:70: [/https:] Stat: stat /https:: getFileInfo call failed with ERROR_APPLICATION (org.apache.hadoop.fs.InvalidPathException) -> failed attempt #1: will NOT be retried (reached max # of attempts)
Warning: 2017/12/05 09:40:53 Dir.go:185: stat [https:]: stat /https:: getFileInfo call failed with ERROR_APPLICATION (org.apache.hadoop.fs.InvalidPathException)stat /https:: getFileInfo call failed with ERROR_APPLICATION (org.apache.hadoop.fs.InvalidPathException)
It works when I run the git clone command outside the mounted directory, e.g. git clone <git path> /mount-point/path-to-file.
I don't think it's the case you mentioned. I ran strace and saw something like this:
stat("https://ddmbr@github.com/ddmbr/TestRepo/.git", 0x7fff1cca78f0) = -1 EIO (Input/output error)
stat("https://ddmbr@github.com/ddmbr/TestRepo", 0x7fff1cca78f0) = -1 EIO (Input/output error)
stat("https://ddmbr@github.com/ddmbr/TestRepo.git/.git", 0x7fff1cca78f0) = -1 EIO (Input/output error)
stat("https://ddmbr@github.com/ddmbr/TestRepo.git", 0x7fff1cca78f0) = -1 EIO (Input/output error)
stat("https://ddmbr@github.com/ddmbr/TestRepo.bundle", 0x7fff1cca78f0) = -1 EIO (Input/output error)
stat("https://ddmbr@github.com/ddmbr/TestRepo", 0x7fff1cca78f0) = -1 EIO (Input/output error)
My guess is that git is just testing whether the given URL actually refers to a local resource (e.g., a local path), so these errors are irrelevant.
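A small sketch of why the hdfs-mount log shows [/https:]: when git stats the clone URL as if it were a local path, the path walk starts at its first component, "https:", relative to the working directory. The Go snippet below assumes the working directory is the mount root and that HDFS rejects path components containing ':' (consistent with the InvalidPathException in the log above, but an assumption here). components and hdfsRejects are hypothetical helpers, not hdfs-mount code.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// components splits a slash-separated relative path into the components a
// path walk would visit in order. path.Clean first collapses the "//" in
// "https://" into a single slash.
func components(p string) []string {
	return strings.Split(path.Clean(p), "/")
}

// hdfsRejects reports whether a component contains ':', which this sketch
// assumes HDFS does not accept in path names.
func hdfsRejects(component string) bool {
	return strings.Contains(component, ":")
}

func main() {
	// git stats the clone URL as if it were a local path relative to the
	// current directory (assumed here to be the mount root).
	url := "https://ddmbr@github.com/ddmbr/TestRepo.git"
	parts := components(url)
	fmt.Println(parts)                               // [https: ddmbr@github.com ddmbr TestRepo.git]
	fmt.Println("/"+parts[0], hdfsRejects(parts[0])) // /https: true
}
```

Since the very first component already fails, the walk would stop there, which would fit the single [/https:] entry in the hdfs-mount log and the EIO results in the strace.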
Back to the missing-bytes problem: I found that the number of missing bytes is not constant. I checked the system calls of the child process spawned by git, but I didn't quite understand them. Following is the log right before the error occurs:
[pid 20704] munmap(0x7fcbe759b000, 10899456) = 0
[pid 20704] munmap(0x7fcbec000000, 56209408) = 0
[pid 20704] mprotect(0x7fcbe8000000, 135168, PROT_READ|PROT_WRITE) = 0
[pid 20704] pread(8, <unfinished ...>
[pid 20703] set_robust_list(0x7fcbf099c9e0, 0x18) = 0
[pid 20703] mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7fcbe0000000
[pid 20703] munmap(0x7fcbe4000000, 67108864) = 0
[pid 20703] mprotect(0x7fcbe0000000, 135168, PROT_READ|PROT_WRITE) = 0
[pid 20703] pread(7, <unfinished ...>
[pid 20704] <... pread resumed> "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 155, 4331) = 155
[pid 20703] <... pread resumed> "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 45, 4853) = 45
[pid 20702] mmap(0x7fcbe4000000, 67108864, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0 <unfinished ...>
[pid 20703] write(2, "error: inflate: data stream erro"..., 63error: inflate: data stream error (unknown compression method)
) = 63
[pid 20703] write(2, "fatal: serious inflate inconsist"..., 37fatal: serious inflate inconsistency
) = 37
[pid 20702] <... mmap resumed> ) = 0x7fcbe4000000
[pid 20702] mprotect(0x7fcbe4000000, 135168, PROT_READ|PROT_WRITE) = 0
[pid 20702] pread(6, <unfinished ...>
[pid 20703] exit_group(128) = ?
Process 20703 detached
@ddmbr did you manage to fix this problem?
I'm trying out this tool on my cluster and found something strange when using git. For example, when running "git clone" inside the mount, I saw the following:
"git clone" works fine outside the HDFS mount.
Any idea? It's also helpful if someone can point me to the relevant code. Thanks!